new heapcheck contrib module
Hackers,
I have been talking with Robert about table corruption that occurs from time to time. The page checksum feature seems sufficient to detect most random corruption problems, but it can't detect "logical" corruption, where the page is valid but inconsistent with the rest of the database cluster. This can happen due to faulty or ill-conceived backup and restore tools, or bad storage, or user error, or bugs in the server itself. (Also, not everyone enables checksums.)
The attached module provides the means to scan a relation and sanity check it. Currently, it checks xmin and xmax values against relfrozenxid and relminmxid, and also validates TOAST pointers. If people like this, it could be expanded to perform additional checks.
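For anyone who wants to try it, usage is a single function call (the column list below matches the function definition in the attached patch; the table name is just an example):

```sql
-- Build and install the module, then:
CREATE EXTENSION heapcheck;

-- Scan an example table. Each row returned describes one problem found:
-- blkno/offnum/attnum/chunk locate the corruption, msg describes it.
SELECT blkno, offnum, attnum, msg
FROM heapcheck_relation('example_table');

-- An empty result set means no corruption was detected.
```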
A prior v1 patch was discussed off-list with Robert but never posted. Here is v2:
Attachments:
v2-0001-Adding-heapcheck-contrib-module.patch (application/octet-stream)
From 2a1bc0bb9fa94bd929adc1a408900cb925ebcdd5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Apr 2020 08:05:58 -0700
Subject: [PATCH v2] Adding heapcheck contrib module.
The heapcheck module introduces a new function for checking a heap
relation and associated toast relation, if any, for corruption.
The postgres backend already defends against certain forms of
corruption by checking the page header of each page before allowing
it into the page cache, and by verifying the page checksum, if enabled.
Experience shows that broken or ill-conceived backup and restore
mechanisms can result in a page, or an entire file, being overwritten
with an earlier version of itself. Pages thus overwritten will appear
to have valid page headers and checksums, while potentially containing
invalid xmin, xmax, and toast pointers.
contrib/heapcheck introduces a function, heapcheck_relation, that
takes a regclass argument, scans the given heap relation, and returns
rows containing information about corruption found within the table.
The main focus of the scan is to find invalid xmin, xmax, and toast
pointer values. It also checks for structural corruption within the
page (such as invalid t_hoff values) that could lead to the backend
aborting should the function blindly trust the data as it finds it.
---
contrib/Makefile | 1 +
contrib/heapcheck/.gitignore | 4 +
contrib/heapcheck/Makefile | 25 +
.../expected/001_create_extension.out | 1 +
.../expected/002_disallowed_reltypes.out | 27 +
contrib/heapcheck/heapcheck--1.0.sql | 21 +
contrib/heapcheck/heapcheck.c | 1167 +++++++++++++++++
contrib/heapcheck/heapcheck.control | 5 +
.../heapcheck/sql/001_create_extension.sql | 1 +
.../heapcheck/sql/002_disallowed_reltypes.sql | 29 +
contrib/heapcheck/t/003_heapcheck_relation.pl | 361 +++++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/heapcheck.sgml | 133 ++
14 files changed, 1777 insertions(+)
create mode 100644 contrib/heapcheck/.gitignore
create mode 100644 contrib/heapcheck/Makefile
create mode 100644 contrib/heapcheck/expected/001_create_extension.out
create mode 100644 contrib/heapcheck/expected/002_disallowed_reltypes.out
create mode 100644 contrib/heapcheck/heapcheck--1.0.sql
create mode 100644 contrib/heapcheck/heapcheck.c
create mode 100644 contrib/heapcheck/heapcheck.control
create mode 100644 contrib/heapcheck/sql/001_create_extension.sql
create mode 100644 contrib/heapcheck/sql/002_disallowed_reltypes.sql
create mode 100644 contrib/heapcheck/t/003_heapcheck_relation.pl
create mode 100644 doc/src/sgml/heapcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..27ac131526 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -20,6 +20,7 @@ SUBDIRS = \
earthdistance \
file_fdw \
fuzzystrmatch \
+ heapcheck \
hstore \
intagg \
intarray \
diff --git a/contrib/heapcheck/.gitignore b/contrib/heapcheck/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/heapcheck/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/heapcheck/Makefile b/contrib/heapcheck/Makefile
new file mode 100644
index 0000000000..8d780a41ab
--- /dev/null
+++ b/contrib/heapcheck/Makefile
@@ -0,0 +1,25 @@
+# contrib/heapcheck/Makefile
+
+MODULE_big = heapcheck
+OBJS = \
+ $(WIN32RES) \
+ heapcheck.o
+
+EXTENSION = heapcheck
+DATA = heapcheck--1.0.sql
+PGFILEDESC = "heapcheck - page corruption information"
+
+REGRESS = 001_create_extension 002_disallowed_reltypes
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/heapcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/heapcheck/expected/001_create_extension.out b/contrib/heapcheck/expected/001_create_extension.out
new file mode 100644
index 0000000000..0ca79c22be
--- /dev/null
+++ b/contrib/heapcheck/expected/001_create_extension.out
@@ -0,0 +1 @@
+create extension heapcheck;
diff --git a/contrib/heapcheck/expected/002_disallowed_reltypes.out b/contrib/heapcheck/expected/002_disallowed_reltypes.out
new file mode 100644
index 0000000000..8e0b18dfc3
--- /dev/null
+++ b/contrib/heapcheck/expected/002_disallowed_reltypes.out
@@ -0,0 +1,27 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) have no storage of their own
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- these should all fail
+select * from heapcheck_relation('test_partitioned');
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes are not supported, so this fails
+select * from heapcheck_relation('test_index');
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views have no storage to check, so this fails
+select * from heapcheck_relation('test_view');
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences are not supported, so this fails
+select * from heapcheck_relation('test_sequence');
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables have no local storage, so this fails
+select * from heapcheck_relation('test_foreign_table');
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/heapcheck/heapcheck--1.0.sql b/contrib/heapcheck/heapcheck--1.0.sql
new file mode 100644
index 0000000000..48251e6781
--- /dev/null
+++ b/contrib/heapcheck/heapcheck--1.0.sql
@@ -0,0 +1,21 @@
+/* contrib/heapcheck/heapcheck--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION heapcheck" to load this file. \quit
+
+-- Check a heap relation for corruption, returning a row for each problem found.
+CREATE FUNCTION heapcheck_relation(regclass,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'heapcheck_relation'
+LANGUAGE C STRICT;
+REVOKE ALL ON FUNCTION heapcheck_relation(regclass) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION heapcheck_relation(regclass) TO pg_stat_scan_tables;
diff --git a/contrib/heapcheck/heapcheck.c b/contrib/heapcheck/heapcheck.c
new file mode 100644
index 0000000000..7cd4690f98
--- /dev/null
+++ b/contrib/heapcheck/heapcheck.c
@@ -0,0 +1,1167 @@
+/*-------------------------------------------------------------------------
+ *
+ * heapcheck.c
+ * Functions to check postgresql relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/heapcheck/heapcheck.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(heapcheck_relation);
+
+typedef struct CorruptionInfo
+{
+ BlockNumber blkno;
+ int32 offnum; /* negative means NULL in the output row */
+ int16 lp_off;
+ int16 lp_flags;
+ int16 lp_len;
+ int32 attnum;
+ int32 chunk;
+ char *msg;
+} CorruptionInfo;
+
+typedef struct HeapCheckContext
+{
+ /* Values concerning the heap relation being checked */
+ Oid relid;
+ Relation rel;
+ TupleDesc relDesc;
+ TransactionId relfrozenxid;
+ MultiXactId relminmxid;
+ int rel_natts;
+ bool has_toastrel;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ OffsetNumber maxoff;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ TransactionId xmin;
+ TransactionId xmax;
+ uint16 infomask;
+ int natts;
+ bool hasnulls;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+ char *tp; /* pointer to the tuple data */
+ bits8 *bp; /* ptr to null bitmap in tuple */
+ Form_pg_attribute thisatt;
+
+ /* Values for iterating over toast for the attribute */
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ int32 chunkno;
+ HeapTuple toasttup;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+ TupleDesc toasttupDesc;
+ bool found_toasttup;
+
+ /* List of CorruptionInfo */
+ List *corruption;
+} HeapCheckContext;
+
+/* Public API */
+typedef struct CheckRelCtx
+{
+ List *corruption;
+ int idx;
+} CheckRelCtx;
+
+Datum heapcheck_relation(PG_FUNCTION_ARGS);
+
+/* Internal implementation */
+void record_corruption(HeapCheckContext * ctx, char *msg);
+TupleDesc heapcheck_relation_tupdesc(void);
+
+void beginRelBlockIteration(HeapCheckContext * ctx);
+bool relBlockIteration_next(HeapCheckContext * ctx);
+void endRelBlockIteration(HeapCheckContext * ctx);
+
+void beginPageTupleIteration(HeapCheckContext * ctx);
+bool pageTupleIteration_next(HeapCheckContext * ctx);
+void endPageTupleIteration(HeapCheckContext * ctx);
+
+void beginTupleAttributeIteration(HeapCheckContext * ctx);
+bool tupleAttributeIteration_next(HeapCheckContext * ctx);
+void endTupleAttributeIteration(HeapCheckContext * ctx);
+
+void beginToastTupleIteration(HeapCheckContext * ctx,
+ struct varatt_external *toast_pointer);
+void endToastTupleIteration(HeapCheckContext * ctx);
+bool toastTupleIteration_next(HeapCheckContext * ctx);
+
+bool TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid);
+bool HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+void check_toast_tuple(HeapCheckContext * ctx);
+bool check_tuple_attribute(HeapCheckContext * ctx);
+void check_tuple(HeapCheckContext * ctx);
+
+List *check_relation(Oid relid);
+void check_relation_relkind(Relation rel);
+
+/*
+ * record_corruption
+ *
+ * Record a message about corruption, including information
+ * about where in the relation the corruption was found.
+ */
+void
+record_corruption(HeapCheckContext * ctx, char *msg)
+{
+ CorruptionInfo *info = (CorruptionInfo *) palloc0(sizeof(CorruptionInfo));
+
+ info->blkno = ctx->blkno;
+ info->offnum = ctx->offnum;
+ info->lp_off = ItemIdGetOffset(ctx->itemid);
+ info->lp_flags = ItemIdGetFlags(ctx->itemid);
+ info->lp_len = ItemIdGetLength(ctx->itemid);
+ info->attnum = ctx->attnum;
+ info->chunk = ctx->chunkno;
+ info->msg = msg;
+
+ ctx->corruption = lappend(ctx->corruption, info);
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by heapcheck_relation.
+ */
+TupleDesc
+heapcheck_relation_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber maxattr = 8;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(maxattr);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == maxattr);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * heapcheck_relation
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+heapcheck_relation(PG_FUNCTION_ARGS)
+{
+ FuncCallContext *funcctx;
+ CheckRelCtx *ctx;
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ Oid relid = PG_GETARG_OID(0);
+ MemoryContext oldcontext;
+
+ /*
+ * Scan the entire relation, building up a list of corruption found in
+ * ctx->corruption, for returning later. The scan must be performed
+ * in a memory context that will survive until after all rows are
+ * returned.
+ */
+ funcctx = SRF_FIRSTCALL_INIT();
+ oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ funcctx->tuple_desc = heapcheck_relation_tupdesc();
+ ctx = (CheckRelCtx *) palloc0(sizeof(CheckRelCtx));
+ ctx->corruption = check_relation(relid);
+ ctx->idx = 0; /* start the iterator at the beginning */
+ funcctx->user_fctx = (void *) ctx;
+ MemoryContextSwitchTo(oldcontext);
+ }
+
+ funcctx = SRF_PERCALL_SETUP();
+ ctx = (CheckRelCtx *) funcctx->user_fctx;
+
+ /*
+ * Return the next corruption message from the list, if any. Our location
+ * in the list is recorded in ctx->idx. The special value -1 is used in
+ * the list of corruptions to represent NULL; we check for negative
+ * numbers when setting the nulls[] values.
+ */
+ if (ctx->idx < list_length(ctx->corruption))
+ {
+ Datum values[8];
+ bool nulls[8];
+ HeapTuple tuple;
+ CorruptionInfo *info = list_nth(ctx->corruption, ctx->idx);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(info->blkno);
+ values[1] = Int32GetDatum(info->offnum);
+ nulls[1] = (info->offnum < 0);
+ values[2] = Int16GetDatum(info->lp_off);
+ nulls[2] = (info->lp_off < 0);
+ values[3] = Int16GetDatum(info->lp_flags);
+ nulls[3] = (info->lp_flags < 0);
+ values[4] = Int16GetDatum(info->lp_len);
+ nulls[4] = (info->lp_len < 0);
+ values[5] = Int32GetDatum(info->attnum);
+ nulls[5] = (info->attnum < 0);
+ values[6] = Int32GetDatum(info->chunk);
+ nulls[6] = (info->chunk < 0);
+ values[7] = CStringGetTextDatum(info->msg);
+ ctx->idx++;
+
+ tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
+ SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
+ }
+
+ SRF_RETURN_DONE(funcctx);
+}
+
+/*
+ * beginRelBlockIteration
+ *
+ * For the given heap relation being checked, as recorded in ctx, sets up
+ * variables for iterating over the heap's pages.
+ *
+ * The caller should have already opened the heap relation, ctx->rel
+ */
+void
+beginRelBlockIteration(HeapCheckContext * ctx)
+{
+ ctx->nblocks = RelationGetNumberOfBlocks(ctx->rel);
+ ctx->blkno = InvalidBlockNumber;
+ ctx->bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx->buffer = InvalidBuffer;
+ ctx->page = NULL;
+}
+
+/*
+ * endRelBlockIteration
+ *
+ * Releases resources that were reserved by either beginRelBlockIteration or
+ * relBlockIteration_next.
+ */
+void
+endRelBlockIteration(HeapCheckContext * ctx)
+{
+ /*
+ * Clean up. If the caller iterated to the end, the final call to
+ * relBlockIteration_next will already have released the buffer, but if
+ * the caller is bailing out early, we have to release it ourselves.
+ */
+ if (BufferIsValid(ctx->buffer))
+ UnlockReleaseBuffer(ctx->buffer);
+}
+
+/*
+ * relBlockIteration_next
+ *
+ * Updates the state in ctx to point to the next page in the relation.
+ * Returns true if there is any such page, else false.
+ *
+ * The caller should have already called beginRelBlockIteration, and should
+ * only continue calling until the false result.
+ */
+bool
+relBlockIteration_next(HeapCheckContext * ctx)
+{
+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx->blkno == InvalidBlockNumber || ctx->buffer != InvalidBuffer);
+ if (BufferIsValid(ctx->buffer))
+ {
+ UnlockReleaseBuffer(ctx->buffer);
+ ctx->buffer = InvalidBuffer;
+ }
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidBlockNumber + 1 == 0,
+ "InvalidBlockNumber increments to zero");
+ ctx->blkno++;
+ if (ctx->blkno >= ctx->nblocks)
+ return false;
+
+ /* Read and lock the next page. */
+ ctx->buffer = ReadBufferExtended(ctx->rel, MAIN_FORKNUM, ctx->blkno,
+ RBM_NORMAL, ctx->bstrategy);
+ LockBuffer(ctx->buffer, BUFFER_LOCK_SHARE);
+ ctx->page = BufferGetPage(ctx->buffer);
+
+ return true;
+}
+
+/*
+ * beginPageTupleIteration
+ *
+ * For the given page being visited, as stored in ctx, sets up variables for
+ * iterating over the tuples on the page.
+ */
+void
+beginPageTupleIteration(HeapCheckContext * ctx)
+{
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx->offnum = InvalidOffsetNumber;
+ ctx->maxoff = PageGetMaxOffsetNumber(ctx->page);
+ ctx->itemid = NULL;
+ ctx->lp_len = 0;
+ ctx->tuphdr = NULL;
+ ctx->xmin = InvalidTransactionId;
+ ctx->xmax = InvalidTransactionId;
+ ctx->infomask = 0;
+ ctx->natts = 0;
+ ctx->hasnulls = false;
+}
+
+/*
+ * endPageTupleIteration
+ *
+ * Releases resources taken by beginPageTupleIteration or
+ * pageTupleIteration_next.
+ */
+void
+endPageTupleIteration(HeapCheckContext * ctx)
+{
+ /* Abuse beginPageTupleIteration to reset the tuple iteration variables */
+ beginPageTupleIteration(ctx);
+}
+
+/*
+ * pageTupleIteration_next
+ *
+ * Advances the state tracked in ctx to the next tuple on the page.
+ *
+ * Caller should have already set up the iteration via
+ * beginPageTupleIteration, and should stop calling when this function
+ * returns false.
+ */
+bool
+pageTupleIteration_next(HeapCheckContext * ctx)
+{
+ /*
+ * Iterate to the next interesting line pointer, if any. Unused, dead and
+ * redirect line pointers are of no interest.
+ */
+ do
+ {
+ ctx->offnum = OffsetNumberNext(ctx->offnum);
+ if (ctx->offnum > ctx->maxoff)
+ return false;
+ ctx->itemid = PageGetItemId(ctx->page, ctx->offnum);
+ } while (!ItemIdIsUsed(ctx->itemid) ||
+ ItemIdIsDead(ctx->itemid) ||
+ ItemIdIsRedirected(ctx->itemid));
+
+ /* Set up context information about this next tuple */
+ ctx->lp_len = ItemIdGetLength(ctx->itemid);
+ ctx->tuphdr = (HeapTupleHeader) PageGetItem(ctx->page, ctx->itemid);
+ ctx->xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ ctx->xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ ctx->infomask = ctx->tuphdr->t_infomask;
+ ctx->natts = HeapTupleHeaderGetNatts(ctx->tuphdr);
+ ctx->hasnulls = ctx->infomask & HEAP_HASNULL;
+
+ /*
+ * Reset information about individual attributes and related toast values,
+ * so they show as NULL in the corruption report if we record a corruption
+ * before beginning to iterate over the attributes.
+ */
+ ctx->attnum = -1;
+ ctx->chunkno = -1;
+
+ return true;
+}
+
+/*
+ * beginTupleAttributeIteration
+ *
+ * For the given tuple being visited, as stored in ctx, sets up variables for
+ * iterating over the attributes in the tuple.
+ */
+void
+beginTupleAttributeIteration(HeapCheckContext * ctx)
+{
+ ctx->offset = 0;
+ ctx->attnum = -1;
+ ctx->tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+ ctx->bp = ctx->tuphdr->t_bits;
+}
+
+/*
+ * tupleAttributeIteration_next
+ *
+ * Advances the state tracked in ctx to the next attribute in the tuple.
+ *
+ * Caller should have already set up the iteration via
+ * beginTupleAttributeIteration, and should stop calling when this function
+ * returns false.
+ */
+bool
+tupleAttributeIteration_next(HeapCheckContext * ctx)
+{
+ ctx->attnum++;
+ if (ctx->attnum >= ctx->natts)
+ return false;
+ ctx->thisatt = TupleDescAttr(ctx->relDesc, ctx->attnum);
+ return true;
+}
+
+/*
+ * endTupleAttributeIteration
+ *
+ * Resets state tracked in ctx after iteration over attributes ends.
+ */
+void
+endTupleAttributeIteration(HeapCheckContext * ctx)
+{
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
+
+/*
+ * beginToastTupleIteration
+ *
+ * For the given attribute being visited, as stored in ctx, sets up variables for
+ * iterating over the related toast value.
+ */
+void
+beginToastTupleIteration(HeapCheckContext * ctx,
+ struct varatt_external *toast_pointer)
+{
+ ctx->toasttupDesc = ctx->toastrel->rd_att;
+ ctx->found_toasttup = false;
+
+ ctx->attrsize = toast_pointer->va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&ctx->toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer->va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&ctx->SnapshotToast);
+ ctx->toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &ctx->SnapshotToast, 1,
+ &ctx->toastkey);
+ ctx->chunkno = 0;
+}
+
+/*
+ * toastTupleIteration_next
+ *
+ * Advances the state tracked in ctx to the next toast tuple for the
+ * attribute.
+ *
+ * Caller should have already set up the iteration via
+ * beginToastTupleIteration, and should stop calling when this function
+ * returns false.
+ */
+bool
+toastTupleIteration_next(HeapCheckContext * ctx)
+{
+ ctx->toasttup = systable_getnext_ordered(ctx->toastscan,
+ ForwardScanDirection);
+ return ctx->toasttup != NULL;
+}
+
+/*
+ * endToastTupleIteration
+ *
+ * Releases resources taken by beginToastTupleIteration or
+ * toastTupleIteration_next.
+ */
+void
+endToastTupleIteration(HeapCheckContext * ctx)
+{
+ systable_endscan_ordered(ctx->toastscan);
+}
+
+/*
+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past. Stores the inferred FullTransactionId in *fxid.
+ *
+ * Returns whether the xid is newer than the oldest clog xid.
+ */
+bool
+TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid)
+{
+ FullTransactionId fnow;
+ uint32 epoch;
+
+ /* Initialize fxid; we'll overwrite this later if needed */
+ *fxid = FullTransactionIdFromEpochAndXid(0, xid);
+
+ /* Special xids can quickly be turned into invalid fxids */
+ if (!TransactionIdIsValid(xid))
+ return false;
+ if (!TransactionIdIsNormal(xid))
+ return true;
+
+ /*
+ * Charitably infer the full transaction id as being within one epoch ago
+ */
+ fnow = ReadNextFullTransactionId();
+ epoch = EpochFromFullTransactionId(fnow);
+ *fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
+ if (!FullTransactionIdPrecedes(*fxid, fnow))
+ *fxid = FullTransactionIdFromEpochAndXid(epoch - 1, xid);
+ if (!FullTransactionIdPrecedes(*fxid, fnow))
+ return false;
+
+ /* The oldestClogXid is protected by CLogTruncationLock */
+ Assert(LWLockHeldByMe(CLogTruncationLock));
+ if (TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
+ return false;
+ return true;
+}
+
+/*
+ * HeapTupleIsVisible
+ *
+ * Determine whether tuples are visible for heapcheck. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Gracefully handles xids that are too old by calling
+ * TransactionIdStillValid before TransactionLogFetch, thus avoiding
+ * a backend abort.
+ * 3) Only makes a boolean determination of whether heapcheck should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ */
+bool
+HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ FullTransactionId fxmin,
+ fxmax;
+ uint16 infomask = tuphdr->t_infomask;
+ TransactionId xmin = HeapTupleHeaderGetXmin(tuphdr);
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ if (!TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetRawXmin(tuphdr)))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmin(tuphdr)))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+
+ /*
+ * The tuple appears to either be or to have been visible to us, but
+ * the xmin may be too far in the past to be used. We have to check
+ * that before calling TransactionIdDidCommit to avoid an Assertion.
+ */
+ LWLockAcquire(CLogTruncationLock, LW_SHARED);
+ if (!TransactionIdStillValid(xmin, &fxmin))
+ {
+ LWLockRelease(CLogTruncationLock);
+ record_corruption(ctx, psprintf("tuple xmin = %u (interpreted as "
+ UINT64_FORMAT
+ ") not or no longer valid",
+ xmin, fxmin.value));
+ return false;
+ }
+ else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuphdr)))
+ {
+ LWLockRelease(CLogTruncationLock);
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ LWLockRelease(CLogTruncationLock);
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ record_corruption(ctx, _("heap tuple with XMAX_IS_MULTI is "
+ "neither LOCKED_ONLY nor has a "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ LWLockAcquire(CLogTruncationLock, LW_SHARED);
+ if (!TransactionIdStillValid(xmax, &fxmax))
+ {
+ LWLockRelease(CLogTruncationLock);
+ record_corruption(ctx, psprintf("tuple xmax = %u (interpreted "
+ "as " UINT64_FORMAT
+ ") not or no longer valid",
+ xmax, fxmax.value));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xmax))
+ {
+ LWLockRelease(CLogTruncationLock);
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ LWLockRelease(CLogTruncationLock);
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ * The caller should have iterated to a tuple via toastTupleIteration_next.
+ */
+void
+check_toast_tuple(HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ ctx->found_toasttup = true;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(ctx->toasttup, 2,
+ ctx->toasttupDesc, &isnull));
+ if (isnull)
+ {
+ record_corruption(ctx, _("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(ctx->toasttup, 3,
+ ctx->toasttupDesc, &isnull));
+ if (isnull)
+ {
+ record_corruption(ctx, _("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ record_corruption(ctx, _("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ record_corruption(ctx, psprintf("toast chunk sequence number %d "
+ "not the expected sequence number %d",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ record_corruption(ctx, psprintf("toast chunk sequence number %d "
+ "exceeds the end chunk sequence "
+ "number %d",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ record_corruption(ctx, psprintf("chunk size %d differs from "
+ "expected size %d",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ * The caller should have iterated to a tuple via
+ * tupleAttributeIteration_next.
+ */
+bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ record_corruption(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (ctx->hasnulls && att_isnull(ctx->attnum, ctx->bp))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (ctx->thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, ctx->thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, ctx->thisatt->attlen,
+ ctx->tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, ctx->thisatt->attalign, -1,
+ ctx->tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(ctx->thisatt, ctx->tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(ctx->tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(ctx->tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ record_corruption(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, ctx->thisatt->attlen,
+ ctx->tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ record_corruption(ctx,
+ _("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(ctx->infomask & HEAP_HASEXTERNAL))
+ {
+ record_corruption(ctx, _("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->has_toastrel)
+ {
+ record_corruption(ctx, _("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ record_corruption(ctx, _("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+ beginToastTupleIteration(ctx, &toast_pointer);
+
+ while (toastTupleIteration_next(ctx))
+ check_toast_tuple(ctx);
+
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ record_corruption(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!ctx->found_toasttup)
+ record_corruption(ctx, _("toasted value missing from "
+ "toast table"));
+ endToastTupleIteration(ctx);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ *
+ * The caller should have iterated to a tuple via pageTupleIteration_next.
+ */
+void
+check_tuple(HeapCheckContext * ctx)
+{
+ bool fatal = false;
+
+ /* Check relminmxid against mxid, if any */
+ if (ctx->infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(ctx->xmax, ctx->relminmxid))
+ {
+ record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ ctx->xmax, ctx->relminmxid));
+ }
+
+ /* Check xmin against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(ctx->xmin) &&
+ TransactionIdPrecedes(ctx->xmin, ctx->relfrozenxid))
+ {
+ record_corruption(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ ctx->xmin, ctx->relfrozenxid));
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(ctx->xmax) &&
+ TransactionIdPrecedes(ctx->xmax, ctx->relfrozenxid))
+ {
+ record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ ctx->xmax, ctx->relfrozenxid));
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ record_corruption(ctx, psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ record_corruption(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ record_corruption(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (ctx->hasnulls &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ record_corruption(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /* Cannot process tuple data if tuple header was corrupt */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!HeapTupleIsVisible(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field can legitimately be
+ * smaller than rel_natts, but it cannot be larger.
+ */
+ if (ctx->rel_natts < ctx->natts)
+ {
+ record_corruption(ctx, psprintf("relation natts < tuple natts (%u < %u)",
+ ctx->rel_natts, ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ beginTupleAttributeIteration(ctx);
+ while (tupleAttributeIteration_next(ctx) &&
+ check_tuple_attribute(ctx))
+ ;
+ endTupleAttributeIteration(ctx);
+}
+
+/*
+ * check_relation
+ *
+ * Checks the relation given by relid for corruption, returning a list of
+ * all the corruption it finds.
+ *
+ * The caller should set up the memory context as desired before calling.
+ * The returned list belongs to the caller.
+ */
+List *
+check_relation(Oid relid)
+{
+ HeapCheckContext ctx;
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* Open the relation */
+ ctx.relid = relid;
+ ctx.corruption = NIL;
+ ctx.rel = relation_open(relid, AccessShareLock);
+ check_relation_relkind(ctx.rel);
+
+ ctx.relDesc = RelationGetDescr(ctx.rel);
+ ctx.rel_natts = RelationGetDescr(ctx.rel)->natts;
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ /* Open the toast relation, if any */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.has_toastrel = true;
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.has_toastrel = false;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /* check all blocks of the relation */
+ beginRelBlockIteration(&ctx);
+ while (relBlockIteration_next(&ctx))
+ {
+ /* Perform tuple checks */
+ beginPageTupleIteration(&ctx);
+ while (pageTupleIteration_next(&ctx))
+ check_tuple(&ctx);
+ endPageTupleIteration(&ctx);
+ }
+ endRelBlockIteration(&ctx);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.has_toastrel)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ table_close(ctx.toastrel, AccessShareLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ return ctx.corruption;
+}
+
+/*
+ * check_relation_relkind
+ *
+ * convenience routine to check that relation is of a supported relkind.
+ */
+void
+check_relation_relkind(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
diff --git a/contrib/heapcheck/heapcheck.control b/contrib/heapcheck/heapcheck.control
new file mode 100644
index 0000000000..23b076169e
--- /dev/null
+++ b/contrib/heapcheck/heapcheck.control
@@ -0,0 +1,5 @@
+# heapcheck extension
+comment = 'examine relations for corruption'
+default_version = '1.0'
+module_pathname = '$libdir/heapcheck'
+relocatable = true
diff --git a/contrib/heapcheck/sql/001_create_extension.sql b/contrib/heapcheck/sql/001_create_extension.sql
new file mode 100644
index 0000000000..0ca79c22be
--- /dev/null
+++ b/contrib/heapcheck/sql/001_create_extension.sql
@@ -0,0 +1 @@
+create extension heapcheck;
diff --git a/contrib/heapcheck/sql/002_disallowed_reltypes.sql b/contrib/heapcheck/sql/002_disallowed_reltypes.sql
new file mode 100644
index 0000000000..782e2c7039
--- /dev/null
+++ b/contrib/heapcheck/sql/002_disallowed_reltypes.sql
@@ -0,0 +1,29 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) have no storage of their own
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- this should fail
+select * from heapcheck_relation('test_partitioned');
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes are not supported, so this fails
+select * from heapcheck_relation('test_index');
+
+create view test_view as select 1;
+-- views are not supported, so this fails
+select * from heapcheck_relation('test_view');
+
+create sequence test_sequence;
+-- sequences are not supported, so this fails
+select * from heapcheck_relation('test_sequence');
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables have no local storage, so this fails
+select * from heapcheck_relation('test_foreign_table');
+
+
diff --git a/contrib/heapcheck/t/003_heapcheck_relation.pl b/contrib/heapcheck/t/003_heapcheck_relation.pl
new file mode 100644
index 0000000000..8630ac798b
--- /dev/null
+++ b/contrib/heapcheck/t/003_heapcheck_relation.pl
@@ -0,0 +1,361 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 1;
+
+# This regression test demonstrates that the heapcheck_relation() function
+# supplied with this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot be
+# expected to help us with this, as its design is not consistent with the goal
+# of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that heapcheck_relation
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# This formatting produces heap pages where each tuple is 58 bytes long, padded
+# out to 64 bytes for alignment, with the first one on the page starting at
+# offset 8128, as follows:
+#
+# [ lp_off: 8128 lp_len: 58]
+# [ lp_off: 8064 lp_len: 58]
+# [ lp_off: 8000 lp_len: 58]
+# [ lp_off: 7936 lp_len: 58]
+# [ lp_off: 7872 lp_len: 58]
+# [ lp_off: 7808 lp_len: 58]
+# ...
+#
+
+use constant LP_OFF_BEGIN => 8128;
+use constant LP_OFF_DELTA => 64;
+
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my ($result, $node);
+
+# Set up the node and test table.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION heapcheck");
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ repeat('f', 7),
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause heapcheck_relation to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+for (my $offset = LP_OFF_BEGIN, my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++, $offset -= LP_OFF_DELTA)
+{
+ my $tup = read_tuple($file, $offset);
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relfrozenxid
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run heapcheck_relation on the corrupted file
+$node->start;
+
+$result = $node->safe_psql('postgres', q(SELECT * FROM heapcheck_relation('test')));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|1|8128|1|58|||tuple xmin = 3 (interpreted as 3) not or no longer valid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 (interpreted as 18446744073441116159) not or no longer valid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 429496744 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected heapcheck_relation output");
+
+$node->teardown_node;
+$node->clean_node;
+
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f32b8ac5ef 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -110,6 +110,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&earthdistance;
&file-fdw;
&fuzzystrmatch;
+ &heapcheck;
&hstore;
&intagg;
&intarray;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 68179f71cd..b43d72b8bb 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -122,6 +122,7 @@
<!ENTITY earthdistance SYSTEM "earthdistance.sgml">
<!ENTITY file-fdw SYSTEM "file-fdw.sgml">
<!ENTITY fuzzystrmatch SYSTEM "fuzzystrmatch.sgml">
+<!ENTITY heapcheck SYSTEM "heapcheck.sgml">
<!ENTITY hstore SYSTEM "hstore.sgml">
<!ENTITY intagg SYSTEM "intagg.sgml">
<!ENTITY intarray SYSTEM "intarray.sgml">
diff --git a/doc/src/sgml/heapcheck.sgml b/doc/src/sgml/heapcheck.sgml
new file mode 100644
index 0000000000..0a9942a452
--- /dev/null
+++ b/doc/src/sgml/heapcheck.sgml
@@ -0,0 +1,133 @@
+<!-- doc/src/sgml/heapcheck.sgml -->
+
+<sect1 id="heapcheck" xreflabel="heapcheck">
+ <title>heapcheck</title>
+
+ <indexterm zone="heapcheck">
+ <primary>heapcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>heapcheck</filename> module provides a means for examining the
+ integrity of a table relation.
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ heapcheck_relation(relation regclass,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected, containing the
+ following fields:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The block number of the page where the corruption was found.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the corruption
+ is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>
+ By default, this function is executable only by superusers and members
+ of the <literal>pg_stat_scan_tables</literal> role.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </sect2>
+
+</sect1>
--
2.21.1 (Apple Git-122.3)
On Mon, Apr 20, 2020 at 10:59 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The attached module provides the means to scan a relation and sanity check it. Currently, it checks xmin and xmax values against relfrozenxid and relminmxid, and also validates TOAST pointers. If people like this, it could be expanded to perform additional checks.
Cool. Why not make it part of contrib/amcheck?
We talked about the kinds of checks that we'd like to have for a tool
like this before:
/messages/by-id/20161017014605.GA1220186@tornado.leadboat.com
--
Peter Geoghegan
On Mon, Apr 20, 2020 at 2:09 PM Peter Geoghegan <pg@bowt.ie> wrote:
Cool. Why not make it part of contrib/amcheck?
I wondered if people would suggest that. Didn't take long.
The documentation would need some updating, but that's doable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 20, 2020 at 11:19 AM Robert Haas <robertmhaas@gmail.com> wrote:
I wondered if people would suggest that. Didn't take long.
You were the one who pointed out that my first version of
contrib/amcheck, which was called "contrib/btreecheck", should have
a more general name. And rightly so!
The basic interface used for the heap checker functions seems very
similar to what amcheck already offers for B-Tree indexes, so it seems
very natural to distribute them together.
IMV, the problem that we have with amcheck is that it's too hard to
use in a top down kind of way. Perhaps there is an opportunity to
provide a more top-down interface to an expanded version of amcheck
that does heap checking. Something with a high level practical focus,
in addition to the low level functions. I'm not saying that Mark
should be required to solve that problem, but it certainly seems worth
considering now.
The documentation would need some updating, but that's doable.
It would also probably need a bit of renaming, so that analogous
function names are used.
--
Peter Geoghegan
On Apr 20, 2020, at 11:31 AM, Peter Geoghegan <pg@bowt.ie> wrote:
IMV, the problem that we have with amcheck is that it's too hard to
use in a top down kind of way. Perhaps there is an opportunity to
provide a more top-down interface to an expanded version of amcheck
that does heap checking. Something with a high level practical focus,
in addition to the low level functions. I'm not saying that Mark
should be required to solve that problem, but it certainly seems worth
considering now.
Thanks for your quick response and interest in this submission!
Can you elaborate on "top-down"? I'm not sure what that means in this context.
I don't mind going further with this project if I understand what you are suggesting.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I mean an interface that's friendly to DBAs, that verifies an entire
database. No custom sql query required. Something that provides a
reasonable mix of verification options based on high level directives. All
verification methods can be combined in a granular, possibly randomized
fashion. Maybe we can make this run in parallel.
For example, maybe your heap checker code sometimes does index probes for a
subset of indexes and heap tuples. It's not hard to combine it with the
rootdescend stuff from amcheck. It should be composable.
The interface you've chosen is a good starting point. But let's not miss an
opportunity to make everything work together.
Peter Geoghegan
(Sent from my phone)
On Apr 20, 2020, at 12:37 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I mean an interface that's friendly to DBAs, that verifies an entire database. No custom sql query required. Something that provides a reasonable mix of verification options based on high level directives. All verification methods can be combined in a granular, possibly randomized fashion. Maybe we can make this run in parallel.
For example, maybe your heap checker code sometimes does index probes for a subset of indexes and heap tuples. It's not hard to combine it with the rootdescend stuff from amcheck. It should be composable.
The interface you've chosen is a good starting point. But let's not miss an opportunity to make everything work together.
Ok, I'll work in that direction and repost when I have something along those lines.
Thanks again for your input.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-04-20 10:59:28 -0700, Mark Dilger wrote:
I have been talking with Robert about table corruption that occurs
from time to time. The page checksum feature seems sufficient to
detect most random corruption problems, but it can't detect "logical"
corruption, where the page is valid but inconsistent with the rest of
the database cluster. This can happen due to faulty or ill-conceived
backup and restore tools, or bad storage, or user error, or bugs in
the server itself. (Also, not everyone enables checksums.)
This is something we really really really need. I'm very excited to see
progress!
From 2a1bc0bb9fa94bd929adc1a408900cb925ebcdd5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Apr 2020 08:05:58 -0700
Subject: [PATCH v2] Adding heapcheck contrib module.

The heapcheck module introduces a new function for checking a heap
relation and associated toast relation, if any, for corruption.
Why not add it to amcheck?
I wonder if a mode where heapcheck optionally would only check
non-frozen (perhaps also non-all-visible) regions of a table would be a
good idea? Would make it a lot more viable to run this regularly on
bigger databases. Even if there's a window to not check some data
(because it's frozen before the next heapcheck run).
The attached module provides the means to scan a relation and sanity
check it. Currently, it checks xmin and xmax values against
relfrozenxid and relminmxid, and also validates TOAST pointers. If
people like this, it could be expanded to perform additional checks.
The postgres backend already defends against certain forms of
corruption, by checking the page header of each page before allowing
it into the page cache, and by checking the page checksum, if enabled.
Experience shows that broken or ill-conceived backup and restore
mechanisms can result in a page, or an entire file, being overwritten
with an earlier version of itself, restored from backup. Pages thus
overwritten will appear to have valid page headers and checksums,
while potentially containing xmin, xmax, and toast pointers that are
invalid.
We also had a *lot* of bugs that we'd have found a lot earlier, possibly
even during development, if we had a way to easily perform these checks.
contrib/heapcheck introduces a function, heapcheck_relation, that
takes a regclass argument, scans the given heap relation, and returns
rows containing information about corruption found within the table.
The main focus of the scan is to find invalid xmin, xmax, and toast
pointer values. It also checks for structural corruption within the
page (such as invalid t_hoff values) that could lead to the backend
aborting should the function blindly trust the data as it finds it.
+typedef struct CorruptionInfo
+{
+    BlockNumber blkno;
+    OffsetNumber offnum;
+    int16 lp_off;
+    int16 lp_flags;
+    int16 lp_len;
+    int32 attnum;
+    int32 chunk;
+    char *msg;
+} CorruptionInfo;
Adding a short comment explaining what this is for would be good.
+/* Internal implementation */
+void record_corruption(HeapCheckContext * ctx, char *msg);
+TupleDesc heapcheck_relation_tupdesc(void);
+
+void beginRelBlockIteration(HeapCheckContext * ctx);
+bool relBlockIteration_next(HeapCheckContext * ctx);
+void endRelBlockIteration(HeapCheckContext * ctx);
+
+void beginPageTupleIteration(HeapCheckContext * ctx);
+bool pageTupleIteration_next(HeapCheckContext * ctx);
+void endPageTupleIteration(HeapCheckContext * ctx);
+
+void beginTupleAttributeIteration(HeapCheckContext * ctx);
+bool tupleAttributeIteration_next(HeapCheckContext * ctx);
+void endTupleAttributeIteration(HeapCheckContext * ctx);
+
+void beginToastTupleIteration(HeapCheckContext * ctx,
+                              struct varatt_external *toast_pointer);
+void endToastTupleIteration(HeapCheckContext * ctx);
+bool toastTupleIteration_next(HeapCheckContext * ctx);
+
+bool TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid);
+bool HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+void check_toast_tuple(HeapCheckContext * ctx);
+bool check_tuple_attribute(HeapCheckContext * ctx);
+void check_tuple(HeapCheckContext * ctx);
+
+List *check_relation(Oid relid);
+void check_relation_relkind(Relation rel);
Why aren't these static?
+/*
+ * record_corruption
+ *
+ * Record a message about corruption, including information
+ * about where in the relation the corruption was found.
+ */
+void
+record_corruption(HeapCheckContext * ctx, char *msg)
+{
Given that you went through the trouble of adding prototypes for all of
these, I'd start with the most important functions, not the unimportant
details.
+/*
+ * Helper function to construct the TupleDesc needed by heapcheck_relation.
+ */
+TupleDesc
+heapcheck_relation_tupdesc()
Missing (void) (it's our style, even though you could theoretically not
have it as long as you have a prototype).
+{
+    TupleDesc tupdesc;
+    AttrNumber maxattr = 8;
This 8 is in multiple places, I'd add a define for it.
+    AttrNumber a = 0;
+
+    tupdesc = CreateTemplateTupleDesc(maxattr);
+    TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+    TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+    Assert(a == maxattr);
+
+    return BlessTupleDesc(tupdesc);
+}
+/*
+ * heapcheck_relation
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+heapcheck_relation(PG_FUNCTION_ARGS)
+{
+    FuncCallContext *funcctx;
+    CheckRelCtx *ctx;
+
+    if (SRF_IS_FIRSTCALL())
+    {
I think it'd be good to have a version that just returned a boolean. For
one, in many cases that's all we care about when scripting things. But
also, on a large relation, there could be a lot of errors.
+        Oid relid = PG_GETARG_OID(0);
+        MemoryContext oldcontext;
+
+        /*
+         * Scan the entire relation, building up a list of corruption found in
+         * ctx->corruption, for returning later. The scan must be performed
+         * in a memory context that will survive until after all rows are
+         * returned.
+         */
+        funcctx = SRF_FIRSTCALL_INIT();
+        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+        funcctx->tuple_desc = heapcheck_relation_tupdesc();
+        ctx = (CheckRelCtx *) palloc0(sizeof(CheckRelCtx));
+        ctx->corruption = check_relation(relid);
+        ctx->idx = 0;           /* start the iterator at the beginning */
+        funcctx->user_fctx = (void *) ctx;
+        MemoryContextSwitchTo(oldcontext);
Hm. This builds up all the errors in memory. Is that a good idea? I mean
for a large relation having one returned value for each tuple could be a
heck of a lot of data.
I think it'd be better to use the spilling SRF protocol here. It's not
like you're benefitting from deferring the tuple construction to the
return currently.
+/*
+ * beginRelBlockIteration
+ *
+ * For the given heap relation being checked, as recorded in ctx, sets up
+ * variables for iterating over the heap's pages.
+ *
+ * The caller should have already opened the heap relation, ctx->rel
+ */
+void
+beginRelBlockIteration(HeapCheckContext * ctx)
+{
+    ctx->nblocks = RelationGetNumberOfBlocks(ctx->rel);
+    ctx->blkno = InvalidBlockNumber;
+    ctx->bstrategy = GetAccessStrategy(BAS_BULKREAD);
+    ctx->buffer = InvalidBuffer;
+    ctx->page = NULL;
+}
+
+/*
+ * endRelBlockIteration
+ *
+ * Releases resources that were reserved by either beginRelBlockIteration or
+ * relBlockIteration_next.
+ */
+void
+endRelBlockIteration(HeapCheckContext * ctx)
+{
+    /*
+     * Clean up. If the caller iterated to the end, the final call to
+     * relBlockIteration_next will already have released the buffer, but if
+     * the caller is bailing out early, we have to release it ourselves.
+     */
+    if (InvalidBuffer != ctx->buffer)
+        UnlockReleaseBuffer(ctx->buffer);
+}
These seem mighty granular and generically named to me.
+ * pageTupleIteration_next
+ *
+ * Advances the state tracked in ctx to the next tuple on the page.
+ *
+ * Caller should have already set up the iteration via
+ * beginPageTupleIteration, and should stop calling when this function
+ * returns false.
+ */
+bool
+pageTupleIteration_next(HeapCheckContext * ctx)
I don't think this is a naming scheme we use anywhere in postgres. I
don't think it's a good idea to add yet more of those.
+{
+    /*
+     * Iterate to the next interesting line pointer, if any. Unused, dead and
+     * redirect line pointers are of no interest.
+     */
+    do
+    {
+        ctx->offnum = OffsetNumberNext(ctx->offnum);
+        if (ctx->offnum > ctx->maxoff)
+            return false;
+        ctx->itemid = PageGetItemId(ctx->page, ctx->offnum);
+    } while (!ItemIdIsUsed(ctx->itemid) ||
+             ItemIdIsDead(ctx->itemid) ||
+             ItemIdIsRedirected(ctx->itemid));
This is an odd loop. Part of the test is in the body, part in the
loop header.
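[Editor's note: for illustration only, here is one way the skip logic could be written with the whole condition in a single place. This is a standalone sketch with stand-in flag values, not the real ItemId macros or page structures.]

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the line pointer states tested by ItemIdIsUsed() etc. */
enum lp_state { LP_UNUSED, LP_NORMAL, LP_REDIRECT, LP_DEAD };

/*
 * Return the index of the next "interesting" (LP_NORMAL) entry at or after
 * 'start', or 'nitems' if there is none.  The skip condition lives in one
 * place instead of being split between a do/while body and its header.
 */
static size_t
next_interesting(const enum lp_state *flags, size_t nitems, size_t start)
{
    size_t i;

    for (i = start; i < nitems; i++)
        if (flags[i] == LP_NORMAL)
            break;
    return i;
}
```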
+/*
+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past. Stores the inferred FullTransactionId in *fxid.
+ *
+ * Returns whether the xid is newer than the oldest clog xid.
+ */
+bool
+TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid)
I don't at all like the naming of this function. This isn't a reliable
check. As before, it obviously also shouldn't be static.
+{
+    FullTransactionId fnow;
+    uint32 epoch;
+
+    /* Initialize fxid; we'll overwrite this later if needed */
+    *fxid = FullTransactionIdFromEpochAndXid(0, xid);
+    /* Special xids can quickly be turned into invalid fxids */
+    if (!TransactionIdIsValid(xid))
+        return false;
+    if (!TransactionIdIsNormal(xid))
+        return true;
+
+    /*
+     * Charitably infer the full transaction id as being within one epoch ago
+     */
+    fnow = ReadNextFullTransactionId();
+    epoch = EpochFromFullTransactionId(fnow);
+    *fxid = FullTransactionIdFromEpochAndXid(epoch, xid);
So now you're overwriting the fxid value from above unconditionally?
+    if (!FullTransactionIdPrecedes(*fxid, fnow))
+        *fxid = FullTransactionIdFromEpochAndXid(epoch - 1, xid);
I think it'd be better to do the conversion the following way:
*fxid = FullTransactionIdFromU64(U64FromFullTransactionId(fnow)
        + (int32) (xid - XidFromFullTransactionId(fnow)));
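[Editor's note: the difference between the patch's epoch inference and this delta-based widening can be seen in a standalone sketch. The helper names and the epoch-in-high-32-bits layout are stand-ins for the real FullTransactionId macros, not postgres code. The two approaches agree for xids up to 2^31 in the past; they differ on apparently-newer xids, which the delta form maps into the future (so a later precedes-check can flag them) while the epoch form charitably assumes they are one epoch old.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit xid against a 64-bit "next transaction id" by assuming the
 * current epoch, falling back to the previous epoch if the result would lie
 * in the future.  Mirrors the patch's inference; assumes epoch > 0.
 */
static uint64_t
widen_via_epoch(uint32_t xid, uint64_t next_fxid)
{
    uint32_t epoch = (uint32_t) (next_fxid >> 32);
    uint64_t fxid = (((uint64_t) epoch) << 32) | xid;

    if (fxid >= next_fxid)      /* would be in the future */
        fxid = (((uint64_t) (epoch - 1)) << 32) | xid;
    return fxid;
}

/*
 * Widen via a signed 32-bit delta from the next xid's low word: xids within
 * 2^31 before next land in the past, xids within 2^31 after land in the
 * future, and 32-bit wraparound is handled by the modular subtraction.
 */
static uint64_t
widen_via_delta(uint32_t xid, uint64_t next_fxid)
{
    return next_fxid + (int32_t) (xid - (uint32_t) next_fxid);
}
```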
+    if (!FullTransactionIdPrecedes(*fxid, fnow))
+        return false;
+    /* The oldestClogXid is protected by CLogTruncationLock */
+    Assert(LWLockHeldByMe(CLogTruncationLock));
+    if (TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
+        return false;
+    return true;
+}
Why is this testing oldestClogXid instead of oldestXid?
+/*
+ * HeapTupleIsVisible
+ *
+ * Determine whether tuples are visible for heapcheck. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Gracefully handles xids that are too old by calling
+ * TransactionIdStillValid before TransactionLogFetch, thus avoiding
+ * a backend abort.
I think it'd be better to protect against this by avoiding checks for
xids that are older than relfrozenxid. And ones that are newer than
ReadNextTransactionId(). But all of those cases should be errors
anyway, so it doesn't seem like that should be handled within the
visibility routine.
+ * 3) Only makes a boolean determination of whether heapcheck should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ */
+bool
+HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+    FullTransactionId fxmin,
+                fxmax;
+    uint16 infomask = tuphdr->t_infomask;
+    TransactionId xmin = HeapTupleHeaderGetXmin(tuphdr);
+
+    if (!HeapTupleHeaderXminCommitted(tuphdr))
+    {
Hm. I wonder if it'd be good to crosscheck the xid committed hint bits
with clog?
+    else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuphdr)))
+    {
+        LWLockRelease(CLogTruncationLock);
+        return false;           /* HEAPTUPLE_DEAD */
+    }
Note that this actually can error out, if xmin is a subtransaction xid,
because pg_subtrans is truncated a lot more aggressively than anything
else. I think you'd need to filter against subtransactions older than
RecentXmin before here, and treat that as an error.
+    if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+    {
+        if (infomask & HEAP_XMAX_IS_MULTI)
+        {
+            TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+            /* not LOCKED_ONLY, so it has to have an xmax */
+            if (!TransactionIdIsValid(xmax))
+            {
+                record_corruption(ctx, _("heap tuple with XMAX_IS_MULTI is "
+                                         "neither LOCKED_ONLY nor has a "
+                                         "valid xmax"));
+                return false;
+            }
I think it's bad to have code like this in a routine that's named like a
generic visibility check routine.
+    if (TransactionIdIsInProgress(xmax))
+        return false;           /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+    LWLockAcquire(CLogTruncationLock, LW_SHARED);
+    if (!TransactionIdStillValid(xmax, &fxmax))
+    {
+        LWLockRelease(CLogTruncationLock);
+        record_corruption(ctx, psprintf("tuple xmax = %u (interpreted "
+                                        "as " UINT64_FORMAT
+                                        ") not or no longer valid",
+                                        xmax, fxmax.value));
+        return false;
+    }
+    else if (TransactionIdDidCommit(xmax))
+    {
+        LWLockRelease(CLogTruncationLock);
+        return false;           /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+    }
+    LWLockRelease(CLogTruncationLock);
+    /* Ok, the tuple is live */
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ *
+ * The caller should have iterated to a tuple via pageTupleIteration_next.
+ */
+void
+check_tuple(HeapCheckContext * ctx)
+{
+    bool fatal = false;
Wait, aren't some checks here duplicate with ones in
HeapTupleIsVisible()?
+    /* Check relminmxid against mxid, if any */
+    if (ctx->infomask & HEAP_XMAX_IS_MULTI &&
+        MultiXactIdPrecedes(ctx->xmax, ctx->relminmxid))
+    {
+        record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+                                        "relminmxid = %u",
+                                        ctx->xmax, ctx->relminmxid));
+    }
It's pretty weird that the routines here access xmin/xmax/... via
HeapCheckContext, but HeapTupleIsVisible() doesn't.
+    /* Check xmin against relfrozenxid */
+    if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+        TransactionIdIsNormal(ctx->xmin) &&
+        TransactionIdPrecedes(ctx->xmin, ctx->relfrozenxid))
+    {
+        record_corruption(ctx, psprintf("tuple xmin = %u precedes relation "
+                                        "relfrozenxid = %u",
+                                        ctx->xmin, ctx->relfrozenxid));
+    }
+
+    /* Check xmax against relfrozenxid */
+    if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+        TransactionIdIsNormal(ctx->xmax) &&
+        TransactionIdPrecedes(ctx->xmax, ctx->relfrozenxid))
+    {
+        record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+                                        "relfrozenxid = %u",
+                                        ctx->xmax, ctx->relfrozenxid));
+    }
these all should be fatal. You definitely cannot just continue
afterwards given the justification below:
+    /*
+     * Iterate over the attributes looking for broken toast values. This
+     * roughly follows the logic of heap_deform_tuple, except that it doesn't
+     * bother building up isnull[] and values[] arrays, since nobody wants
+     * them, and it unrolls anything that might trip over an Assert when
+     * processing corrupt data.
+     */
+    beginTupleAttributeIteration(ctx);
+    while (tupleAttributeIteration_next(ctx) &&
+           check_tuple_attribute(ctx))
+        ;
+    endTupleAttributeIteration(ctx);
+}
I really don't find these helpers helpful.
+/*
+ * check_relation
+ *
+ * Checks the relation given by relid for corruption, returning a list of all
+ * it finds.
+ *
+ * The caller should set up the memory context as desired before calling.
+ * The returned list belongs to the caller.
+ */
+List *
+check_relation(Oid relid)
+{
+    HeapCheckContext ctx;
+
+    memset(&ctx, 0, sizeof(HeapCheckContext));
+
+    /* Open the relation */
+    ctx.relid = relid;
+    ctx.corruption = NIL;
+    ctx.rel = relation_open(relid, AccessShareLock);
I think you need to protect at least against concurrent schema changes
given some of your checks. But I think it'd be better to also conflict
with vacuum here.
+ check_relation_relkind(ctx.rel);
I think you also need to ensure that the table is actually using heap
AM, not another tableam. Oh - you're doing that inside the check. But
that's confusing, because that's not 'relkind'.
+    ctx.relDesc = RelationGetDescr(ctx.rel);
+    ctx.rel_natts = RelationGetDescr(ctx.rel)->natts;
+    ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+    ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
three naming schemes in three lines...
+    /* check all blocks of the relation */
+    beginRelBlockIteration(&ctx);
+    while (relBlockIteration_next(&ctx))
+    {
+        /* Perform tuple checks */
+        beginPageTupleIteration(&ctx);
+        while (pageTupleIteration_next(&ctx))
+            check_tuple(&ctx);
+        endPageTupleIteration(&ctx);
+    }
+    endRelBlockIteration(&ctx);
I again do not find this helper stuff helpful.
+    /* Close the associated toast table and indexes, if any. */
+    if (ctx.has_toastrel)
+    {
+        toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+                            AccessShareLock);
+        table_close(ctx.toastrel, AccessShareLock);
+    }
+
+    /* Close the main relation */
+    relation_close(ctx.rel, AccessShareLock);
Why the closing here?
+# This regression test demonstrates that the heapcheck_relation() function
+# supplied with this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot be
+# expected to help us with this, as its design is not consistent with the goal
+# of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that heapcheck_relation
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
I have a hard time believing this is going to be really
reliable. E.g. the alignment requirements will vary between platforms,
leading to different layouts. In particular, MAXALIGN differs between
platforms.
Also, it's supported to compile postgres with a different pagesize.
Greetings,
Andres Freund
[ retrying from the email address I intended to use ]
On Mon, Apr 20, 2020 at 3:42 PM Andres Freund <andres@anarazel.de> wrote:
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
I think it would be *really* good to avoid ShareUpdateExclusiveLock
here. Running with only AccessShareLock would be a big advantage. I
agree that any use of CLogTruncationLock should not be "random", but I
don't see why the same method we use to make txid_status() safe to
expose to SQL shouldn't also be used here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-04-20 15:59:49 -0400, Robert Haas wrote:
On Mon, Apr 20, 2020 at 3:42 PM Andres Freund <andres@anarazel.de> wrote:
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).

I think it would be *really* good to avoid ShareUpdateExclusiveLock
here. Running with only AccessShareLock would be a big advantage. I
agree that any use of CLogTruncationLock should not be "random", but I
don't see why the same method we use to make txid_status() safe to
expose to SQL shouldn't also be used here.
A few billion CLogTruncationLock acquisitions in short order will likely
have at least as big an impact as ShareUpdateExclusiveLock held for the
duration of the check. That's not really a relevant concern for
txid_status(). Per-tuple lock acquisitions aren't great.
I think it might be doable to not need either. E.g. we could set the
checking backend's xmin to relfrozenxid, and set something like
PROC_IN_VACUUM. That should, I think, prevent clog from being truncated
in a problematic way (clog truncations look at PROC_IN_VACUUM backends),
while not blocking vacuum.
The similar concern for ReadNewTransactionId() can probably more easily
be addressed, by only calling ReadNewTransactionId() when encountering
an xid that's newer than the last value read.
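[Editor's note: that caching idea can be sketched as follows. The names are hypothetical; read_next_xid() stands in for ReadNewTransactionId(), and a real version would use the circular TransactionIdPrecedes() comparison rather than a plain '<'.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for ReadNewTransactionId(): in the real backend this
 * is a shared-memory read; here it just counts invocations so the caching
 * behavior is observable.
 */
static uint32_t fake_next_xid = 1000;
static int read_calls = 0;

static uint32_t
read_next_xid(void)
{
    read_calls++;
    return fake_next_xid;
}

/*
 * An xid is accepted if it precedes the cached "next xid"; the cache is
 * refreshed (one read) only when an xid is not covered by it, so runs of
 * old xids cost no reads at all.
 */
static uint32_t cached_next_xid = 0;

static int
xid_precedes_next(uint32_t xid)
{
    if (xid < cached_next_xid)
        return 1;               /* covered by the cache, no refresh */
    cached_next_xid = read_next_xid();
    return xid < cached_next_xid;
}
```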
I think it'd be good to set PROC_IN_VACUUM (or maybe a separate version
of it) while checking anyway. Reading the full relation can take quite a
while, and we shouldn't prevent hot pruning while doing so.
There's some things we'd need to figure out to be able to use
PROC_IN_VACUUM, as that's really only safe in some
circumstances. Possibly it'd be easiest to address that if we'd make the
check a procedure...
Greetings,
Andres Freund
On Mon, Apr 20, 2020 at 12:42 PM Andres Freund <andres@anarazel.de> wrote:
This is something we really really really need. I'm very excited to see
progress!
+1
My experience with amcheck was that the requirement that we document
and verify pretty much every invariant (the details of which differ
slightly based on the B-Tree version in use) has had intangible
benefits. It helped me come up with a simpler, better design in the
first place. Also, many of the benchmarks that I perform get to be a
stress-test of the feature itself. It saves quite a lot of testing
work in the long run.
I wonder if a mode where heapcheck optionally would only check
non-frozen (perhaps also non-all-visible) regions of a table would be a
good idea? Would make it a lot more viable to run this regularly on
bigger databases. Even if there's a window to not check some data
(because it's frozen before the next heapcheck run).
That's a great idea. It could also make it practical to use the
rootdescend verification option to verify indexes selectively -- if
you don't have too many blocks to check on average, the overhead is
tolerable. This is the kind of thing that naturally belongs in the
higher level interface that I sketched already.
We also had a *lot* of bugs that we'd have found a lot earlier, possibly
even during development, if we had a way to easily perform these checks.
I can think of a case where it was quite unclear what the invariants
for the heap even were, at least temporarily. And this was in the
context of fixing a bug that was really quite nasty. Formally defining
the invariants in one place, and taking a position on exactly what
correct looks like seems like a very valuable exercise. Even without
the tool catching a single bug.
I have a hard time believing this is going to be really
reliable. E.g. the alignment requirements will vary between platforms,
leading to different layouts. In particular, MAXALIGN differs between
platforms.
Over on another thread, I suggested that Mark might want to have a
corruption test framework that exposes some of the bufpage.c routines.
The idea is that you can destructively manipulate a page using the
logical page interface. Something that works one level below the
access method, but one level above the raw page image. It probably
wouldn't test everything that Mark wants to test, but it would test
some things in a way that seems maintainable to me.
--
Peter Geoghegan
On Mon, Apr 20, 2020 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
A few billion CLogTruncationLock acquisitions in short order will likely
have at least as big an impact as ShareUpdateExclusiveLock held for the
duration of the check. That's not really a relevant concern for
txid_status(). Per-tuple lock acquisitions aren't great.
Yeah, that's true. Doing it for every tuple is going to be too much, I
think. I was hoping we could avoid that.
I think it might be doable to not need either. E.g. we could set the
checking backend's xmin to relfrozenxid, and set something like
PROC_IN_VACUUM. That should, I think, prevent clog from being truncated
in a problematic way (clog truncations look at PROC_IN_VACUUM backends),
while not blocking vacuum.
Hmm, OK, I don't know if that would be OK or not.
The similar concern for ReadNewTransactionId() can probably more easily
be addressed, by only calling ReadNewTransactionId() when encountering
an xid that's newer than the last value read.
Yeah, if we can cache some things to avoid repetitive calls, that would be good.
I think it'd be good to set PROC_IN_VACUUM (or maybe a separate version
of it) while checking anyway. Reading the full relation can take quite a
while, and we shouldn't prevent hot pruning while doing so.

There's some things we'd need to figure out to be able to use
PROC_IN_VACUUM, as that's really only safe in some
circumstances. Possibly it'd be easiest to address that if we'd make the
check a procedure...
I think we sure want to set things up so that we do this check without
holding a snapshot, if we can. Not sure exactly how to get there.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Apr 20, 2020 at 12:40 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Ok, I'll work in that direction and repost when I have something along those lines.
Great, thanks!
It also occurs to me that the B-Tree checks that amcheck already has
have one remaining blindspot: While the heapallindexed verification
option has the ability to detect the absence of an index tuple that
the dummy CREATE INDEX that we perform under the hood says should be
in the index, it cannot do the opposite: It cannot detect the presence
of a malformed tuple that shouldn't be there at all, unless the index
tuple itself is corrupt. That could miss an inconsistent page image
when a few tuples have been VACUUMed away, but still appear in the
index.
In order to do that, we'd have to have something a bit like the
validate_index() heap scan that CREATE INDEX CONCURRENTLY uses. We'd
have to get a list of heap TIDs that any index tuple might be pointing
to, and then make sure that there were no TIDs in the index that were
not in that list -- tuples that were pointing to nothing in the heap
at all. This could use the index_bulk_delete() interface. This is the
kind of verification option that I might work on for debugging
purposes, but not the kind of thing I could really recommend to
ordinary users outside of exceptional cases. This is the kind of thing
that argues for more or less providing all of the verification
functionality we have through both high level and low level
interfaces. This isn't likely to be all that valuable most of the
time, and users shouldn't have to figure that out for themselves the
hard way. (BTW, I think that this could be implemented in an
index-AM-agnostic way, so perhaps you can consider adding it
too, if you have time.)
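[Editor's note: the containment check described here reduces to a sorted-merge over two TID lists. The types and names below are illustrative only, not the actual index_bulk_delete() callback interface.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Every TID found in the index must appear in the list of heap TIDs any
 * index entry could legitimately point to.  With both lists sorted, a
 * single merge pass finds dangling index TIDs in linear time.
 */
typedef struct { uint32_t block; uint16_t offset; } tid;

static int
tid_cmp(tid a, tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* Returns the number of index TIDs with no matching heap TID. */
static size_t
count_dangling(const tid *index_tids, size_t nindex,
               const tid *heap_tids, size_t nheap)
{
    size_t i = 0, h = 0, dangling = 0;

    for (i = 0; i < nindex; i++)
    {
        while (h < nheap && tid_cmp(heap_tids[h], index_tids[i]) < 0)
            h++;
        if (h >= nheap || tid_cmp(heap_tids[h], index_tids[i]) != 0)
            dangling++;         /* index points at nothing in the heap */
    }
    return dangling;
}
```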
One last thing for now: take a look at amcheck's
bt_tuple_present_callback() function. It has comments about HOT chain
corruption that you may find interesting. Note that this check played
a role in the "freeze the dead" corruption bug [1] -- it detected that
our initial fix for that was broken. It seems like it would be a good
idea to go back through the reproducers we've seen for some of the
more memorable corruption bugs, and actually make sure that your tool
detects them where that isn't clear. History doesn't repeat itself,
but it often rhymes.
[1]: /messages/by-id/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
--
Peter Geoghegan
On Mon, Apr 20, 2020 at 1:40 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 20, 2020 at 4:30 PM Andres Freund <andres@anarazel.de> wrote:
A few billion CLogTruncationLock acquisitions in short order will likely
have at least as big an impact as ShareUpdateExclusiveLock held for the
duration of the check. That's not really a relevant concern for
txid_status(). Per-tuple lock acquisitions aren't great.

Yeah, that's true. Doing it for every tuple is going to be too much, I
think. I was hoping we could avoid that.
What about the visibility map? It would be nice if pg_visibility was
merged into amcheck, since it mostly provides integrity checking for
the visibility map. Maybe we could just merge the functions that
perform verification, and leave other functions (like
pg_truncate_visibility_map()) where they are. We could keep the
current interface for functions like pg_check_visible(), but also
allow the same verification to occur in passing, as part of a higher
level check.
It wouldn't be so bad if pg_visibility was an expert-only tool. But
ISTM that the verification performed by code like
collect_corrupt_items() could easily take place at the same time as
the new checks that Mark proposes. Possibly only some of the time. It
can work in a totally additive way. (Though like Andres I don't really
like the current "helper" functions used to iterate through a heap
relation; they seem like they'd make this harder.)
--
Peter Geoghegan
On Apr 20, 2020, at 12:42 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-04-20 10:59:28 -0700, Mark Dilger wrote:
I have been talking with Robert about table corruption that occurs
from time to time. The page checksum feature seems sufficient to
detect most random corruption problems, but it can't detect "logical"
corruption, where the page is valid but inconsistent with the rest of
the database cluster. This can happen due to faulty or ill-conceived
backup and restore tools, or bad storage, or user error, or bugs in
the server itself. (Also, not everyone enables checksums.)

This is something we really really really need. I'm very excited to see
progress!
Thanks for the review!
From 2a1bc0bb9fa94bd929adc1a408900cb925ebcdd5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Apr 2020 08:05:58 -0700
Subject: [PATCH v2] Adding heapcheck contrib module.

The heapcheck module introduces a new function for checking a heap
relation and associated toast relation, if any, for corruption.

Why not add it to amcheck?
That seems to be the general consensus. The functionality has been moved
there, renamed as "verify_heapam", as that seems more in line with the
"verify_nbtree" name already present in that module. The docs have also
been moved there, although not very gracefully. It seems premature to
polish the documentation given that the interface will likely change at
least one more time, to incorporate more of Peter's suggestions. There
are still design differences between the two implementations that need
to be harmonized. The verify_heapam function returns rows detailing the
corruption found, which is inconsistent with how verify_nbtree does
things.
I wonder if a mode where heapcheck optionally would only check
non-frozen (perhaps also non-all-visible) regions of a table would be a
good idea? Would make it a lot more viable to run this regularly on
bigger databases. Even if there's a window to not check some data
(because it's frozen before the next heapcheck run).
Perhaps we should come back to that. Version 3 of this patch addresses concerns about the v2 patch without adding too many new features.
The attached module provides the means to scan a relation and sanity
check it. Currently, it checks xmin and xmax values against
relfrozenxid and relminmxid, and also validates TOAST pointers. If
people like this, it could be expanded to perform additional checks.The postgres backend already defends against certain forms of
corruption, by checking the page header of each page before allowing
it into the page cache, and by checking the page checksum, if enabled.
Experience shows that broken or ill-conceived backup and restore
mechanisms can result in a page, or an entire file, being overwritten
with an earlier version of itself, restored from backup. Pages thus
overwritten will appear to have valid page headers and checksums,
while potentially containing xmin, xmax, and toast pointers that are
invalid.

We also had a *lot* of bugs that we'd have found a lot earlier, possibly
even during development, if we had a way to easily perform these checks.
I certainly hope this is useful for testing.
contrib/heapcheck introduces a function, heapcheck_relation, that
takes a regclass argument, scans the given heap relation, and returns
rows containing information about corruption found within the table.
The main focus of the scan is to find invalid xmin, xmax, and toast
pointer values. It also checks for structural corruption within the
page (such as invalid t_hoff values) that could lead to the backend
aborting should the function blindly trust the data as it finds it.

+typedef struct CorruptionInfo
+{
+	BlockNumber blkno;
+	OffsetNumber offnum;
+	int16		lp_off;
+	int16		lp_flags;
+	int16		lp_len;
+	int32		attnum;
+	int32		chunk;
+	char	   *msg;
+} CorruptionInfo;

Adding a short comment explaining what this is for would be good.
This struct has been removed.
+/* Internal implementation */
+void		record_corruption(HeapCheckContext * ctx, char *msg);
+TupleDesc	heapcheck_relation_tupdesc(void);
+
+void		beginRelBlockIteration(HeapCheckContext * ctx);
+bool		relBlockIteration_next(HeapCheckContext * ctx);
+void		endRelBlockIteration(HeapCheckContext * ctx);
+
+void		beginPageTupleIteration(HeapCheckContext * ctx);
+bool		pageTupleIteration_next(HeapCheckContext * ctx);
+void		endPageTupleIteration(HeapCheckContext * ctx);
+
+void		beginTupleAttributeIteration(HeapCheckContext * ctx);
+bool		tupleAttributeIteration_next(HeapCheckContext * ctx);
+void		endTupleAttributeIteration(HeapCheckContext * ctx);
+
+void		beginToastTupleIteration(HeapCheckContext * ctx,
+									 struct varatt_external *toast_pointer);
+void		endToastTupleIteration(HeapCheckContext * ctx);
+bool		toastTupleIteration_next(HeapCheckContext * ctx);
+
+bool		TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid);
+bool		HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+void		check_toast_tuple(HeapCheckContext * ctx);
+bool		check_tuple_attribute(HeapCheckContext * ctx);
+void		check_tuple(HeapCheckContext * ctx);
+
+List	   *check_relation(Oid relid);
+void		check_relation_relkind(Relation rel);

Why aren't these static?
They are now, except for the iterator style functions, which are gone.
+/*
+ * record_corruption
+ *
+ * Record a message about corruption, including information
+ * about where in the relation the corruption was found.
+ */
+void
+record_corruption(HeapCheckContext * ctx, char *msg)
+{

Given that you went through the trouble of adding prototypes for all of
these, I'd start with the most important functions, not the unimportant
details.
Yeah, good idea. The most important functions are now at the top.
+/*
+ * Helper function to construct the TupleDesc needed by heapcheck_relation.
+ */
+TupleDesc
+heapcheck_relation_tupdesc()

Missing (void) (it's our style, even though you could theoretically not
have it as long as you have a prototype).
That was unintentional, and is now fixed.
+{
+	TupleDesc	tupdesc;
+	AttrNumber	maxattr = 8;

This 8 is in multiple places, I'd add a define for it.
Done.
+	AttrNumber	a = 0;
+
+	tupdesc = CreateTemplateTupleDesc(maxattr);
+	TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+	TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+	Assert(a == maxattr);
+
+	return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * heapcheck_relation
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+heapcheck_relation(PG_FUNCTION_ARGS)
+{
+	FuncCallContext *funcctx;
+	CheckRelCtx *ctx;
+
+	if (SRF_IS_FIRSTCALL())
+	{

I think it'd be good to have a version that just returned a boolean. For
one, in many cases that's all we care about when scripting things. But
also, on a large relation, there could be a lot of errors.
There is now a second parameter to the function, "stop_on_error". The function performs exactly the same checks, but when stop_on_error is true it returns after the first page that contains corruption.
+		Oid			relid = PG_GETARG_OID(0);
+		MemoryContext oldcontext;
+
+		/*
+		 * Scan the entire relation, building up a list of corruption found in
+		 * ctx->corruption, for returning later.  The scan must be performed
+		 * in a memory context that will survive until after all rows are
+		 * returned.
+		 */
+		funcctx = SRF_FIRSTCALL_INIT();
+		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+		funcctx->tuple_desc = heapcheck_relation_tupdesc();
+		ctx = (CheckRelCtx *) palloc0(sizeof(CheckRelCtx));
+		ctx->corruption = check_relation(relid);
+		ctx->idx = 0;			/* start the iterator at the beginning */
+		funcctx->user_fctx = (void *) ctx;
+		MemoryContextSwitchTo(oldcontext);

Hm. This builds up all the errors in memory. Is that a good idea? I mean
for a large relation having one returned value for each tuple could be a
heck of a lot of data.

I think it'd be better to use the spilling SRF protocol here. It's not
like you're benefitting from deferring the tuple construction to the
return currently.
Done.
+/*
+ * beginRelBlockIteration
+ *
+ * For the given heap relation being checked, as recorded in ctx, sets up
+ * variables for iterating over the heap's pages.
+ *
+ * The caller should have already opened the heap relation, ctx->rel
+ */
+void
+beginRelBlockIteration(HeapCheckContext * ctx)
+{
+	ctx->nblocks = RelationGetNumberOfBlocks(ctx->rel);
+	ctx->blkno = InvalidBlockNumber;
+	ctx->bstrategy = GetAccessStrategy(BAS_BULKREAD);
+	ctx->buffer = InvalidBuffer;
+	ctx->page = NULL;
+}
+
+/*
+ * endRelBlockIteration
+ *
+ * Releases resources that were reserved by either beginRelBlockIteration or
+ * relBlockIteration_next.
+ */
+void
+endRelBlockIteration(HeapCheckContext * ctx)
+{
+	/*
+	 * Clean up.  If the caller iterated to the end, the final call to
+	 * relBlockIteration_next will already have released the buffer, but if
+	 * the caller is bailing out early, we have to release it ourselves.
+	 */
+	if (InvalidBuffer != ctx->buffer)
+		UnlockReleaseBuffer(ctx->buffer);
+}

These seem mighty granular and generically named to me.
Removed.
+ * pageTupleIteration_next
+ *
+ * Advances the state tracked in ctx to the next tuple on the page.
+ *
+ * Caller should have already set up the iteration via
+ * beginPageTupleIteration, and should stop calling when this function
+ * returns false.
+ */
+bool
+pageTupleIteration_next(HeapCheckContext * ctx)

I don't think this is a naming scheme we use anywhere in postgres. I
don't think it's a good idea to add yet more of those.
Removed.
+{
+	/*
+	 * Iterate to the next interesting line pointer, if any.  Unused, dead and
+	 * redirect line pointers are of no interest.
+	 */
+	do
+	{
+		ctx->offnum = OffsetNumberNext(ctx->offnum);
+		if (ctx->offnum > ctx->maxoff)
+			return false;
+		ctx->itemid = PageGetItemId(ctx->page, ctx->offnum);
+	} while (!ItemIdIsUsed(ctx->itemid) ||
+			 ItemIdIsDead(ctx->itemid) ||
+			 ItemIdIsRedirected(ctx->itemid));

This is an odd loop. Part of the test is in the body, part in the
loop header.
Refactored.
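For what it's worth, one way to keep the whole skip test in a single place is a plain for-loop with an early continue. Here is a standalone sketch with hypothetical stand-in types (not the patch's actual ItemId or page macros):

```c
#include <stdbool.h>

/* Hypothetical simplified line pointer; the real ItemId is a bitfield. */
typedef struct ItemId
{
	bool		used;
	bool		dead;
	bool		redirected;
} ItemId;

/*
 * Return the index of the next "interesting" item at or after "start",
 * or -1 if there is none.  Bounds check in the loop header, skip
 * condition in one obvious place.
 */
static int
next_interesting_item(const ItemId *items, int nitems, int start)
{
	for (int i = start; i < nitems; i++)
	{
		if (!items[i].used || items[i].dead || items[i].redirected)
			continue;			/* unused, dead, or redirect: skip */
		return i;
	}
	return -1;					/* ran off the end */
}
```

The caller then advances with `start = found + 1` instead of mutating iterator state split between a loop body and a do/while condition.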
+/*
+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past.  Stores the inferred FullTransactionId in *fxid.
+ *
+ * Returns whether the xid is newer than the oldest clog xid.
+ */
+bool
+TransactionIdStillValid(TransactionId xid, FullTransactionId *fxid)

I don't at all like the naming of this function. This isn't a reliable
check. As before, it obviously also should be static.
Renamed and refactored.
+{
+	FullTransactionId fnow;
+	uint32		epoch;
+
+	/* Initialize fxid; we'll overwrite this later if needed */
+	*fxid = FullTransactionIdFromEpochAndXid(0, xid);
+
+	/* Special xids can quickly be turned into invalid fxids */
+	if (!TransactionIdIsValid(xid))
+		return false;
+	if (!TransactionIdIsNormal(xid))
+		return true;
+
+	/*
+	 * Charitably infer the full transaction id as being within one epoch ago
+	 */
+	fnow = ReadNextFullTransactionId();
+	epoch = EpochFromFullTransactionId(fnow);
+	*fxid = FullTransactionIdFromEpochAndXid(epoch, xid);

So now you're overwriting the fxid value from above unconditionally?
+	if (!FullTransactionIdPrecedes(*fxid, fnow))
+		*fxid = FullTransactionIdFromEpochAndXid(epoch - 1, xid);

I think it'd be better to do the conversion the following way:
*fxid = FullTransactionIdFromU64(U64FromFullTransactionId(fnow)
                                 - (int32) (XidFromFullTransactionId(fnow) - xid));
This has been refactored to the point that these review comments cannot be directly replied to.
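The suggested conversion works because adjusting the full 64-bit counter by the signed 32-bit difference preserves the xid in the low 32 bits while landing in the nearest epoch, including across a wraparound. A standalone sketch with plain integer types (infer_full_xid is a hypothetical stand-in for the FullTransactionId macros, not PostgreSQL code):

```c
#include <stdint.h>

/*
 * Infer the 64-bit "full" xid for a 32-bit xid, assuming it lies in the
 * recent past relative to next_full_xid (whose low 32 bits are the next
 * 32-bit xid to be assigned).
 */
static uint64_t
infer_full_xid(uint64_t next_full_xid, uint32_t xid)
{
	uint32_t	next_xid = (uint32_t) next_full_xid;	/* low 32 bits */
	int32_t		delta = (int32_t) (xid - next_xid);		/* signed wraparound diff */

	/* delta is negative for past xids; sign-extends before the add */
	return next_full_xid + (int64_t) delta;
}
```

For example, with a next full xid of epoch 4, xid 5, an on-disk xid of 0xFFFFFFFD yields a delta of -8 and lands in epoch 3 with its low 32 bits intact.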
+	if (!FullTransactionIdPrecedes(*fxid, fnow))
+		return false;
+	/* The oldestClogXid is protected by CLogTruncationLock */
+	Assert(LWLockHeldByMe(CLogTruncationLock));
+	if (TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid))
+		return false;
+	return true;
+}

Why is this testing oldestClogXid instead of oldestXid?
References to clog have been refactored out of this module.
+/*
+ * HeapTupleIsVisible
+ *
+ * Determine whether tuples are visible for heapcheck.  Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits.  It seems imprudent to write hint bits
+ *    to a table during a corruption check.
+ * 2) Gracefully handles xids that are too old by calling
+ *    TransactionIdStillValid before TransactionLogFetch, thus avoiding
+ *    a backend abort.

I think it'd be better to protect against this by avoiding checks for
xids that are older than relfrozenxid. And ones that are newer than
ReadNextTransactionId(). But all of those cases should be errors
anyway, so it doesn't seem like that should be handled within the
visibility routine.
The new implementation caches a range of expected xids. With the relation locked against concurrent vacuum runs, it can trust that the old end of the range won't move during the course of the scan. The newest end may move, but it only has to check for that when it encounters a newer than expected xid, and it updates the cache with the new maximum.
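A rough sketch of that caching scheme, using plain types and a stubbed-out counter read (illustrative only; the real code would use wraparound-aware comparisons such as TransactionIdPrecedes rather than plain `<`, and would read the shared next-xid counter instead of a stub):

```c
#include <stdbool.h>
#include <stdint.h>

/* Stub for re-reading the shared next-transaction-id counter. */
static uint32_t stub_next_xid = 1000;

static uint32_t
read_next_xid(void)
{
	return stub_next_xid;
}

typedef struct XidRangeCache
{
	uint32_t	oldest;	/* relfrozenxid; cannot move while vacuum is locked out */
	uint32_t	next;	/* cached copy of the next xid; may lag behind */
} XidRangeCache;

/* Returns true if xid is plausible; false means report corruption. */
static bool
xid_in_expected_range(XidRangeCache *cache, uint32_t xid)
{
	if (xid < cache->oldest)
		return false;			/* precedes relfrozenxid: corrupt */
	if (xid >= cache->next)
	{
		/* The counter may have advanced since we cached it; refresh once. */
		cache->next = read_next_xid();
		if (xid >= cache->next)
			return false;		/* still in the future: corrupt */
	}
	return true;
}
```

The appeal of this design is that the common case (an xid inside the cached range) needs no locking at all; only an apparently-too-new xid forces a refresh of the cached maximum.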
+ * 3) Only makes a boolean determination of whether heapcheck should
+ *    see the tuple, rather than doing extra work for vacuum-related
+ *    categorization.
+ */
+bool
+HeapTupleIsVisible(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+	FullTransactionId fxmin,
+				fxmax;
+	uint16		infomask = tuphdr->t_infomask;
+	TransactionId xmin = HeapTupleHeaderGetXmin(tuphdr);
+
+	if (!HeapTupleHeaderXminCommitted(tuphdr))
+	{

Hm. I wonder if it'd be good to crosscheck the xid committed hint bits
with clog?
This is not done in v3, as it no longer checks clog.
+	else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmin(tuphdr)))
+	{
+		LWLockRelease(CLogTruncationLock);
+		return false;			/* HEAPTUPLE_DEAD */
+	}

Note that this actually can error out, if xmin is a subtransaction xid,
because pg_subtrans is truncated a lot more aggressively than anything
else. I think you'd need to filter against subtransactions older than
RecentXmin before here, and treat that as an error.
Calls to TransactionIdDidCommit are now preceded by checks that the xid argument is not too old.
+	if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+	{
+		if (infomask & HEAP_XMAX_IS_MULTI)
+		{
+			TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+			/* not LOCKED_ONLY, so it has to have an xmax */
+			if (!TransactionIdIsValid(xmax))
+			{
+				record_corruption(ctx, _("heap tuple with XMAX_IS_MULTI is "
+										 "neither LOCKED_ONLY nor has a "
+										 "valid xmax"));
+				return false;
+			}

I think it's bad to have code like this in a routine that's named like a
generic visibility check routine.
Renamed.
+			if (TransactionIdIsInProgress(xmax))
+				return false;	/* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+			LWLockAcquire(CLogTruncationLock, LW_SHARED);
+			if (!TransactionIdStillValid(xmax, &fxmax))
+			{
+				LWLockRelease(CLogTruncationLock);
+				record_corruption(ctx, psprintf("tuple xmax = %u (interpreted "
+												"as " UINT64_FORMAT
+												") not or no longer valid",
+												xmax, fxmax.value));
+				return false;
+			}
+			else if (TransactionIdDidCommit(xmax))
+			{
+				LWLockRelease(CLogTruncationLock);
+				return false;	/* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+			}
+			LWLockRelease(CLogTruncationLock);
+			/* Ok, the tuple is live */

I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
Done.
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption.  Records any
+ * corruption found in ctx->corruption.
+ *
+ * The caller should have iterated to a tuple via pageTupleIteration_next.
+ */
+void
+check_tuple(HeapCheckContext * ctx)
+{
+	bool		fatal = false;

Wait, aren't some checks here duplicate with ones in
HeapTupleIsVisible()?
Yeah, there was some overlap. That should be better now.
+	/* Check relminmxid against mxid, if any */
+	if (ctx->infomask & HEAP_XMAX_IS_MULTI &&
+		MultiXactIdPrecedes(ctx->xmax, ctx->relminmxid))
+	{
+		record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+										"relminmxid = %u",
+										ctx->xmax, ctx->relminmxid));
+	}

It's pretty weird that the routines here access xmin/xmax/... via
HeapCheckContext, but HeapTupleIsVisible() doesn't.
Fair point. HeapCheckContext no longer has fields for xmin/xmax after the refactoring.
+	/* Check xmin against relfrozenxid */
+	if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+		TransactionIdIsNormal(ctx->xmin) &&
+		TransactionIdPrecedes(ctx->xmin, ctx->relfrozenxid))
+	{
+		record_corruption(ctx, psprintf("tuple xmin = %u precedes relation "
+										"relfrozenxid = %u",
+										ctx->xmin, ctx->relfrozenxid));
+	}
+
+	/* Check xmax against relfrozenxid */
+	if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+		TransactionIdIsNormal(ctx->xmax) &&
+		TransactionIdPrecedes(ctx->xmax, ctx->relfrozenxid))
+	{
+		record_corruption(ctx, psprintf("tuple xmax = %u precedes relation "
+										"relfrozenxid = %u",
+										ctx->xmax, ctx->relfrozenxid));
+	}

These all should be fatal. You definitely cannot just continue
afterwards given the justification below:
They are now fatal.
+	/*
+	 * Iterate over the attributes looking for broken toast values.  This
+	 * roughly follows the logic of heap_deform_tuple, except that it doesn't
+	 * bother building up isnull[] and values[] arrays, since nobody wants
+	 * them, and it unrolls anything that might trip over an Assert when
+	 * processing corrupt data.
+	 */
+	beginTupleAttributeIteration(ctx);
+	while (tupleAttributeIteration_next(ctx) &&
+		   check_tuple_attribute(ctx))
+		;
+	endTupleAttributeIteration(ctx);
+}

I really don't find these helpers helpful.
Removed.
+/*
+ * check_relation
+ *
+ * Checks the relation given by relid for corruption, returning a list of all
+ * it finds.
+ *
+ * The caller should set up the memory context as desired before calling.
+ * The returned list belongs to the caller.
+ */
+List *
+check_relation(Oid relid)
+{
+	HeapCheckContext ctx;
+
+	memset(&ctx, 0, sizeof(HeapCheckContext));
+
+	/* Open the relation */
+	ctx.relid = relid;
+	ctx.corruption = NIL;
+	ctx.rel = relation_open(relid, AccessShareLock);

I think you need to protect at least against concurrent schema changes
given some of your checks. But I think it'd be better to also conflict
with vacuum here.
The relation is now opened with ShareUpdateExclusiveLock.
+ check_relation_relkind(ctx.rel);
I think you also need to ensure that the table is actually using heap
AM, not another tableam. Oh - you're doing that inside the check. But
that's confusing, because that's not 'relkind'.
It is checking both relkind and relam. The function has been renamed to reflect that.
+	ctx.relDesc = RelationGetDescr(ctx.rel);
+	ctx.rel_natts = RelationGetDescr(ctx.rel)->natts;
+	ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+	ctx.relminmxid = ctx.rel->rd_rel->relminmxid;

Three naming schemes in three lines...
Fixed.
+	/* check all blocks of the relation */
+	beginRelBlockIteration(&ctx);
+	while (relBlockIteration_next(&ctx))
+	{
+		/* Perform tuple checks */
+		beginPageTupleIteration(&ctx);
+		while (pageTupleIteration_next(&ctx))
+			check_tuple(&ctx);
+		endPageTupleIteration(&ctx);
+	}
+	endRelBlockIteration(&ctx);

I again do not find this helper stuff helpful.
Removed.
+	/* Close the associated toast table and indexes, if any. */
+	if (ctx.has_toastrel)
+	{
+		toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+							AccessShareLock);
+		table_close(ctx.toastrel, AccessShareLock);
+	}
+
+	/* Close the main relation */
+	relation_close(ctx.rel, AccessShareLock);

Why the closing here?
As opposed to where...? It seems fairly standard to close the relation in the function where it was opened. Do you prefer that the relation not be closed? Or that it be closed but the lock retained?
+# This regression test demonstrates that the heapcheck_relation() function
+# supplied with this contrib module correctly identifies specific kinds of
+# corruption within pages.  To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption.  The postgres backend cannot be
+# expected to help us with this, as its design is not consistent with the goal
+# of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways.  We then verify that heapcheck_relation
+# reports the corruption, and that it runs without crashing.  Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about.  We turn it off to keep things
+# simpler.  We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.

I have a hard time believing this is going to be really
reliable. E.g. the alignment requirements will vary between platforms,
leading to different layouts. In particular, MAXALIGN differs between
platforms.

Also, it's supported to compile postgres with a different pagesize.
It's simple enough to extend the tap test a little to check for those things. In v3, the tap test skips tests if the page size is not 8k, and also if the tuples do not fall on the page where expected (which would happen due to alignment issues, gremlins, or whatever). There are other approaches, though. The HeapFile/HeapPage/HeapTuple perl modules recently submitted on another thread *could* be used here, but only if those modules are likely to be committed. This test *could* be extended to autodetect the page size and alignment issues and calculate at runtime where tuples will be on the page, but only if folks don't mind the test having that extra complexity in it. (There is a school of thought that regression tests should avoid excess complexity.) Do you have a recommendation about which way to go with this?
Here is the work thus far:
Attachments:
v3-0001-Adding-verify_heapam-to-amcheck-contrib-module.patchapplication/octet-stream; name=v3-0001-Adding-verify_heapam-to-amcheck-contrib-module.patch; x-unix-mode=0644Download
From bcbabe645ce807ce115cf394f6badc7eeff45a82 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Apr 2020 08:05:58 -0700
Subject: [PATCH v3] Adding verify_heapam to amcheck contrib module.
Adding a new function for checking a heap relation and associated
toast relation, if any, for corruption.
The postgres backend already defends against certain forms of
corruption, by checking the page header of each page before allowing
it into the page cache, and by checking the page checksum, if
enabled. Experience shows that broken or ill-conceived backup and
restore mechanisms can result in a page, or an entire file, being
overwritten with an earlier version of itself, restored from backup.
Pages thus overwritten will appear to have valid page headers and
checksums, while potentially containing xmin, xmax, and toast
pointers that are invalid.
contrib/amcheck now has a function, verify_heapam, that takes a
regclass argument, scans the given heap relation, and returns rows
containing information about corruption found within the table. The
main focus of the scan is to find invalid xmin, xmax, and toast
pointer values. It also checks for structural corruption within the
page (such as invalid t_hoff values) that could lead to the backend
aborting should the function blindly trust the data as it finds it.
A second boolean argument, stop_on_error, can be used to return
after the first corrupt page is detected.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 28 +
contrib/amcheck/amcheck.control | 2 +-
.../amcheck/expected/disallowed_reltypes.out | 27 +
contrib/amcheck/sql/disallowed_reltypes.sql | 29 +
contrib/amcheck/t/verify_heapam.pl | 387 +++++++
contrib/amcheck/verify_heapam.c | 966 ++++++++++++++++++
contrib/heapcheck/.gitignore | 4 +
contrib/heapcheck/Makefile | 25 +
.../expected/001_create_extension.out | 1 +
contrib/heapcheck/heapcheck--1.0.sql | 21 +
contrib/heapcheck/heapcheck.control | 5 +
.../heapcheck/sql/001_create_extension.sql | 1 +
doc/src/sgml/amcheck.sgml | 102 ++
14 files changed, 1602 insertions(+), 3 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/heapcheck/.gitignore
create mode 100644 contrib/heapcheck/Makefile
create mode 100644 contrib/heapcheck/expected/001_create_extension.out
create mode 100644 contrib/heapcheck/heapcheck--1.0.sql
create mode 100644 contrib/heapcheck/heapcheck.control
create mode 100644 contrib/heapcheck/sql/001_create_extension.sql
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..410f0a76ad 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..f685ccd868
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,28 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(regclass,
+ boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C STRICT;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..1829320a2f
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,27 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned', false);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index', false);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view', false);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence', false);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table', false);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..c923e54b6f
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,29 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned', false);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index', false);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view', false);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence', false);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table', false);
+
+
diff --git a/contrib/amcheck/t/verify_heapam.pl b/contrib/amcheck/t/verify_heapam.pl
new file mode 100644
index 0000000000..65be5963ec
--- /dev/null
+++ b/contrib/amcheck/t/verify_heapam.pl
@@ -0,0 +1,387 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More;
+
+# This regression test demonstrates that the verify_heapam() function
+# supplied with this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot be
+# expected to help us with this, as its design is not consistent with the goal
+# of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# This formatting produces heap pages where each tuple is 58 bytes long, padded
+# out to 64 bytes for alignment, with the first one on the page starting at
+# offset 8128, as follows:
+#
+# [ lp_off: 8128 lp_len: 58]
+# [ lp_off: 8064 lp_len: 58]
+# [ lp_off: 8000 lp_len: 58]
+# [ lp_off: 7936 lp_len: 58]
+# [ lp_off: 7872 lp_len: 58]
+# [ lp_off: 7808 lp_len: 58]
+# ...
+#
+
+use constant LP_OFF_BEGIN => 8128;
+use constant LP_OFF_DELTA => 64;
+
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my ($result, $node);
+
+# Set up the node and test table.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ ));
+
+$result = $node->safe_psql('postgres', q(SHOW block_size));
+if ($result != 8192)
+{
+	# Release resources before skipping; plan skip_all exits immediately
+	# and would leave the cleanup unreached if it came first.
+	$node->teardown_node;
+	$node->clean_node;
+	plan skip_all => 'Only default 8192 byte block size supported by this test';
+}
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+# If this regression test is run on a system with different alignment rules,
+# our offsets into the page may be wrong.  Rather than automatically
+# configuring for different alignment sizes, we just skip the test if the
+# alignments aren't what we expect.
+#
+my $file;
+open($file, '+<', $relpath) or die "could not open $relpath: $!";
+binmode $file;
+
+for (my $offset = LP_OFF_BEGIN, my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++, $offset -= LP_OFF_DELTA)
+{
+ my $tup = read_tuple($file, $offset);
+
+ # Verify the data appears to be where we would expect on the page. If alignment
+ # issues have caused data to be placed elsewhere, we should be able to tell.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+		# Release resources before skipping; plan skip_all exits immediately.
+		$node->clean_node;
+		plan skip_all => 'Page layout differs from our expectations';
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+plan tests => 1;
+
+$result = $node->safe_psql('postgres', q(SELECT * FROM verify_heapam('test', false)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected verify_heapam output");
+
+$node->teardown_node;
+$node->clean_node;
+
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..5d547f2ff9
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,966 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information over the lifetime of a
+ * verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Public API */
+Datum verify_heapam(PG_FUNCTION_ARGS);
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+
+ Oid relid = PG_GETARG_OID(0);
+ bool stop_on_error = PG_GETARG_BOOL(1);
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relfrozenxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relminmxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ for (ctx.blkno = 0; ctx.blkno < ctx.nblocks; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+		/* The buffer we just read into should always be valid */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+			 ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (stop_on_error && ctx.is_corrupt)
+ break;
+ }
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Convenience routine to check that the relation is of a supported relkind
+ * and uses a supported table access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+ int16 lp_off = ItemIdGetOffset(ctx->itemid);
+ int16 lp_flags = ItemIdGetFlags(ctx->itemid);
+ int16 lp_len = ItemIdGetLength(ctx->itemid);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum < 0);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+	 * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using work_mem worth of memory building up the
+	 * tuplestore.  Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedes(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * Given a TransactionId, determine whether it is plausible for this
+ * relation: neither in the future (at or beyond the next xid to be
+ * assigned) nor older than the oldest valid xid we may encounter.
+ *
+ * Returns true if the xid appears valid, false otherwise.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+	/* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "neither LOCKED_ONLY nor has a "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+		confess(ctx,
+				pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ *
+ * The caller is expected to have set ctx->attnum to the attribute under
+ * consideration and ctx->offset to its starting offset within the tuple
+ * data.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+		uint8		va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
diff --git a/contrib/heapcheck/.gitignore b/contrib/heapcheck/.gitignore
new file mode 100644
index 0000000000..5dcb3ff972
--- /dev/null
+++ b/contrib/heapcheck/.gitignore
@@ -0,0 +1,4 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
diff --git a/contrib/heapcheck/Makefile b/contrib/heapcheck/Makefile
new file mode 100644
index 0000000000..8d780a41ab
--- /dev/null
+++ b/contrib/heapcheck/Makefile
@@ -0,0 +1,25 @@
+# contrib/heapcheck/Makefile
+
+MODULE_big = heapcheck
+OBJS = \
+ $(WIN32RES) \
+ heapcheck.o
+
+EXTENSION = heapcheck
+DATA = heapcheck--1.0.sql
+PGFILEDESC = "heapcheck - page corruption information"
+
+REGRESS = 001_create_extension 002_disallowed_reltypes
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/heapcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/heapcheck/expected/001_create_extension.out b/contrib/heapcheck/expected/001_create_extension.out
new file mode 100644
index 0000000000..0ca79c22be
--- /dev/null
+++ b/contrib/heapcheck/expected/001_create_extension.out
@@ -0,0 +1 @@
+create extension heapcheck;
diff --git a/contrib/heapcheck/heapcheck--1.0.sql b/contrib/heapcheck/heapcheck--1.0.sql
new file mode 100644
index 0000000000..48251e6781
--- /dev/null
+++ b/contrib/heapcheck/heapcheck--1.0.sql
@@ -0,0 +1,21 @@
+/* contrib/heapcheck/heapcheck--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION heapcheck" to load this file. \quit
+
+-- Scan a heap relation, and its associated toast relation if any, for corruption.
+CREATE FUNCTION heapcheck_relation(regclass,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'heapcheck_relation'
+LANGUAGE C STRICT;
+REVOKE ALL ON FUNCTION heapcheck_relation(regclass) FROM PUBLIC;
+GRANT EXECUTE ON FUNCTION heapcheck_relation(regclass) TO pg_stat_scan_tables;
diff --git a/contrib/heapcheck/heapcheck.control b/contrib/heapcheck/heapcheck.control
new file mode 100644
index 0000000000..23b076169e
--- /dev/null
+++ b/contrib/heapcheck/heapcheck.control
@@ -0,0 +1,5 @@
+# heapcheck extension
+comment = 'examine relations for corruption'
+default_version = '1.0'
+module_pathname = '$libdir/heapcheck'
+relocatable = true
diff --git a/contrib/heapcheck/sql/001_create_extension.sql b/contrib/heapcheck/sql/001_create_extension.sql
new file mode 100644
index 0000000000..0ca79c22be
--- /dev/null
+++ b/contrib/heapcheck/sql/001_create_extension.sql
@@ -0,0 +1 @@
+create extension heapcheck;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..6bf3110bb3 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -165,6 +165,108 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ stop_on_error boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns setof record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ For each corruption detected, returns one row containing the
+ following fields.  If stop_on_error is true, the scan stops after
+ the first block on which corruption is found.
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the corruption
+ is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
--
2.21.1 (Apple Git-122.3)
I wonder if a mode where heapcheck optionally would only check
non-frozen (perhaps also non-all-visible) regions of a table would be a
good idea?
Version 4 of this patch now includes boolean options skip_all_frozen and skip_all_visible.
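For illustration, a couple of invocations using the new options might look like the following. This is just a sketch against the v4 signature (rel, stop_on_error, skip_all_frozen, skip_all_visible); 'mytable' is a placeholder relation name:

```sql
-- Check the whole table, reporting every corruption found.
SELECT * FROM verify_heapam('mytable', false, false, false);

-- Skip blocks marked all-frozen in the visibility map, making
-- regular re-checks after VACUUM FREEZE much cheaper.
SELECT * FROM verify_heapam('mytable', false, true, false);
```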
Would make it a lot more viable to run this regularly on
bigger databases. Even if there's a window to not check some data
(because it's frozen before the next heapcheck run).
Do you think it would make sense for the amcheck contrib module to have, in addition to the SQL-callable functions, a bgworker-based mode that periodically checks your database? The work along those lines is not included in v4, but if it were part of v5, would you have specific design preferences?
Attachments:
v4-0001-Adding-verify_heapam-to-amcheck-contrib-module.patchapplication/octet-stream; name=v4-0001-Adding-verify_heapam-to-amcheck-contrib-module.patch; x-unix-mode=0644Download
From 3b53c426d85167a9b412e07c6cb77851bbfeeea5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 28 Apr 2020 15:18:48 -0700
Subject: [PATCH v4] Adding verify_heapam to amcheck contrib module.
Adding a new function for checking a heap relation and associated
toast relation, if any, for corruption.
The postgres backend already defends against certain forms of
corruption, by checking the page header of each page before allowing
it into the page cache, and by checking the page checksum, if
enabled. Experience shows that broken or ill-conceived backup and
restore mechanisms can result in a page, or an entire file, being
overwritten with an earlier version of itself, restored from backup.
Pages thus overwritten will appear to have valid page headers and
checksums, while potentially containing xmin, xmax, and toast
pointers that are invalid.
contrib/amcheck now has a function, verify_heapam, that takes a
regclass argument, scans the given heap relation, and returns rows
containing information about corruption found within the table. The
main focus of the scan is to find invalid xmin, xmax, and toast
pointer values. It also checks for structural corruption within the
page (such as invalid t_hoff values) that could lead to the backend
aborting should the function blindly trust the data as it finds it.
A second boolean argument, stop_on_error, can be used to return
after the first corrupt page is detected.
---
contrib/amcheck/Makefile | 8 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 31 +
contrib/amcheck/amcheck.c | 20 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 6 +
.../amcheck/expected/disallowed_reltypes.out | 27 +
contrib/amcheck/sql/disallowed_reltypes.sql | 29 +
contrib/amcheck/t/skipping.pl | 100 ++
contrib/amcheck/t/verify_heapam.pl | 389 +++++++
contrib/amcheck/verify_heapam.c | 982 ++++++++++++++++++
contrib/amcheck/verify_nbtree.c | 6 +-
doc/src/sgml/amcheck.sgml | 106 +-
12 files changed, 1697 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.c
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/t/verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..5e816394d3 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,17 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ amcheck.o \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..aa3ae32441
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,31 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ stop_on_error boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C STRICT;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, boolean, boolean)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
new file mode 100644
index 0000000000..e9e669a136
--- /dev/null
+++ b/contrib/amcheck/amcheck.c
@@ -0,0 +1,20 @@
+/*-------------------------------------------------------------------------
+ *
+ * amcheck.c
+ * Verifies the integrity of objects within a database.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/amcheck/amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "fmgr.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_heapam);
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..ba0102eb3e
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,6 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..65064afccf
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,27 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned', false, false, false);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index', false, false, false);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view', false, false, false);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence', false, false, false);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table', false, false, false);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..a2411b0641
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,29 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000)) partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned', false, false, false);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index', false, false, false);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view', false, false, false);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence', false, false, false);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table', false, false, false);
+
+
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e333ab3d71
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,100 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 43;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ my @checks = (
+ "SELECT verify_heapam('test', false, false, false)",
+ "SELECT verify_heapam('test', false, false, true )",
+ "SELECT verify_heapam('test', false, true, false)",
+ "SELECT verify_heapam('test', false, true, true )",
+ "SELECT verify_heapam('test', true, false, false)",
+ "SELECT verify_heapam('test', true, false, true )",
+ "SELECT verify_heapam('test', true, true, false)",
+ "SELECT verify_heapam('test', true, true, true )",
+ );
+ for my $check (@checks)
+ {
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+}
+
+# Stops the server and writes nulls in the first page of the table,
+# assuming page size is large enough for offset 1000..1016 to be
+# in the midst of the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 1000, 0);
+ syswrite($fh, "\0" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all frozen or all visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', false, false, false)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', false, false, true)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', false, true, false)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/t/verify_heapam.pl b/contrib/amcheck/t/verify_heapam.pl
new file mode 100644
index 0000000000..ba50c1b35c
--- /dev/null
+++ b/contrib/amcheck/t/verify_heapam.pl
@@ -0,0 +1,389 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More;
+
+# This regression test demonstrates that the verify_heapam() function
+# supplied with this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot be
+# expected to help us with this, as its design is not consistent with the goal
+# of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# This formatting produces heap pages where each tuple is 58 bytes long, padded
+# out to 64 bytes for alignment, with the first one on the page starting at
+# offset 8128, as follows:
+#
+# [ lp_off: 8128 lp_len: 58]
+# [ lp_off: 8064 lp_len: 58]
+# [ lp_off: 8000 lp_len: 58]
+# [ lp_off: 7936 lp_len: 58]
+# [ lp_off: 7872 lp_len: 58]
+# [ lp_off: 7808 lp_len: 58]
+# ...
+#
+
+use constant LP_OFF_BEGIN => 8128;
+use constant LP_OFF_DELTA => 64;
+
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+my ($result, $node);
+
+# Set up the node and test table.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ ));
+
+$result = $node->safe_psql('postgres', q(SHOW block_size));
+if ($result != 8192)
+{
+ plan skip_all => 'Only default 8192 byte block size supported by this test';
+ $node->teardown_node;
+ $node->clean_node;
+ exit;
+}
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+# If we (this regression test) are being run on a system with different
+# alignment, our offsets into the page may be wrong.  Rather than
+# automatically adapting to different alignment sizes, we just skip the
+# test if the alignments aren't what we expect.
+#
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $offset = LP_OFF_BEGIN, my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++, $offset -= LP_OFF_DELTA)
+{
+ my $tup = read_tuple($file, $offset);
+
+ # Verify the data appears to be where we would expect on the page. If alignment
+ # issues have caused data to be placed elsewhere, we should be able to tell.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+	{
+		# Clean up the node first, as plan skip_all exits immediately
+		$node->clean_node;
+		plan skip_all => 'Page layout differs from our expectations';
+	}
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+		# Corruptly set xmax < relfrozenxid
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+plan tests => 1;
+
+$result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', false, false, false)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected verify_heapam output");
+
+$node->teardown_node;
+$node->clean_node;
+
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..a240a9a015
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,982 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext *ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+
+ Oid relid = PG_GETARG_OID(0);
+ bool stop_on_error = PG_GETARG_BOOL(1);
+ bool skip_all_frozen = PG_GETARG_BOOL(2);
+ bool skip_all_visible = PG_GETARG_BOOL(3);
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relfrozenxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relminmxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ for (ctx.blkno = 0; ctx.blkno < ctx.nblocks; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+		/* The buffer we just read and locked must be valid */
+		Assert(ctx.buffer != InvalidBuffer);
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+			 ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (stop_on_error && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * convenience routine to check that relation is of a supported relkind
+ * and access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+						RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	int16		lp_off = -1;
+	int16		lp_flags = -1;
+	int16		lp_len = -1;
+
+	/* itemid is not set if we fail before examining the first tuple */
+	if (ctx->itemid != NULL)
+	{
+		lp_off = ItemIdGetOffset(ctx->itemid);
+		lp_flags = ItemIdGetFlags(ctx->itemid);
+		lp_len = ItemIdGetLength(ctx->itemid);
+	}
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+	nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+	/*
+	 * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using up to work_mem worth of memory building up
+	 * the tuplestore, but we can at least avoid leaking the msg argument.
+	 */
+	pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext *ctx)
+{
+ return (TransactionIdPrecedes(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * Given a TransactionId, determine whether it falls within the range of
+ * valid xids for the relation: neither in the future nor so far in the
+ * past that it predates the oldest known valid xid.
+ *
+ * Returns true if the xid appears valid, false otherwise.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext *ctx)
+{
+	/* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+				confess(ctx,
+						pstrdup("heap tuple with XMAX_IS_MULTI is "
+								"not LOCKED_ONLY and lacks a valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+				pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ *
+ * The caller is expected to have advanced ctx->attnum and ctx->offset to
+ * the attribute being checked.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+		uint8		va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption.  Records any
+ * corruption found via confess().
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 857759bdcb..377016f25b 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -38,10 +38,9 @@
#include "storage/smgr.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
+#include "amcheck.h"
-PG_MODULE_MAGIC;
-
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -133,9 +132,6 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
bool heapallindexed, bool rootdescend);
static inline void btree_index_checkable(Relation rel);
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..dc12f2ab5a 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ stop_on_error boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected, containing the fields
+ described below. If stop_on_error is true, only corruptions found on
+ the first corrupt block are reported.
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
--
2.21.1 (Apple Git-122.3)
On Wed, Apr 29, 2020 at 12:30 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Do you think it would make sense to have the amcheck contrib module have, in addition to the SQL-callable functions, a bgworker-based mode that periodically checks your database? The work along those lines is not included in v4, but if it were part of v5, would you have specific design preferences?
-1 on that idea from me. That sounds like it's basically building
"cron" into PostgreSQL, but in a way that can only be used by amcheck.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 22, 2020 at 10:43 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
It's simple enough to extend the TAP test a little to check for those things. In v3, the TAP test skips tests if the page size is not 8k, and also if the tuples do not fall on the page where expected (which would happen due to alignment issues, gremlins, or whatever).
Skipping the test if the tuple isn't in the expected location sounds
really bad. That will just lead to the tests passing without actually
doing anything. If the tuple isn't in the expected location, the tests
should fail.
There are other approaches, though. The HeapFile/HeapPage/HeapTuple Perl modules recently submitted on another thread *could* be used here, but only if those modules are likely to be committed.
Yeah, I don't know if we want that stuff or not.
This test *could* be extended to autodetect the page size and alignment issues and calculate at runtime where tuples will be on the page, but only if folks don't mind the test having that extra complexity in it. (There is a school of thought that regression tests should avoid excess complexity.) Do you have a recommendation about which way to go with this?
How much extra complexity are we talking about? It feels to me like
for a heap page, the only things that are going to affect the position
of the tuples on the page -- supposing we know the tuple size -- are
the page size and, I think, MAXALIGN, and that doesn't sound too bad.
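As a sanity check on that arithmetic, here is a rough sketch (in Python, not server code) of where heap tuples land on a fresh page, assuming an 8k page, MAXALIGN of 8, and no special space; the constants and helper names are mine, not PostgreSQL's:

```python
# Rough sketch of where heap tuples land on a fresh page: tuples are placed
# from the end of the page downward, each rounded up to MAXALIGN. Assumes an
# 8k page, 8-byte MAXALIGN, and no special space.

PAGE_SIZE = 8192
MAX_ALIGN = 8

def maxalign(length):
    """Round length up to the next multiple of MAX_ALIGN."""
    return (length + MAX_ALIGN - 1) & ~(MAX_ALIGN - 1)

def expected_lp_offs(tuple_lens):
    """Compute the expected lp_off for each tuple inserted in order."""
    offs = []
    upper = PAGE_SIZE
    for tlen in tuple_lens:
        upper -= maxalign(tlen)
        offs.append(upper)
    return offs

# Two 31-byte tuples: each consumes 32 aligned bytes from the top of the page.
print(expected_lp_offs([31, 31]))  # [8160, 8128]
```

With 31-byte tuples each slot consumes 32 aligned bytes, so for example the sixteenth such tuple lands at 8192 - 16*32 = 7680; a test could predict lp_off values this way given only the page size and MAXALIGN.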
Another possibility is to use pageinspect's heap_page_items() to
determine the position within the page (lp_off), which seems like it
might simplify things considerably. Then, we're entirely relying on
the backend to tell us where the tuples are, and we only need to worry
about the offsets relative to the start of the tuple.
I kind of like that approach, because it doesn't involve having Perl
code that knows how heap pages are laid out; we rely entirely on the C
code for that. I'm not sure if it'd be a problem to have a TAP test
for one contrib module that uses another contrib module, but maybe
there's some way to figure that problem out.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 29, 2020 at 12:30 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Version 4 of this patch now includes boolean options skip_all_frozen and skip_all_visible.
I'm not sure, but maybe there should just be one argument with
three possible values, because skip_all_frozen = true and
skip_all_visible = false seems nonsensical. On the other hand, if we
used a text argument with three possible values, I'm not sure what
we'd call the argument or what strings we'd use as the values.
Also, what do people -- either those who have already responded, or
others -- think about the idea of putting a command-line tool around
this? I know that there were some rumblings about this in respect to
pg_verifybackup, but I think a pg_amcheck binary would be
well-received. It could do some interesting things, too. For instance,
it could query pg_class for a list of relations that amcheck would
know how to check, and then issue a separate query for each relation,
which would avoid holding a snapshot or heavyweight locks across the
whole operation. It could do parallelism across relations by opening
multiple connections, or even within a single relation if -- as I
think would be a good idea -- we extended heapcheck to take a range of
block numbers after the style of pg_prewarm.
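The cross-relation parallelism could look something like the following sketch, where check_relation is a stand-in for opening a connection and running the check query; nothing here is from the patch, and all names are hypothetical:

```python
# Hypothetical sketch of client-driven parallelism across relations: each
# worker owns one task at a time, and in a real client would also own one
# database connection. check_relation() is a placeholder for issuing
# something like "SELECT * FROM verify_heapam(...)" over that connection.

from concurrent.futures import ThreadPoolExecutor

def check_relation(relname):
    # Placeholder: a real client would run the check query here and
    # collect any corruption rows it returns.
    return (relname, [])

def check_all(relations, workers=4):
    # Fan the relations out across a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_relation, relations))

print(check_all(["pg_class", "pg_attribute", "heaptest"]))
```

Checking each relation over its own connection, as described above, also avoids holding a snapshot or heavyweight locks for the duration of the whole run.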
Apart from allowing for client-driven parallelism, accepting block
number ranges would have the advantage -- IMHO pretty significant --
of making it far easier to use this on a relation where some blocks
are entirely unreadable. You could specify ranges to check out the
remaining blocks.
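Accepting block ranges makes that workaround mechanical; here is a sketch of the arithmetic for skipping known-bad blocks (the function name and the inclusive startblock/endblock convention are my assumptions, not the patch's):

```python
# Sketch of the range-based workaround: given a relation's block count and a
# set of blocks known to be unreadable, compute the (startblock, endblock)
# ranges (inclusive) to pass to a range-aware checker so the bad blocks are
# skipped.

def readable_ranges(nblocks, bad_blocks):
    ranges = []
    start = None
    for blk in range(nblocks):
        if blk in bad_blocks:
            if start is not None:
                ranges.append((start, blk - 1))
                start = None
        elif start is None:
            start = blk
    if start is not None:
        ranges.append((start, nblocks - 1))
    return ranges

# A 10-block relation with blocks 3 and 7 unreadable:
print(readable_ranges(10, {3, 7}))  # [(0, 2), (4, 6), (8, 9)]
```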
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Apr 29, 2020, at 11:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Apr 22, 2020 at 10:43 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
It's simple enough to extend the TAP test a little to check for those things. In v3, the TAP test skips tests if the page size is not 8k, and also if the tuples do not fall on the page where expected (which would happen due to alignment issues, gremlins, or whatever).
Skipping the test if the tuple isn't in the expected location sounds
really bad. That will just lead to the tests passing without actually
doing anything. If the tuple isn't in the expected location, the tests
should fail.
There are other approaches, though. The HeapFile/HeapPage/HeapTuple Perl modules recently submitted on another thread *could* be used here, but only if those modules are likely to be committed.
Yeah, I don't know if we want that stuff or not.
This test *could* be extended to autodetect the page size and alignment issues and calculate at runtime where tuples will be on the page, but only if folks don't mind the test having that extra complexity in it. (There is a school of thought that regression tests should avoid excess complexity.) Do you have a recommendation about which way to go with this?
How much extra complexity are we talking about?
The page size is easy to query, and the test already does so, skipping if the answer isn't 8k. The test could easily enough recalculate offsets based on the page size rather than skipping, but the MAXALIGN stuff is a little harder. I don't know (perhaps someone would share?) how to easily query that from within a Perl test. So the test could guess all possible alignments that occur in the real world, read from the page at the offset each alignment would create, and check whether the expected datum is there. The test would have to be careful to avoid false positives, by placing data before and after the datum being checked with bit patterns that cannot be misinterpreted as a match. That level of complexity seems unappealing, at least to me. It's not hard to write, but maintaining stuff like that is an unwelcome burden.
It feels to me like
for a heap page, the only things that are going to affect the position
of the tuples on the page -- supposing we know the tuple size -- are
the page size and, I think, MAXALIGN, and that doesn't sound too bad.
Another possibility is to use pageinspect's heap_page_items() to
determine the position within the page (lp_off), which seems like it
might simplify things considerably. Then, we're entirely relying on
the backend to tell us where the tuples are, and we only need to worry
about the offsets relative to the start of the tuple.
I kind of like that approach, because it doesn't involve having Perl
code that knows how heap pages are laid out; we rely entirely on the C
code for that. I'm not sure if it'd be a problem to have a TAP test
for one contrib module that uses another contrib module, but maybe
there's some way to figure that problem out.
Yeah, I'll give this a try.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Here is v5 of the patch. Major changes in this version include:
1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas are patterned on pg_dump's, on the assumption that the interface is already familiar to users.
2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function. The new mode is used by a new function verify_btreeam(), analogous to verify_heapam(), both of which are used by the pg_amcheck command line tool.
3) The regression test which generates corruption within a table uses the pageinspect module to determine the location of each tuple on disk for corrupting. This was suggested upthread.
Testing on the command line shows that the pre-existing btree checking code could use some hardening, as it currently crashes the backend on certain corruptions. When I corrupt relation files for tables and indexes in the backend and then use pg_amcheck to check all objects in the database, I keep getting assertion failures from the btree checking code. I think I need to harden this code, but wanted to post an updated patch and solicit opinions before doing so. Here are some example problems I'm seeing. Note that the stack trace when calling from the command line tool includes the new verify_btreeam function, but you can get the same crashes using the old interface via psql:
From psql, first error:
test=# select bt_index_parent_check('corrupted_idx', true, true);
TRAP: FailedAssertion("_bt_check_natts(rel, key->heapkeyspace, page, offnum)", File: "nbtsearch.c", Line: 663)
0 postgres 0x0000000106872977 ExceptionalCondition + 103
1 postgres 0x00000001063a33e2 _bt_compare + 1090
2 amcheck.so 0x0000000106d62921 bt_target_page_check + 6033
3 amcheck.so 0x0000000106d5fd2f bt_index_check_internal + 2847
4 amcheck.so 0x0000000106d60433 bt_index_parent_check + 67
5 postgres 0x00000001064d6762 ExecInterpExpr + 1634
6 postgres 0x000000010650d071 ExecResult + 321
7 postgres 0x00000001064ddc3d standard_ExecutorRun + 301
8 postgres 0x00000001066600c5 PortalRunSelect + 389
9 postgres 0x000000010665fc7f PortalRun + 527
10 postgres 0x000000010665ed59 exec_simple_query + 1641
11 postgres 0x000000010665c99d PostgresMain + 3661
12 postgres 0x00000001065d6a8a BackendRun + 410
13 postgres 0x00000001065d61c4 ServerLoop + 3044
14 postgres 0x00000001065d2fe9 PostmasterMain + 3769
15 postgres 0x000000010652e3b0 help + 0
16 libdyld.dylib 0x00007fff6725fcc9 start + 1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset:
2020-05-11 10:11:47.394 PDT [41091] LOG: server process (PID 41309) was terminated by signal 6: Abort trap: 6
From commandline, second error:
pgtest % pg_amcheck -i test
(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 3289393 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 12593 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
<snip>
(relname=corrupted,blkno=107,offnum=20,lp_off=7392,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 306 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 305 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff > lp_len (54 > 34)
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff not max-aligned (54)
TRAP: FailedAssertion("TransactionIdIsValid(xmax)", File: "heapam_visibility.c", Line: 1319)
0 postgres 0x0000000105b22977 ExceptionalCondition + 103
1 postgres 0x0000000105636e86 HeapTupleSatisfiesVacuum + 1158
2 postgres 0x0000000105634aa1 heapam_index_build_range_scan + 1089
3 amcheck.so 0x00000001060100f3 bt_index_check_internal + 3811
4 amcheck.so 0x000000010601057c verify_btreeam + 316
5 postgres 0x0000000105796266 ExecMakeTableFunctionResult + 422
6 postgres 0x00000001057a8c35 FunctionNext + 101
7 postgres 0x00000001057bbf3e ExecNestLoop + 478
8 postgres 0x000000010578dc3d standard_ExecutorRun + 301
9 postgres 0x00000001059100c5 PortalRunSelect + 389
10 postgres 0x000000010590fc7f PortalRun + 527
11 postgres 0x000000010590ed59 exec_simple_query + 1641
12 postgres 0x000000010590c99d PostgresMain + 3661
13 postgres 0x0000000105886a8a BackendRun + 410
14 postgres 0x00000001058861c4 ServerLoop + 3044
15 postgres 0x0000000105882fe9 PostmasterMain + 3769
16 postgres 0x00000001057de3b0 help + 0
17 libdyld.dylib 0x00007fff6725fcc9 start + 1
pg_amcheck: error: query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Attachments:
v5-0001-Adding-verify_heapam-and-pg_amcheck.patch
From 4689b57d58de56af9968952430a8f61213fdb56c Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 11 May 2020 06:55:04 -0700
Subject: [PATCH v5] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 50 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1024 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 773 +++++++------
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 884 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 57 +
contrib/pg_amcheck/t/003_check.pl | 87 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 408 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
25 files changed, 3574 insertions(+), 344 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..bf6171a353
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,50 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..8b2a1033b5
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server and writes nulls in the first page of the table,
+# assuming page size is large enough for offset 1000..1016 to be
+# in the midst of the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath)
+ or die "could not open $relpath: $!";
+ binmode $fh;
+ sysseek($fh, 1000, 0);
+ syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all frozen or all visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..1bddff7fc6
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1024 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan a heap relation, and report corruption in its pages or in its
+ * associated toast relation, if any.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
 errhint("Valid skip values are 'all visible', 'all frozen', "
 "and NULL.")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+ "oldest valid xid %u ",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relminmxid %u precedes global "
+ "oldest valid xid %u ",
+ ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* The current page is now pinned and share-locked */
+ Assert(ctx.buffer != InvalidBuffer);
+
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * convenience routine to check that relation is of a supported relkind.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+ int16 lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+ int16 lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+ int16 lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given TransactionId is plausible for this
+ * relation: neither in the future (at or beyond the next xid to be
+ * assigned) nor older than the oldest valid xid we can encounter in
+ * the relation.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "not LOCKED_ONLY but has no "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ *
+ * The caller should have set ctx->attnum and ctx->offset to identify the
+ * attribute to be checked.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Set up a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index a4c4c09850..b1699118fe 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ (ctx == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2878,23 +2932,20 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
*/
maxoffset = PageGetMaxOffsetNumber(page);
if (maxoffset > MaxIndexTuplesPerPage)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
- blocknum, RelationGetRelationName(state->rel),
- MaxIndexTuplesPerPage)));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
+ blocknum, RelationGetRelationName(state->rel),
+ MaxIndexTuplesPerPage);
if (!P_ISLEAF(opaque) && maxoffset < P_FIRSTDATAKEY(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal block %u in index \"%s\" lacks high key and/or at least one downlink",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal block %u in index \"%s\" lacks high key and/or at least one downlink",
+ blocknum, RelationGetRelationName(state->rel));
if (P_ISLEAF(opaque) && !P_RIGHTMOST(opaque) && maxoffset < P_HIKEY)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("non-rightmost leaf block %u in index \"%s\" lacks high key item",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "non-rightmost leaf block %u in index \"%s\" lacks high key item",
+ blocknum, RelationGetRelationName(state->rel));
/*
* In general, internal pages are never marked half-dead, except on
@@ -2906,17 +2957,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2967,14 +3016,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2983,14 +3031,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3012,26 +3059,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3062,3 +3106,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_btreeam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the context's tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..3e47b717f1
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,884 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns must
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ values[4] = (connOpts.dbname == NULL) ? "postgres" : connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+		ClosePager(output);
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+		{"help", optional_argument, NULL, 1},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+	while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+	 * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out the command line usage text
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+	 * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+			  const SimpleOidList *oids)
+{
+	if (!oids->head)
+		return false;
+
+	/*
+	 * Inclusion filters ("=") use ANY semantics, but exclusion filters
+	 * ("!=") must use ALL: "oid != ANY(array)" is true for nearly every
+	 * oid once the array holds more than one element.
+	 */
+	appendPQExpBuffer(query, "\nAND %s %s %s(array[\n", lval, operator,
+					  strstr(operator, "!=") != NULL ? "ALL" : "ANY");
+	append_csv_oids(query, oids);
+	appendPQExpBuffer(query, "\n])");
+	return true;
+}
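An aside on the quantified-comparison semantics the filters above rely on, sketched in Python's `any()`/`all()` rather than SQL, purely as an illustration: an exclusion test needs ALL-style quantification, because `oid != ANY(array)` is true whenever the oid differs from at least one element, which is nearly always once the array has more than one entry.

```python
# Illustration of SQL's ANY/ALL quantified comparisons via Python's
# any()/all().  An oid that IS in the exclude list still passes
# "oid != ANY", because it differs from the *other* element; only
# "oid != ALL" rejects it.
exclude_oids = [16384, 16385]  # hypothetical table OIDs

oid = 16384  # present in the exclude list
assert any(oid != v for v in exclude_oids)      # "!= ANY" fails to exclude it
assert not all(oid != v for v in exclude_oids)  # "!= ALL" excludes it

oid = 99999  # absent from the exclude list
assert all(oid != v for v in exclude_oids)      # "!= ALL" keeps it
```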
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+		fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..da98ae5b5e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,57 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..eeee090c08
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,87 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..23a45581f0
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,408 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 36;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the table's columns, and to
+# insert its rows, such that tuples have predictable sizes and locations
+# within the page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
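As a cross-check of the layout arithmetic above (an illustration only, not part of the test), the same 58-byte tuple can be described with Python's struct module, whose `=` prefix selects standard sizes with no alignment padding. The format string below is an assumed translation of the perl pack codes: perl `L` -> struct `L`, `S` -> `H`, `C` -> `B`, `c` -> `b`, `q` -> `q`.

```python
import struct

# Assumed struct-module equivalent of HEAPTUPLE_PACK_CODE above:
# 3 x L (t_xmin, t_xmax, t_field3), 5 x H (ctid fields and infomasks),
# 2 x B (t_hoff, t_bits), q (column 'a'), B + 7 x b (column 'b'),
# 9 x H (the toast pointer bytes for column 'c').
HEAPTUPLE_STRUCT_FMT = '=LLLHHHHHBBqB7b9H'

# Matches HEAPTUPLE_PACK_LENGTH: 12 + 10 + 2 + 8 + 8 + 18 = 58 bytes.
assert struct.calcsize(HEAPTUPLE_STRUCT_FMT) == 58
```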
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
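To connect these masks to the natts values reported in the expected output further down (a sketch, not part of the test): natts lives in the low 11 bits of t_infomask2, so ORing in the whole of HEAP_NATTS_MASK makes a 3-column tuple claim 2047 attributes, while ORing in only the 0x40 bit yields 67.

```python
HEAP_NATTS_MASK = 0x07FF   # low 11 bits of t_infomask2 store natts

t_infomask2 = 3            # the test table really has 3 attributes

# tupidx 7: t_infomask2 |= HEAP_NATTS_MASK gives "natts (3 < 2047)"
assert (t_infomask2 | HEAP_NATTS_MASK) & HEAP_NATTS_MASK == 2047

# tupidx 9: t_infomask2 |= (HEAP_NATTS_MASK & 0x40) gives "natts (3 < 67)"
assert (t_infomask2 | (HEAP_NATTS_MASK & 0x40)) & HEAP_NATTS_MASK == 67
```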
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
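The "circularity of xid comparison" noted in the corruption loop above can be sketched as follows: xids compare modulo 2^32, in the style of TransactionIdPrecedes() (this Python sketch is an illustration, not code from the patch, and ignores the special-cased permanent xids). That is why the numerically huge xmin 4026531839 (0xEFFFFFFF) is reported as *preceding* a small relfrozenxid.

```python
def xid_precedes(a, b):
    """Modulo-2^32 xid comparison: a precedes b when the signed 32-bit
    difference (a - b) is negative, as in TransactionIdPrecedes()."""
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:      # interpret as a signed 32-bit value
        diff -= 1 << 32
    return diff < 0

# 4026531839 == 0xEFFFFFFF: numerically huge, but it wraps around and
# precedes a small, recent relfrozenxid -- hence the "xmin = 4026531839
# precedes relation relfrozenxid" messages in the expected output.
assert xid_precedes(4026531839, 100)
assert not xid_precedes(100, 4026531839)
```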
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
+ qr/t_hoff \+ offset > lp_le/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
+
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+      verify_heapam(rel regclass,
+                    on_error_stop boolean,
+                    skip text,
+                    startblock bigint,
+                    endblock bigint,
+                    blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+     Checks a heap relation for <quote>logical</quote> corruption, where the
+     page is valid but inconsistent with the rest of the database cluster.
+     This can happen due to faulty or ill-conceived backup and restore tools,
+     bad storage, user error, or bugs in the server itself.  It checks xmin
+     and xmax values against relfrozenxid and relminmxid, and also validates
+     TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected, containing the fields
+ listed below. If on_error_stop is true, only corruptions found on the
+ first corrupt block are reported.
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oid2name;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 68179f71cd..9e58765121 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..f379af2258
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "corrupted",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
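For readers working from the sample output in the patch above, the location lines have a regular shape and can be parsed mechanically. Here is a hedged C sketch (the struct and helper names are mine, not part of the patch; the format string follows the `(relname=...,blkno=...,...)` lines shown in the documentation, where attnum and chunk may be empty):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fields of one pg_amcheck location line, e.g.
 * (relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
 * attnum and chunk may be empty when the corruption is not specific to a
 * column or toast chunk, so they are kept as strings here.
 */
typedef struct CorruptionLoc
{
    char relname[64];
    unsigned long blkno;
    unsigned offnum;
    unsigned lp_off;
    unsigned lp_flags;
    unsigned lp_len;
    char attnum[16];
    char chunk[16];
} CorruptionLoc;

/* Returns 1 on success, 0 if the line does not match the expected shape. */
int
parse_location_line(const char *line, CorruptionLoc *loc)
{
    int n;

    memset(loc, 0, sizeof(*loc));
    n = sscanf(line,
               "(relname=%63[^,],blkno=%lu,offnum=%u,lp_off=%u,"
               "lp_flags=%u,lp_len=%u,attnum=%15[^,],chunk=%15[^)])",
               loc->relname, &loc->blkno, &loc->offnum, &loc->lp_off,
               &loc->lp_flags, &loc->lp_len, loc->attnum, loc->chunk);

    /*
     * An empty attnum or chunk makes the trailing scansets fail to match,
     * so accept six or more converted fields; the memset above leaves the
     * unparsed string fields empty in that case.
     */
    return n >= 6;
}
```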
On Mon, May 11, 2020 at 10:21 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.
Somebody suggested that I make amcheck work in this way during its
initial development. I rejected that idea at the time, though. It
seems hard to make it work because the B-Tree index scan is a logical
order index scan. It's quite possible that a corrupt index will have
circular sibling links, and things like that. Making everything an
error removes that concern. There are clearly some failures that we
could just soldier on from, but the distinction gets rather blurred.
I understand why you want to do it this way. It makes sense that the
heap stuff would report all inconsistencies together, at the end. I
don't think that that's really workable (or even desirable) in the
case of B-Tree indexes, though. When an index is corrupt, the solution
is always to do root cause analysis, to make sure that the issue does
not recur, and then to REINDEX. There isn't really a question about
doing data recovery of the index structure.
Would it be possible to log the first B-Tree inconsistency, and then
move on to the next high-level phase of verification? You don't have
to throw an error, but it seems like a good idea for amcheck to still
give up on further verification of the index.
The assertion failure that you reported happens because of a generic
assertion made from _bt_compare(). It doesn't have anything to do with
amcheck (you'll see the same thing from regular index scans), really.
I think that removing that assertion would be the opposite of
hardening. Even if you removed it, the backend will still crash once
you come up with a slightly more evil index tuple. Maybe *that* could
be mostly avoided with widespread hardening; we could in principle
perform cross-checks of varlena headers against the tuple or page
layout at any point reachable from _bt_compare(). That seems like
something that would have unacceptable overhead, because the cost
would be imposed on everything. And even then you've only ameliorated
the problem.
Code like amcheck's PageGetItemIdCareful() goes further than the
equivalent backend macro (PageGetItemId()) to avoid assertion failures
and crashes with corrupt data. I doubt that it is practical to take it
much further than that, though. It's subject to diminishing returns.
In general, _bt_compare() calls user-defined code that is usually
written in C. This C code could in principle feel entitled to do any
number of scary things when you corrupt the input data. The amcheck
module's dependency on user-defined operator code is totally
unavoidable -- it is the single source of truth for the nbtree checks.
It boils down to this: I think that regression tests that run on the
buildfarm and actually corrupt data are not practical, at least in the
case of the index checks -- though probably in all cases. Look at the
pageinspect "btree.out" test output file -- it's very limited, because
we have to work around a bunch of implementation details. It's no
accident that the bt_page_items() test shows a palindrome value in the
data column (the value is "01 00 00 00 00 00 00 01"). That's an
endianness workaround.
--
Peter Geoghegan
On May 12, 2020, at 5:34 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, May 11, 2020 at 10:21 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function.
Thank you yet again for reviewing. I really appreciate the feedback!
Somebody suggested that I make amcheck work in this way during its
initial development. I rejected that idea at the time, though. It
seems hard to make it work because the B-Tree index scan is a logical
order index scan. It's quite possible that a corrupt index will have
circular sibling links, and things like that. Making everything an
error removes that concern. There are clearly some failures that we
could just soldier on from, but the distinction gets rather blurred.
Ok, I take your point that the code cannot soldier on after the first error is returned. I'll change that for v6 of the patch, moving on to the next relation after hitting the first corruption in any particular index. Do you mind that I refactored the code to return the error rather than ereporting? If it offends your sensibilities, I could rip that back out, at the expense of having to use try/catch logic in some other places. I prefer to avoid the try/catch stuff, but I'm not going to put up a huge fuss.
I understand why you want to do it this way. It makes sense that the
heap stuff would report all inconsistencies together, at the end. I
don't think that that's really workable (or even desirable) in the
case of B-Tree indexes, though. When an index is corrupt, the solution
is always to do root cause analysis, to make sure that the issue does
not recur, and then to REINDEX. There isn't really a question about
doing data recovery of the index structure.
Yes, I agree that reindexing is the most sensible remedy. I certainly have no plans to implement some pg_fsck_index type tool. Even for tables, I'm not interested in creating such a tool. I just want a good tool for finding out what the nature of the corruption is, as that might make it easier to debug what went wrong. It's not just for debugging production systems, but also for chasing down problems in half-baked code prior to release.
Would it be possible to log the first B-Tree inconsistency, and then
move on to the next high-level phase of verification? You don't have
to throw an error, but it seems like a good idea for amcheck to still
give up on further verification of the index.
Ok, good, it sounds like we're converging on the same idea. I'm happy to do so.
The assertion failure that you reported happens because of a generic
assertion made from _bt_compare(). It doesn't have anything to do with
amcheck (you'll see the same thing from regular index scans), really.
Oh, I know that already. I could see that easily enough in the backtrace. But if you look at the way I implemented verify_heapam, you might notice this:
/*
* check_tuphdr_xids
*
* Determine whether tuples are visible for verification. Similar to
* HeapTupleSatisfiesVacuum, but with critical differences.
*
* 1) Does not touch hint bits. It seems imprudent to write hint bits
* to a table during a corruption check.
* 2) Only makes a boolean determination of whether verification should
* see the tuple, rather than doing extra work for vacuum-related
* categorization.
*
* The caller should already have checked that xmin and xmax are not out of
* bounds for the relation.
*/
The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or whatever). I was talking about refactoring the btree checking code to be similarly careful.
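The xmin-versus-relfrozenxid comparison behind messages like "tuple xmin = 305 precedes relation relfrozenxid = 487" reduces to a wraparound-aware xid comparison. A simplified C sketch (names are mine; the backend's TransactionIdPrecedes additionally special-cases the permanent xids, which is glossed over here):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Wraparound-aware comparison, in the spirit of the backend's
 * TransactionIdPrecedes(): with 32-bit xids the space is circular, so a
 * plain '<' gives wrong answers once the counter wraps; the signed
 * difference is correct for xids less than 2^31 apart.
 */
int
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

/*
 * The "tuple xmin precedes relation relfrozenxid" check amounts to: a
 * normal xmin older than relfrozenxid should already have been frozen, so
 * seeing one indicates corruption. Frozen/invalid xids are exempt.
 */
int
xmin_is_corrupt(TransactionId xmin, TransactionId relfrozenxid)
{
    return xmin >= FirstNormalTransactionId &&
           xid_precedes(xmin, relfrozenxid);
}
```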
I think that removing that assertion would be the opposite of
hardening. Even if you removed it, the backend will still crash once
you come up with a slightly more evil index tuple. Maybe *that* could
be mostly avoided with widespread hardening; we could in principle
perform cross-checks of varlena headers against the tuple or page
layout at any point reachable from _bt_compare(). That seems like
something that would have unacceptable overhead, because the cost
would be imposed on everything. And even then you've only ameliorated
the problem.
I think we may have different mental models of how this all works in practice. I am (or was) envisioning that the backend, during regular table and index scans, cannot afford to check for corruption at all steps along the way, and therefore does not, but that a corruption checking tool has a fundamentally different purpose, and can and should choose to operate in a way that won't blow up when checking a corrupt relation. It's the difference between a car designed to drive down the highway at high speed vs. a military vehicle designed to drive over a minefield with a guy on the front bumper scanning for landmines, the whole while going half a mile an hour.
I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed, detecting landmines on occasion by seeing them first, but frequently by failing to see them and just blowing up.
Code like amcheck's PageGetItemIdCareful() goes further than the
equivalent backend macro (PageGetItemId()) to avoid assertion failures
and crashes with corrupt data. I doubt that it is practical to take it
much further than that, though. It's subject to diminishing returns.
Ok.
In general, _bt_compare() calls user-defined code that is usually
written in C. This C code could in principle feel entitled to do any
number of scary things when you corrupt the input data. The amcheck
module's dependency on user-defined operator code is totally
unavoidable -- it is the single source of truth for the nbtree checks.
I don't really understand this argument, since users with buggy user defined operators are not the target audience, but I also don't think there is any point in arguing it, since I'm already resolved to take your advice about not hardening the btree stuff any further.
It boils down to this: I think that regression tests that run on the
buildfarm and actually corrupt data are not practical, at least in the
case of the index checks -- though probably in all cases. Look at the
pageinspect "btree.out" test output file -- it's very limited, because
we have to work around a bunch of implementation details. It's no
accident that the bt_page_items() test shows a palindrome value in the
data column (the value is "01 00 00 00 00 00 00 01"). That's an
endianness workaround.
One of the delays in submitting the most recent version of the patch is that I was having trouble creating a reliable, portable btree corrupting regression test. Ultimately, I submitted v5 without any btree corrupting regression test, as it proved pretty difficult to write one good enough for submission, and I had already put a couple more days into developing v5 than I had intended. So I can't argue too much with your point here.
I did however address (some?) issues that you and others mentioned about the table corrupting regression test. Perhaps there are remaining issues that will show up on machines with different endianness than I have thus far tested, but I don't see that they will be insurmountable. Are you fundamentally opposed to that test framework? If you're going to vote against committing the patch with that test, I'll back down and just remove it from the patch, but it doesn't seem like a bad regression test to me.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, May 12, 2020 at 7:07 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Thank you yet again for reviewing. I really appreciate the feedback!
Happy to help. It's important work.
Ok, I take your point that the code cannot soldier on after the first error is returned. I'll change that for v6 of the patch, moving on to the next relation after hitting the first corruption in any particular index. Do you mind that I refactored the code to return the error rather than ereporting?
try/catch seems like the way to do it. Not all amcheck errors come
from amcheck -- some are things that the backend code does, that are
known to appear in amcheck from time to time. I'm thinking in
particular of the
table_index_build_scan()/heapam_index_build_range_scan() errors, as
well as the errors from _bt_checkpage().
Yes, I agree that reindexing is the most sensible remedy. I certainly have no plans to implement some pg_fsck_index type tool. Even for tables, I'm not interested in creating such a tool. I just want a good tool for finding out what the nature of the corruption is, as that might make it easier to debug what went wrong. It's not just for debugging production systems, but also for chasing down problems in half-baked code prior to release.
All good goals.
* check_tuphdr_xids
The point is that when checking the table for corruption I avoid calling anything that might assert (or segfault, or whatever).
I don't think that you can expect to avoid assertion failures in
general. I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control. I'm sure that you could get an assertion failure somewhere in
there, and even if you couldn't that could change at any time.
You've quasi-duplicated some sensitive code to do that much, which
seems excessive. But it's also not enough.
I'm starting to infer from your comments that you see the landmine detection vehicle as also driving at high speed, detecting landmines on occasion by seeing them first, but frequently by failing to see them and just blowing up.
That's not it. I would certainly prefer if the landmine detector
didn't blow up. Not having that happen is certainly a goal I share --
that's why PageGetItemIdCareful() exists. But not at any cost,
especially not when "blow up" means an assertion failure that users
won't actually see in production. Avoiding assertion failures like the
one you showed is likely to have a high cost (removing defensive
asserts in low level access method code) for a low benefit. Any
attempt to avoid having the checker itself blow up rather than throw
an error message needs to be assessed pragmatically, on a case-by-case
basis.
One of the delays in submitting the most recent version of the patch is that I was having trouble creating a reliable, portable btree corrupting regression test.
To be clear, I think that corrupting data is very helpful with ad-hoc
testing during development.
I did however address (some?) issues that you and others mentioned about the table corrupting regression test. Perhaps there are remaining issues that will show up on machines with different endianness than I have thus far tested, but I don't see that they will be insurmountable. Are you fundamentally opposed to that test framework?
I haven't thought about it enough just yet, but I am certainly suspicious of it.
--
Peter Geoghegan
On Tue, May 12, 2020 at 11:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
try/catch seems like the way to do it. Not all amcheck errors come
from amcheck -- some are things that the backend code does, that are
known to appear in amcheck from time to time. I'm thinking in
particular of the
table_index_build_scan()/heapam_index_build_range_scan() errors, as
well as the errors from _bt_checkpage().
That would require the use of a subtransaction.
You've quasi-duplicated some sensitive code to do that much, which
seems excessive. But it's also not enough.
I think this is a good summary of the problems in this area. On the
one hand, I think it's hideous that we sanity check user input to
death, but blindly trust the bytes on disk to the point of seg
faulting if they're wrong. The idea that int4 + int4 has to have
overflow checking because otherwise a user might be sad when they get
a negative result from adding two negative numbers, while at the same
time supposing that the same user will be unwilling to accept the
performance hit to avoid crashing if they have a bad tuple, is quite
suspect in my mind. The overflow checking is also expensive, but we do
it because it's the right thing to do, and then we try to minimize the
overhead. It is unclear to me why we shouldn't also take that approach
with bytes that come from disk. In particular, using Assert() checks
for such things instead of elog() is basically Assert(there is no such
thing as a corrupted database).
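The int4 overflow checking mentioned here is cheap in practice because it leans on a compiler builtin. A sketch roughly along the lines of the backend's pg_add_s32_overflow (the wrapper name is mine; the real macro lives in src/include/common/int.h and has portability fallbacks omitted here):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Overflow-checked 32-bit addition: the GCC/Clang builtin reports overflow
 * instead of silently wrapping, so the caller can raise an error rather
 * than return a garbage value. Returns nonzero on overflow.
 */
int
add_s32_overflow(int32_t a, int32_t b, int32_t *result)
{
    return __builtin_add_overflow(a, b, result);
}
```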
On the other hand, that problem is clearly way above this patch's pay
grade. There's a lot of stuff all over the code base that would have
to be changed to fix it. It can't be done as an incidental thing as
part of this patch or any other. It's a massive effort unto itself. We
need to somehow draw a clean line between what this patch does and
what it does not do, such that the scope of this patch remains
something achievable. Otherwise, we'll end up with nothing.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 13, 2020 at 12:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
I think this is a good summary of the problems in this area. On the
one hand, I think it's hideous that we sanity check user input to
death, but blindly trust the bytes on disk to the point of seg
faulting if they're wrong. The idea that int4 + int4 has to have
overflow checking because otherwise a user might be sad when they get
a negative result from adding two negative numbers, while at the same
time supposing that the same user will be unwilling to accept the
performance hit to avoid crashing if they have a bad tuple, is quite
suspect in my mind. The overflow checking is also expensive, but we do
it because it's the right thing to do, and then we try to minimize the
overhead. It is unclear to me why we shouldn't also take that approach
with bytes that come from disk. In particular, using Assert() checks
for such things instead of elog() is basically Assert(there is no such
thing as a corrupted database).
I think that it depends. It's nice to be able to add an Assert()
without really having to worry about the overhead at all. I sometimes
call relatively expensive functions in assertions. For example, there
is an assert that calls _bt_compare() within _bt_check_unique() that I
added at one point -- it caught a real bug a few weeks later. You
could always be doing more.
In general we don't exactly trust the bytes blindly. I've found that
corrupting tuples in a creative way with pg_hexedit doesn't usually
result in a segfault. Sometimes we'll do things like display NULL
values when heap line pointers are corrupt, which isn't as good as an
error but is still okay. We ought to protect against Murphy, not
Machiavelli. ISTM that access method code naturally evolves towards
avoiding the most disruptive errors in the event of real world
corruption, in particular avoiding segfaulting. It's very hard to
prove that, though.
Do you recall seeing corruption resulting in segfaults in production?
I personally don't recall seeing that. If it happened, the segfaults
themselves probably wouldn't be the main concern.
On the other hand, that problem is clearly way above this patch's pay
grade. There's a lot of stuff all over the code base that would have
to be changed to fix it. It can't be done as an incidental thing as
part of this patch or any other. It's a massive effort unto itself. We
need to somehow draw a clean line between what this patch does and
what it does not do, such that the scope of this patch remains
something achievable. Otherwise, we'll end up with nothing.
I can easily come up with an adversarial input that will segfault a
backend, even amcheck, but it'll be somewhat contrived. It's hard to
fool amcheck currently because it doesn't exactly trust line pointers.
But I'm sure I could get the backend to segfault amcheck if I tried.
I'd probably try to play around with varlena headers. It would require
a certain amount of craftiness.
It's not exactly clear where you draw the line here. And I don't think
that the line will be very clearly defined, in the end. It'll be
something that is subject to change over time, as new information
comes to light. I think that it's necessary to accept a certain amount
of ambiguity here.
--
Peter Geoghegan
On 2020-May-12, Peter Geoghegan wrote:
The point is that when checking the table for corruption I avoid
calling anything that might assert (or segfault, or whatever).
I don't think that you can expect to avoid assertion failures in
general.
Hmm. I think we should (try to?) write code that avoids all crashes
with production builds, but not extend that to assertion failures.
Sticking again with the provided example,
I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control.
in a production build this would just fail with an error that the
pg_xact file cannot be found, which is fine -- if this happens in a
production system, you're not disturbing any other sessions. Or maybe
the file is there and the byte can be read, in which case you would get
the correct response; but that's fine too.
I don't know to what extent this is possible.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm. I think we should (try to?) write code that avoids all crashes
with production builds, but not extend that to assertion failures.
Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.
I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control.
in a production build this would just fail with an error that the
pg_xact file cannot be found, which is fine -- if this happens in a
production system, you're not disturbing any other sessions. Or maybe
the file is there and the byte can be read, in which case you would get
the correct response; but that's fine too.
I think that this is fine, too, since I don't consider assertion
failures with corrupt data all that important. I'd make some effort to
avoid it, but not too much, and not at the expense of a useful general
purpose assertion that could catch bugs in many different contexts.
I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur. My sense is
that this is subject to sharply diminishing returns. Completely
nailing down hard crashes from corrupt data seems like the wrong
priority, at the very least. Pursuing that objective over other
objectives sounds like zero-risk bias.
--
Peter Geoghegan
On 2020-May-13, Peter Geoghegan wrote:
On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm. I think we should (try to?) write code that avoids all crashes
with production builds, but not extend that to assertion failures.
Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.
I agree that this (a test tool that exercises our code against
arbitrarily corrupted data pages) is not going to work as a test that
all buildfarm members run -- it seems something for specialized
buildfarm members to run, or even something that's run outside of the
buildfarm, like sqlsmith. Obviously such a tool would not be able to
run against an assertion-enabled build, and we shouldn't even try.
I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur. My sense is
that this is subject to sharply diminishing returns. Completely
nailing down hard crashes from corrupt data seems like the wrong
priority, at the very least. Pursuing that objective over other
objectives sounds like zero-risk bias.
I think my initial approach for this would be to use a fuzzing tool that
generates data blocks semi-randomly, then uses them as Postgres data
pages somehow, and see what happens -- examine any resulting crashes and
make individual judgement calls about the fix(es) necessary to prevent
each of them. I expect that many such pages would be rejected as
corrupt by page header checks.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 13, 2020 at 4:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I think my initial approach for this would be to use a fuzzing tool that
generates data blocks semi-randomly, then uses them as Postgres data
pages somehow, and see what happens -- examine any resulting crashes and
make individual judgement calls about the fix(es) necessary to prevent
each of them. I expect that many such pages would be rejected as
corrupt by page header checks.
As I mentioned in my response to Robert earlier, that's more or less
been my experience with adversarial corruption generated using
pg_hexedit. Within nbtree, as well as heapam. I put a lot of work into
that tool, and have used it to simulate all kinds of weird scenarios.
I've done things like corrupt individual tuple header fields, swap
line pointers, create circular sibling links in indexes, corrupt
varlena headers, and corrupt line pointer flags/status bits. Postgres
itself rarely segfaults, and amcheck will only segfault with a truly
contrived input.
--
Peter Geoghegan
On May 13, 2020, at 3:29 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Hmm. I think we should (try to?) write code that avoids all crashes
with production builds, but not extend that to assertion failures.
Assertions are only a problem at all because Mark would like to write
tests that involve a selection of truly corrupt data. That's a new
requirement, and one that I have my doubts about.
I'll stick with your example. You're calling
TransactionIdDidCommit() from check_tuphdr_xids(), which will
interrogate the commit log and pg_subtrans. It's just not under your
control.
in a production build this would just fail with an error that the
pg_xact file cannot be found, which is fine -- if this happens in a
production system, you're not disturbing any other sessions. Or maybe
the file is there and the byte can be read, in which case you would get
the correct response; but that's fine too.
I think that this is fine, too, since I don't consider assertion
failures with corrupt data all that important. I'd make some effort to
avoid it, but not too much, and not at the expense of a useful general
purpose assertion that could catch bugs in many different contexts.
I am not removing any assertions. I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code. What I'm talking about is writing the amcheck contrib module code in such a way that it only calls a function that could assert on bad data after checking that the data is not bad.
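To make the kind of guard being described concrete, here is a minimal sketch with made-up helper names: treat an xid as corrupt, and skip the clog lookup entirely, unless it falls within the range the relation's metadata allows. (This is a simplification; real code would also have to account for xid wraparound, which is ignored here.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical pre-check: only consult the commit log for an xid that is
 * a normal xid and within the bounds implied by the relation's metadata.
 * relfrozenxid comes from pg_class; next_xid stands in for the cluster's
 * next-transaction-id counter.  Wraparound is deliberately ignored.
 */
static bool
xid_plausible(TransactionId xid, TransactionId relfrozenxid,
              TransactionId next_xid)
{
    if (xid < FirstNormalTransactionId)
        return false;           /* invalid/bootstrap/frozen: no clog lookup */
    if (xid < relfrozenxid)
        return false;           /* precedes relfrozenxid: corrupt */
    if (xid >= next_xid)
        return false;           /* in the future: corrupt */
    return true;                /* now safe to ask TransactionIdDidCommit() */
}
```

Only when this returns true would the checker go on to call TransactionIdDidCommit(), so functions that assert on garbage xids are never handed one.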
I don't know that hardening against assertions in this manner is worth doing, but this is none the less what I'm talking about. You have made decent arguments that it probably isn't worth doing for the btree checking code. And in any event, it is probably something that could be addressed in a future patch after getting this patch committed.
There is a separate but related question in the offing about whether the backend code, independently of any amcheck contrib stuff, should be more paranoid in how it processes tuples to check for corruption. The heap deform tuple code in question is on a pretty hot code path, and I don't know that folks would accept the performance hit of more checks being done in that part of the system, but that's pretty far from relevant to this patch. That should be hashed out, or not, at some other time on some other thread.
I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur.
I think this is a misrepresentation of the tests that I've been running. There are two kinds of tests that I have done:
First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived. That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.
The second kind of corruption test I have been running is nothing more than writing random nonsense into randomly chosen locations within heap files and then running verify_heapam against those heap relations. It's much more Murphy than Machiavelli when it's just generated by calling random(). When I initially did this kind of testing, the heapam checking code had lots of problems. Now it doesn't. There's very little contrived about that which I can see. It's the kind of corruption you'd expect from any number of faulty storage systems. The one "contrived" aspect of my testing in this regard is that the script I use to write random nonsense to random locations in heap files is smart enough not to write random junk to the page headers. This is because if I corrupt the page headers, the backend never even gets as far as running the verify_heapam functions, as the page cache rejects loading the page.
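The header-skipping part of that procedure can be sketched roughly as follows; the block size and header size are assumptions about a default build, and the helper name is made up:

```c
#include <assert.h>
#include <stdlib.h>

#define BLCKSZ 8192
#define PAGE_HEADER_SIZE 24     /* SizeOfPageHeaderData on typical builds */

/*
 * Pick a file offset inside block 'blkno' that avoids the page header,
 * so that writing random bytes there corrupts tuple data rather than
 * tripping the page-header sanity checks, which would reject the page
 * before the tuple-level checks ever run.
 */
static long
random_offset_past_header(unsigned blkno)
{
    long in_page = PAGE_HEADER_SIZE +
        rand() % (BLCKSZ - PAGE_HEADER_SIZE);

    return (long) blkno * BLCKSZ + in_page;
}
```

The real script then seeks to such an offset in the relation file and overwrites a few bytes with random values.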
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 13, 2020 at 5:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I am not removing any assertions. I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code.
I'm sorry if I seemed to suggest that you wanted to remove assertions,
rather than test more things earlier. I recognize that that could be a
useful thing to do, both in general, and maybe even in the specific
example you gave -- on general robustness grounds. At the same time,
it's something that can only be taken so far. It's probably not going
to make it practical to corrupt data in a regression test or tap test.
There is a separate but related question in the offing about whether the backend code, independently of any amcheck contrib stuff, should be more paranoid in how it processes tuples to check for corruption.
I bet that there is something that we could do to be a bit more
defensive. Of course, we do a certain amount of that on general
robustness grounds already. A systematic review of that could be quite
useful. But as you point out, it's not really in scope here.
I would be willing to make a larger effort to avoid crashing a
backend, since that affects production. I might go to some effort to
not crash with downright adversarial inputs, for example. But it seems
inappropriate to take extreme measures just to avoid a crash with
extremely contrived inputs that will probably never occur.
I think this is a misrepresentation of the tests that I've been running.
I didn't actually mean it that way, but I can see how my words could
reasonably be interpreted that way. I apologize.
There are two kinds of tests that I have done:
First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived. That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.
I still don't think that this is necessary. It could work for one type
of corruption, that happens to not have any of the problems, but just
testing that one type of corruption seems rather arbitrary to me.
The second kind of corruption test I have been running is nothing more than writing random nonsense into randomly chosen locations within heap files and then running verify_heapam against those heap relations. It's much more Murphy than Machiavelli when it's just generated by calling random().
That sounds like a good initial test case, to guide your intuitions
about how to make the feature robust.
--
Peter Geoghegan
On May 13, 2020, at 5:36 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, May 13, 2020 at 5:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I am not removing any assertions. I do not propose to remove any assertions. When I talk about "hardening against assertions", that is not in any way a proposal to remove assertions from the code.
I'm sorry if I seemed to suggest that you wanted to remove assertions
Not a problem at all. As always, I appreciate your involvement in this code and design review.
I think this is a misrepresentation of the tests that I've been running.
I didn't actually mean it that way, but I can see how my words could
reasonably be interpreted that way. I apologize.
Again, no worries.
There are two kinds of tests that I have done:
First, there is the regression test, t/004_verify_heapam.pl, which is obviously contrived. That was included in the regression test suite because it needed to be something other developers could read, verify, "yeah, I can see why that would be corruption, and would give an error message of the sort the test expects", and then could be run to verify that indeed that expected error message was generated.
I still don't think that this is necessary. It could work for one type
of corruption, that happens to not have any of the problems, but just
testing that one type of corruption seems rather arbitrary to me.
As discussed with Robert off list, this probably doesn't matter. The patch can be committed with or without this particular TAP test.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 13, 2020 at 5:33 PM Peter Geoghegan <pg@bowt.ie> wrote:
Do you recall seeing corruption resulting in segfaults in production?
I have seen that, I believe. I think it's more common to fail with
errors about not being able to palloc>1GB, not being able to look up
an xid or mxid, etc. but I am pretty sure I've seen multiple cases
involving seg faults, too. Unfortunately for my credibility, I can't
remember the details right now.
I personally don't recall seeing that. If it happened, the segfaults
themselves probably wouldn't be the main concern.
I don't really agree. Hypothetically speaking, suppose you corrupt
your only copy of a critical table in such a way that every time you
select from it, the system seg faults. A user in this situation might
ask questions like:
1. How did my table get corrupted?
2. Why do I only have one copy of it?
3. How do I retrieve the non-corrupted portion of my data from that
table and get back up and running?
In the grand scheme of things, #1 and #2 are the most important
questions, but when something like this actually happens, #3 tends to
be the most urgent question, and it's a lot harder to get the
uncorrupted data out if the system keeps crashing.
Also, a seg fault tends to lead customers to think that the database
has a bug, rather than that the database is corrupted.
Slightly off-topic here, but I think our error reporting in this area
is pretty lame. I've learned over the years that when a customer
reports that they get a complaint about a too-large memory allocation
every time they access a table, they've probably got a corrupted
varlena header. However, that's extremely non-obvious to a typical
user. We should try to report errors indicative of corruption in a way
that gives the user some clue that corruption has happened. Peter made
a stab at improving things there by adding
errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of
users will never see the error code, only the message, and a lot of
corruption still produces errors that weren't changed by that
commit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, May 13, 2020 at 7:32 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I agree that this (a test tool that exercises our code against
arbitrarily corrupted data pages) is not going to work as a test that
all buildfarm members run -- it seems something for specialized
buildfarm members to run, or even something that's run outside of the
buildfarm, like sqlsmith. Obviously such a tool would not be able to
run against an assertion-enabled build, and we shouldn't even try.
I have a question about what you mean here by "arbitrarily."
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against heap pages full of randomly-generated
garbage, I tend to agree. Such a test wouldn't be very stable and
might fail in lots of low-probability ways that could require
unreasonable effort to find and fix.
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against any corrupted heap pages at all, I tend to
disagree. If we did that, then we'd basically be releasing a heap
corruption checker with very limited test coverage. Like, we shouldn't
only have negative test cases, where the absence of corruption
produces no results. We should also have positive test cases, where
the thing finds some problem...
At least, that's what I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 14, 2020 at 11:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
I have seen that, I believe. I think it's more common to fail with
errors about not being able to palloc>1GB, not being able to look up
an xid or mxid, etc. but I am pretty sure I've seen multiple cases
involving seg faults, too. Unfortunately for my credibility, I can't
remember the details right now.
I believe you, both in general, and also because what you're saying
here is plausible, even if it doesn't fit my own experience.
Corruption is by its very nature exceptional. At least, if that isn't
true then something must be seriously wrong, so the idea that it will
be different in some way each time seems like a good working
assumption. Your exceptional cases are not necessarily the same as
mine, especially where hardware problems are concerned. On the other
hand, it's also possible for corruption that originates from very
different sources to exhibit the same basic inconsistencies and
symptoms.
I've noticed that SLRU corruption is often a leading indicator of
general storage problems. The inconsistencies between certain SLRU
state and the heap happen to be far easier to notice in practice,
particularly when VACUUM runs. But it's not fundamentally different to
inconsistencies from pages within one single main fork of some heap
relation.
I personally don't recall seeing that. If it happened, the segfaults
themselves probably wouldn't be the main concern.
I don't really agree. Hypothetically speaking, suppose you corrupt
your only copy of a critical table in such a way that every time you
select from it, the system seg faults. A user in this situation might
ask questions like:
I agree that that could be a problem. But that's not what I've seen
happen in production systems myself.
Maybe there is some low hanging fruit here. Perhaps we can make the
real PageGetItemId() a little closer to PageGetItemIdCareful() without
noticeable overhead, as I suggested already. Are there any real
generalizations that we can make about why backends segfault with
corrupt data? Maybe there is. That seems important.
Slightly off-topic here, but I think our error reporting in this area
is pretty lame. I've learned over the years that when a customer
reports that they get a complaint about a too-large memory allocation
every time they access a table, they've probably got a corrupted
varlena header.
I certainly learned the same lesson in the same way.
However, that's extremely non-obvious to a typical
user. We should try to report errors indicative of corruption in a way
that gives the user some clue that corruption has happened. Peter made
a stab at improving things there by adding
errcode(ERRCODE_DATA_CORRUPTED) in a bunch of places, but a lot of
users will never see the error code, only the message, and a lot of
corruption still produces errors that weren't changed by that
commit.
The theory is that "can't happen" errors have an errcode that should
be considered similar to or equivalent to ERRCODE_DATA_CORRUPTED. I
doubt that it works out that way in practice, though.
--
Peter Geoghegan
On 2020-May-14, Robert Haas wrote:
I have a question about what you mean here by "arbitrarily."
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against heap pages full of randomly-generated
garbage, I tend to agree. Such a test wouldn't be very stable and
might fail in lots of low-probability ways that could require
unreasonable effort to find and fix.
This is what I meant. I was thinking of blocks generated randomly.
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against any corrupted heap pages at all, I tend to
disagree.
Yeah, IMV those would not be arbitrarily corrupted -- instead they're
crafted to be corrupted in some specific way.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2020-May-14, Robert Haas wrote:
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against heap pages full of randomly-generated
garbage, I tend to agree. Such a test wouldn't be very stable and
might fail in lots of low-probability ways that could require
unreasonable effort to find and fix.
This is what I meant. I was thinking of blocks generated randomly.
Yeah, -1 for using random data --- when it fails, how are you going to
reproduce the problem?
If you mean that we shouldn't have the buildfarm run the proposed heap
corruption checker against any corrupted heap pages at all, I tend to
disagree.
Yeah, IMV those would not be arbitrarily corrupted -- instead they're
crafted to be corrupted in some specific way.
I think there's definitely value in corrupting data in some predictable
(reproducible) way and verifying that the check code catches it and
responds as expected. Sure, this will not be 100% coverage, but it'll be
a lot better than 0% coverage.
regards, tom lane
On 2020-05-11 19:21, Mark Dilger wrote:
1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that that interface is already familiar to users.
Why is this useful over just using the extension's functions via psql?
I suppose you could make an argument for a command-line wrapper around
almost every admin-focused contrib module (pageinspect, pg_prewarm,
pgstattuple, ...), but that doesn't seem very sensible.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On May 14, 2020, at 1:02 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
On 2020-05-11 19:21, Mark Dilger wrote:
1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that that interface is already familiar to users.
Why is this useful over just using the extension's functions via psql?
The tool doesn't hold a single snapshot or transaction for the lifetime of checking the entire database. A future improvement to the tool might add parallelism. Users could do all of this in scripts, but having a single tool with the most commonly useful options avoids duplication of effort.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, May 11, 2020 at 10:51 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Here is v5 of the patch. Major changes in this version include:
1) A new module, pg_amcheck, which includes a command line client for checking a database or subset of a database. Internally it functions by querying the database for a list of tables which are appropriate given the command line switches, and then calls amcheck's functions to validate each table and/or index. The options for selecting/excluding tables and schemas are patterned on pg_dump, on the assumption that that interface is already familiar to users.
2) amcheck's btree checking functions have been refactored to be able to operate in two modes; the original mode in which all errors are reported via ereport, and a new mode for returning errors as rows from a set returning function. The new mode is used by a new function verify_btreeam(), analogous to verify_heapam(), both of which are used by the pg_amcheck command line tool.
3) The regression test which generates corruption within a table uses the pageinspect module to determine the location of each tuple on disk for corrupting. This was suggested upthread.
Testing on the command line shows that the pre-existing btree checking code could use some hardening, as it currently crashes the backend on certain corruptions. When I corrupt relation files for tables and indexes in the backend and then use pg_amcheck to check all objects in the database, I keep getting assertions from the btree checking code. I think I need to harden this code, but wanted to post an updated patch and solicit opinions before doing so. Here are some example problems I'm seeing. Note the stack trace when calling from the command line tool includes the new verify_btreeam function, but you can get the same crashes using the old interface via psql:
From psql, first error:
test=# select bt_index_parent_check('corrupted_idx', true, true);
TRAP: FailedAssertion("_bt_check_natts(rel, key->heapkeyspace, page, offnum)", File: "nbtsearch.c", Line: 663)
0 postgres 0x0000000106872977 ExceptionalCondition + 103
1 postgres 0x00000001063a33e2 _bt_compare + 1090
2 amcheck.so 0x0000000106d62921 bt_target_page_check + 6033
3 amcheck.so 0x0000000106d5fd2f bt_index_check_internal + 2847
4 amcheck.so 0x0000000106d60433 bt_index_parent_check + 67
5 postgres 0x00000001064d6762 ExecInterpExpr + 1634
6 postgres 0x000000010650d071 ExecResult + 321
7 postgres 0x00000001064ddc3d standard_ExecutorRun + 301
8 postgres 0x00000001066600c5 PortalRunSelect + 389
9 postgres 0x000000010665fc7f PortalRun + 527
10 postgres 0x000000010665ed59 exec_simple_query + 1641
11 postgres 0x000000010665c99d PostgresMain + 3661
12 postgres 0x00000001065d6a8a BackendRun + 410
13 postgres 0x00000001065d61c4 ServerLoop + 3044
14 postgres 0x00000001065d2fe9 PostmasterMain + 3769
15 postgres 0x000000010652e3b0 help + 0
16 libdyld.dylib 0x00007fff6725fcc9 start + 1
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: 2020-05-11 10:11:47.394 PDT [41091] LOG: server process (PID 41309) was terminated by signal 6: Abort trap: 6
From the command line, second error:
pgtest % pg_amcheck -i test
(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 3289393 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
tuple xmin = 12593 is in the future
(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
<snip>
(relname=corrupted,blkno=107,offnum=20,lp_off=7392,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 306 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmax = 0 precedes relation relminmxid = 1
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
tuple xmin = 305 precedes relation relfrozenxid = 487
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff > lp_len (54 > 34)
(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
t_hoff not max-aligned (54)
TRAP: FailedAssertion("TransactionIdIsValid(xmax)", File: "heapam_visibility.c", Line: 1319)
0 postgres 0x0000000105b22977 ExceptionalCondition + 103
1 postgres 0x0000000105636e86 HeapTupleSatisfiesVacuum + 1158
2 postgres 0x0000000105634aa1 heapam_index_build_range_scan + 1089
3 amcheck.so 0x00000001060100f3 bt_index_check_internal + 3811
4 amcheck.so 0x000000010601057c verify_btreeam + 316
5 postgres 0x0000000105796266 ExecMakeTableFunctionResult + 422
6 postgres 0x00000001057a8c35 FunctionNext + 101
7 postgres 0x00000001057bbf3e ExecNestLoop + 478
8 postgres 0x000000010578dc3d standard_ExecutorRun + 301
9 postgres 0x00000001059100c5 PortalRunSelect + 389
10 postgres 0x000000010590fc7f PortalRun + 527
11 postgres 0x000000010590ed59 exec_simple_query + 1641
12 postgres 0x000000010590c99d PostgresMain + 3661
13 postgres 0x0000000105886a8a BackendRun + 410
14 postgres 0x00000001058861c4 ServerLoop + 3044
15 postgres 0x0000000105882fe9 PostmasterMain + 3769
16 postgres 0x00000001057de3b0 help + 0
17 libdyld.dylib 0x00007fff6725fcc9 start + 1
pg_amcheck: error: query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
I have just browsed through the patch and the idea is quite
interesting. I think we can expand it to check whether the flags
set in the infomask are sane w.r.t. other flags and the xid status.
Some examples are
- If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
should not be set in infomask2.
- If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then we can
actually cross-verify the transaction status from the CLOG and check
whether it matches the hint bit or not.
While browsing through the code I could not find us doing
this kind of check; ignore this if we are already checking it.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On May 11, 2020, at 10:21 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
<v5-0001-Adding-verify_heapam-and-pg_amcheck.patch>
Rebased with some whitespace fixes, but otherwise unmodified from v5.
Attachments:
v6-0001-Adding-verify_heapam-and-pg_amcheck.patchapplication/octet-stream; name=v6-0001-Adding-verify_heapam-and-pg_amcheck.patch; x-unix-mode=0644Download
From 3cbf8970ea647268aa31c4e00a1b8427f6134999 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 11 Jun 2020 11:18:59 -0700
Subject: [PATCH v6] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1024 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 750 ++++++------
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 884 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 +
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 407 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
25 files changed, 3557 insertions(+), 331 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e716fc8c33
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server, writes 16 zero bytes at offset 1000 of the table's
+# first page, and restarts the server. Assumes the page size is large
+# enough for offsets 1000..1015 to fall within the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath) or die "open($relpath) failed: $!";
+ binmode $fh;
+ seek($fh, 1000, 0);
+ syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all-frozen or all-visible pages
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..1bddff7fc6
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1024 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information during
+ * a lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all visible', 'all frozen', "
+ "or NULL")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+ "oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relminmxid %u precedes global "
+ "oldest valid xid %u",
+ ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Sanity check: we should hold a valid, locked buffer here */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = OffsetNumberNext(ctx.offnum); ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Convenience routine to check that the relation has a supported relkind and AM.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Record a corruption message in the tuplestore, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+ int16 lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+ int16 lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+ int16 lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum < 0);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using up to work_mem worth of memory building up
+ * the tuplestore. Don't compound that by leaking the msg argument.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given TransactionId is within the range of xids
+ * known to be valid for this relation, neither in the future nor so far
+ * in the past that its clog entries may already have been truncated.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "neither LOCKED_ONLY nor has a "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption.
+ * Records any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption.
+ * Records any corruption found via confess().
+ *
+ * The caller is expected to have set ctx->attnum to the attribute being
+ * checked before calling.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* The value is external, so it should be marked as stored on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The tuple's natts could legitimately
+ * be less than the relation's natts, but it cannot be greater.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
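
For reviewers wanting to try the heap-checking side of the patch, the function above is SQL-callable. A hypothetical session might look like the following; the extension name and the sample table are assumptions for illustration, while the function name and output columns match heapcheck_relation and verify_heapam_tupdesc() in the patch:

```sql
-- Hypothetical usage sketch, assuming the contrib module installs an
-- extension named "heapcheck" exposing heapcheck_relation(regclass).
-- Output columns (blkno, offnum, lp_off, lp_flags, lp_len, attnum,
-- chunk, msg) follow verify_heapam_tupdesc() above.
CREATE EXTENSION heapcheck;

SELECT blkno, offnum, attnum, msg
FROM heapcheck_relation('public.mytable'::regclass);
-- A healthy relation returns zero rows; each detected corruption
-- produces one row describing where it was found.
```
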
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..bf68b554a8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ (ctx == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointer to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2910,17 +2964,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3023,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3038,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3066,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3113,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the result tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..3e47b717f1
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,884 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should match
+ * at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ values[4] = (connOpts.dbname == NULL) ? "postgres" : connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print the command line usage text
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBuffer(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..de3ce54e8e
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..a96b763886
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,407 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 36;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath) or die "could not open '$relpath': $!";
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
+ qr/t_hoff \+ offset > lp_len/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ For each block in the relation where corruption is detected, or for just
+ the first block if on_error_stop is true, for each corruption detected,
+ returns one row containing the following fields:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..f379af2258
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To restrict which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "corrupted",
+ with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
On Jun 11, 2020, at 9:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have just browsed through the patch and the idea is quite
interesting. I think we can expand it to check whether the flags
set in the infomask are sane or not w.r.t. other flags and xid status.
Some examples are:
- If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
should not be set in new_infomask2.
- If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
actually cross verify the transaction status from the CLOG and check
whether it matches the hint bit or not.
While browsing through the code I could not find that we are doing
this kind of check; ignore this if we are already checking it.
Thanks for taking a look!
Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect. It doesn't violate any assertion in the backend, nor does it cause the code to crash. (At least, I don't immediately see how it does either of those things.) At first glance it appears invalid to have those bits both set simultaneously, but I'm hesitant to enforce that without good reason. If it is a good thing to enforce, should we also change the backend code to Assert?
I integrated your idea into one of the regression tests. It now sets these two bits in the header of one of the rows in a table. The verify_heapam check output (which includes all detected corruptions) does not change, which verifies your observation that verify_heapam is not checking for this. I've attached that as a patch to this email. Note that this patch should be applied atop the v6 patch recently posted in another email.
Attachments:
WIP_dilip_kumar_idea.patchapplication/octet-stream; name=WIP_dilip_kumar_idea.patch; x-unix-mode=0644Download
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
index a96b763886..0e19d4c7ab 100644
--- a/contrib/pg_amcheck/t/004_verify_heapam.pl
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -215,7 +215,7 @@ $node->safe_psql(
my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
my $relpath = "$pgdata/$rel";
-use constant ROWCOUNT => 12;
+use constant ROWCOUNT => 13;
$node->safe_psql('postgres', qq(
INSERT INTO public.test (a, b, c)
VALUES (
@@ -250,10 +250,12 @@ $node->stop;
# Some #define constants from access/htup_details.h for use while corrupting.
use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
use constant HEAP_XMIN_COMMITTED => 0x0100;
use constant HEAP_XMIN_INVALID => 0x0200;
use constant HEAP_XMAX_INVALID => 0x0800;
use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_KEYS_UPDATED => 0x2000;
# Corrupt the tuples, one type of corruption per tuple. Some types of
# corruption cause verify_heapam to skip to the next tuple without
@@ -350,6 +352,12 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
$tup->{c6} = 41;
$tup->{c7} = 41;
}
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
write_tuple($file, $offset, $tup);
}
close($file);
On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.
Ok
It doesn't violate any assertion in the backend, nor does it cause
the code to crash. (At least, I don't immediately see how it does
either of those things.) At first glance it appears invalid to have
those bits both set simultaneously, but I'm hesitant to enforce that
without good reason. If it is a good thing to enforce, should we also
change the backend code to Assert?
Yeah, it may not hit assert or crash but it could lead to a wrong
result. But I agree that it could be an assertion in the backend
code. What about the other check, where the hint bit says the
transaction is committed but the CLOG says otherwise? I think it is
hard to check such things in the backend during normal processing,
because if the hint bit says the transaction is committed then we will
directly check its visibility with the snapshot. I think a corruption
checker may be a good tool for catching such anomalies.
I integrated your idea into one of the regression tests. It now sets these two bits in the header of one of the rows in a table. The verify_heapam check output (which includes all detected corruptions) does not change, which verifies your observation that verify_heapam is not checking for this. I've attached that as a patch to this email. Note that this patch should be applied atop the v6 patch recently posted in another email.
Ok.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Jun 11, 2020, at 11:35 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Yeah, it may not hit assert or crash but it could lead to a wrong
result. But I agree that it could be an assertion in the backend
code.
For v7, I've added an assertion for this. Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set." I added an assertion for that, too. Both new assertions are in RelationPutHeapTuple(). I'm not sure that is the best place for them, but I am confident that the assertions need to check only tuples destined for disk, as in-memory tuples can and do violate them.
Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.
What about the other check, where the hint bit says the
transaction is committed but the CLOG says otherwise? I think it is
hard to check such things in the backend during normal processing,
because if the hint bit says the transaction is committed then we will
directly check its visibility with the snapshot. I think a corruption
checker may be a good tool for catching such anomalies.
I already made some design changes to this patch to avoid taking the CLogTruncationLock too often. I'm happy to incorporate this idea, but perhaps you could provide a design on how to do it without all the extra locking? If not, I can try to get this into v8 as an optional check, so users can turn it on at their discretion. Having the check enabled by default is probably a non-starter.
Attachments:
v7-0001-Adding-verify_heapam-and-pg_amcheck.patchapplication/octet-stream; name=v7-0001-Adding-verify_heapam-and-pg_amcheck.patch; x-unix-mode=0644Download
From 315c5edbcde3160ee6d64ca74e5ae3c6c3ca070a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 12 Jun 2020 13:21:52 -0700
Subject: [PATCH v7 1/2] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1024 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 750 ++++++------
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 884 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 +
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 407 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
25 files changed, 3557 insertions(+), 331 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e716fc8c33
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check that various option combinations are stable (do not abort) when
+# running verify_heapam on the test table.  For uncorrupted tables, there
+# is nothing to check beyond the function running without error.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server, writes sixteen zero bytes at offset 1000 in the first
+# page of the table, and restarts the server.  This assumes the page size
+# is large enough that offsets 1000..1015 fall within the first page's data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+	open($fh, '+<', $relpath) or BAIL_OUT("could not open $relpath: $!");
+ binmode $fh;
+ seek($fh, 1000, 0);
+	syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page's worth), but
+# without any all-frozen or all-visible pages
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
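For reference, the declared plan of 183 tests in skipping.pl above decomposes as follows: check_all_options runs 3 on_error_stop values × 3 skip values × 2 startblock values × 2 endblock values, it is called five times, and three standalone corruption checks follow. A quick sanity check of that arithmetic:

```python
# Arithmetic behind "use Test::More tests => 183" in t/skipping.pl.
per_call = 3 * 3 * 2 * 2       # on_error_stop x skip x startblock x endblock
calls = 5                      # empty, one-row, multi-page, vacuumed, frozen
corruption_checks = 3          # no-skip, skip all-visible, skip all-frozen
assert per_call * calls + corruption_checks == 183
```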
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..1bddff7fc6
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1024 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ *	  Functions to check PostgreSQL heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan a heap relation and report corruption in it or in its toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+					 errhint("Please choose from 'all visible', 'all frozen', "
+							 "or NULL.")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+							   "oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relminmxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+		/* The prior iteration's buffer was already unlocked and released */
+		Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Checks that the relation is of a supported relkind and uses the heap AM.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Record a corruption message in the tuplestore, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	int16		lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+	int16		lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+	int16		lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+	nulls[1] = !OffsetNumberIsValid(ctx->offnum);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+	 * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using work_mem worth of memory building up the
+	 * tuplestore.  In any case, don't leak the msg argument's memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given TransactionId could validly appear in the
+ * relation being checked: neither in the future nor preceding the oldest
+ * valid xid.  Returns true for special xids and in-range xids.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+	/* Quick return for permanent special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
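For reviewers less familiar with xid arithmetic, the range test above can be sketched in isolation. This is a simplified stand-in, not the server's code: ordinary xids live on a modulo-2^32 circle, so "precedes" is decided by the sign of the 32-bit difference, and the real TransactionIdPrecedes additionally special-cases permanent xids, which this sketch ignores.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef uint32_t TransactionId;

/*
 * Simplified stand-in for TransactionIdPrecedes(): compare on the
 * modulo-2^32 circle via the sign of the 32-bit difference.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}

/* The half-open range test used by XidInValidRange() above. */
static bool
xid_in_valid_range(TransactionId xid, TransactionId oldest,
                   TransactionId next)
{
    return !xid_precedes(xid, oldest) && xid_precedes(xid, next);
}
```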
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "neither LOCKED_ONLY nor has a "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequencenumber is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * A short header can arise because heap_form_tuple converts small
+ * varlena datums to the 1-byte header format.
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
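The chunk-size expectation above follows the usual TOAST layout: every chunk is TOAST_MAX_CHUNK_SIZE bytes except the last, which holds the remainder. A standalone sketch of the arithmetic, using an assumed chunk size since the real constant is derived from the block size (just under 2000 bytes with 8kB pages):

```c
#include <stdint.h>
#include <assert.h>

/* Assumed stand-in for TOAST_MAX_CHUNK_SIZE, for illustration only. */
#define MAX_CHUNK_SIZE 1996

/*
 * Expected size of 0-based chunk 'chunkno' of a toasted value whose
 * total external size is 'attrsize', mirroring check_toast_tuple():
 * all chunks are full-sized except the last.
 */
static int32_t
expected_chunk_size(int32_t attrsize, int32_t chunkno)
{
    int32_t totalchunks = (attrsize - 1) / MAX_CHUNK_SIZE + 1;

    return chunkno < totalchunks - 1
        ? MAX_CHUNK_SIZE
        : attrsize - (totalchunks - 1) * MAX_CHUNK_SIZE;
}
```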
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ * The caller is expected to have set ctx->attnum to the attribute
+ * being checked before calling.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Set up a scan key to find chunks in the toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
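The VARATT_IS_1B_E and VARTAG_EXTERNAL checks above amount to inspecting the first bytes of the datum. A little-endian-only sketch, with hand-rolled helpers rather than the real macros (big-endian builds use different bit positions, so treat the values as illustrative):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/*
 * On little-endian builds, a set low bit in the first varlena header
 * byte marks a 1-byte header, and the exact value 0x01 marks a "1B_E"
 * external TOAST pointer whose following byte is the vartag.
 */
static bool
varatt_is_1b(const uint8_t *p)
{
    return (p[0] & 0x01) == 0x01;
}

static bool
varatt_is_1b_e(const uint8_t *p)
{
    return p[0] == 0x01;
}

static uint8_t
vartag_external(const uint8_t *p)
{
    return p[1];    /* the vartag byte follows the 0x01 marker */
}
```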
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must be
+ * compatible with our relation descriptor. The tuple's natts may
+ * legitimately be less than the relation's natts (columns added after
+ * the tuple was written), but it can never be greater.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
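The nulls-bitmap bound check in check_tuple() is easy to verify by hand. A standalone sketch, with an assumed 23-byte fixed header size standing in for SizeofHeapTupleHeader and a hand-rolled BITMAPLEN:

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Assumed stand-ins for server constants, for illustration only. */
#define HEADER_SIZE 23
#define BITMAP_LEN(natts) (((natts) + 7) / 8)

/*
 * The t_hoff consistency test from check_tuple(): if the tuple claims
 * a nulls bitmap, the data offset must leave room for the fixed header
 * plus one bit per attribute, rounded up to whole bytes.
 */
static bool
hoff_accommodates_bitmap(int natts, uint8_t t_hoff)
{
    return HEADER_SIZE + BITMAP_LEN(natts) <= t_hoff;
}
```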
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..bf68b554a8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ ((ctx) == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
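The econfess() pattern above — collect and continue when a context exists, hard error otherwise — can be sketched in plain C. All names here are illustrative, not the patch's actual API:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Minimal collecting context, standing in for BtreeCheckContext. */
struct report_ctx
{
    char msg[256];
    int  is_corrupt;
};

/*
 * With a context, record the message and let the caller keep scanning;
 * without one, print and exit (stand-in for ereport(ERROR, ...)).
 */
static void
report(struct report_ctx *ctx, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    if (ctx == NULL)
    {
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        exit(1);    /* no context: hard error, like ereport(ERROR) */
    }
    vsnprintf(ctx->msg, sizeof(ctx->msg), fmt, ap);
    ctx->is_corrupt = 1;    /* remember corruption, keep going */
    va_end(ap);
}
```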
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2910,17 +2964,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3023,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3038,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3066,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3113,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the result tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..3e47b717f1
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,884 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ values[4] = (connOpts.dbname == NULL) ? "postgres" : connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ ClosePager(output);
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ ClosePager(output);
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, 1},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out command line arguments
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ /*
+ * Equality filters should match ANY of the listed OIDs, but inequality
+ * filters must differ from ALL of them: "oid != ANY(array[a, b])" is
+ * true for every oid whenever a and b differ, which would make
+ * exclusion lists useless.
+ */
+ const char *quantifier = strstr(operator, "!=") ? "ALL" : "ANY";
+
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s %s(array[\n", lval, operator,
+ quantifier);
+ append_csv_oids(query, oids);
+ appendPQExpBufferStr(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..de3ce54e8e
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..a96b763886
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,407 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 36;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ascii string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 12;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
+ qr/t_hoff \+ offset > lp_le/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ For each corruption detected, returns one row containing the following
+ fields. If on_error_stop is true, only corruptions found on the first
+ corrupt block are reported:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &oid2name;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..f379af2258
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-scheam</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "corrupted",
+ and examining the output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
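The xmin/xmax checks exercised by the test above (note its "circularity of xid comparison" comments) depend on PostgreSQL's wraparound-aware comparison of 32-bit transaction IDs. The following is a simplified sketch of that comparison, modeled on TransactionIdPrecedes(); it ignores the special treatment of the permanent XIDs below FirstNormalTransactionId:

```c
#include <stdint.h>

/*
 * Wraparound-aware comparison of 32-bit transaction IDs, modeled on
 * TransactionIdPrecedes() in PostgreSQL's transam.c.  Because XIDs wrap
 * around, "precedes" is decided by the signed distance between the two
 * values rather than by plain unsigned order.  Simplified sketch: the
 * permanent XIDs below FirstNormalTransactionId get no special handling.
 */
int
xid_precedes(uint32_t id1, uint32_t id2)
{
    int32_t diff = (int32_t) (id1 - id2);

    return diff < 0;
}
```

This is why the test can set t_xmin to 4026531839 and still see it reported as preceding a small relfrozenxid: the unsigned difference wraps past 2^31, so the signed distance comes out negative.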
v7-0002-Adding-checks-of-invalid-combinations-of-hint-bit.patchapplication/octet-stream; name=v7-0002-Adding-checks-of-invalid-combinations-of-hint-bit.patch; x-unix-mode=0644Download
From a611cd647372231c5fb3c54c8a2fa18d84ef7d2a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 12 Jun 2020 13:26:14 -0700
Subject: [PATCH v7 2/2] Adding checks of invalid combinations of hint bits
Per code review by Dilip Kumar, adding a corruption check for
HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED being set simultaneously
in an on-disk tuple.
While doing that, I noticed that the heap/README.tuplock file
documents that HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI are never
set simultaneously, so adding a corruption check for that, too.
Since some clever hacker in the future may notice that these
combinations never occur on disk and use them to mean some new
thing, adding an Assert for each of them in RelationPutHeapTuple
with comments about amcheck's expectations.
---
contrib/amcheck/verify_heapam.c | 12 ++++++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 27 ++++++++++++++++++++---
src/backend/access/heap/hio.c | 11 +++++++++
3 files changed, 47 insertions(+), 3 deletions(-)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 1bddff7fc6..f5e68906b2 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -954,6 +954,18 @@ check_tuple(HeapCheckContext * ctx)
ctx->tuphdr->t_hoff));
fatal = true;
}
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
/*
* If the tuple has nulls, check that the implied length of the variable
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
index a96b763886..b2c1f36928 100644
--- a/contrib/pg_amcheck/t/004_verify_heapam.pl
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -4,7 +4,7 @@ use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 36;
+use Test::More tests => 42;
# This regression test demonstrates that the verify_heapam() function supplied
# with the amcheck contrib module and depended upon by this pg_amcheck contrib
@@ -215,7 +215,7 @@ $node->safe_psql(
my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
my $relpath = "$pgdata/$rel";
-use constant ROWCOUNT => 12;
+use constant ROWCOUNT => 14;
$node->safe_psql('postgres', qq(
INSERT INTO public.test (a, b, c)
VALUES (
@@ -250,10 +250,14 @@ $node->stop;
# Some #define constants from access/htup_details.h for use while corrupting.
use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
use constant HEAP_XMIN_COMMITTED => 0x0100;
use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
use constant HEAP_XMAX_INVALID => 0x0800;
use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
# Corrupt the tuples, one type of corruption per tuple. Some types of
# corruption cause verify_heapam to skip to the next tuple without
@@ -350,6 +354,18 @@ for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
$tup->{c6} = 41;
$tup->{c7} = 41;
}
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
write_tuple($file, $offset, $tup);
}
close($file);
@@ -374,7 +390,10 @@ is ($result,
0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
-0|12|7424|1|58|2|0|toasted value missing from toast table",
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set
+0|14|7296|1|58|||tuple xmax = 0 precedes relation relminmxid = 1
+0|14|7296|1|58|||HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set",
"Expected verify_heapam output");
# Each table corruption message is returned with a standard header, and we can
@@ -396,6 +415,8 @@ my @corruption_re = (
qr/t_hoff \+ offset > lp_le/,
qr/final chunk number differs from expected/,
qr/toasted value missing from toast table/,
+ qr/HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set/,
+ qr/HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set/,
);
$node->command_like(
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
--
2.21.1 (Apple Git-122.3)
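For reference, the two forbidden hint-bit combinations that this patch checks for (and now asserts against in RelationPutHeapTuple) can be captured in a small standalone predicate. This sketch copies the flag values from src/include/access/htup_details.h; the helper name is made up for illustration:

```c
#include <stdint.h>

/* Flag values copied from src/include/access/htup_details.h. */
#define HEAP_XMAX_LOCK_ONLY   0x0080    /* t_infomask */
#define HEAP_XMAX_COMMITTED   0x0400    /* t_infomask */
#define HEAP_XMAX_IS_MULTI    0x1000    /* t_infomask */
#define HEAP_KEYS_UPDATED     0x2000    /* t_infomask2 */

/*
 * Returns nonzero when the infomask bits show a combination that should
 * never appear in an on-disk tuple: a locked-but-not-updated xmax marked
 * as having updated key columns, or an xmax flagged as both a committed
 * plain XID and a MultiXactId (see heap/README.tuplock).
 */
int
infomask_combo_invalid(uint16_t infomask, uint16_t infomask2)
{
    if ((infomask & HEAP_XMAX_LOCK_ONLY) && (infomask2 & HEAP_KEYS_UPDATED))
        return 1;
    if ((infomask & HEAP_XMAX_COMMITTED) && (infomask & HEAP_XMAX_IS_MULTI))
        return 1;
    return 0;
}
```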
On 2020-06-12 23:06, Mark Dilger wrote:
[v7-0001-Adding-verify_heapam-and-pg_amcheck.patch]
[v7-0002-Adding-checks-o...ations-of-hint-bit.patch]
I came across these typos in the sgml:
--exclude-scheam should be
--exclude-schema
<option>table</option> should be
<option>--table</option>
I found this connection problem (or perhaps it is as designed):
$ env | grep ^PG
PGPORT=6965
PGPASSFILE=/home/aardvark/.pg_aardvark
PGDATABASE=testdb
PGDATA=/home/aardvark/pg_stuff/pg_installations/pgsql.amcheck/data
-- just to show that psql is connecting (via $PGPASSFILE and $PGPORT and
$PGDATABASE):
-- and showing a table t that I made earlier
$ psql
SET
Timing is on.
psql (14devel_amcheck_0612_2f48)
Type "help" for help.
testdb=# \dt+ t
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+------+-------+----------+-------------+--------+-------------
public | t | table | aardvark | permanent | 346 MB |
(1 row)
testdb=# \q
I think this should work:
$ pg_amcheck -i -t t
pg_amcheck: error: no matching tables were found
It seems a bug that I have to add '-d testdb':
This works OK:
pg_amcheck -i -t t -d testdb
Is that error as expected?
thanks,
Erik Rijkers
On Jun 13, 2020, at 2:13 PM, Erik Rijkers <er@xs4all.nl> wrote:
Thanks for the review!
On 2020-06-12 23:06, Mark Dilger wrote:
[v7-0001-Adding-verify_heapam-and-pg_amcheck.patch]
[v7-0002-Adding-checks-o...ations-of-hint-bit.patch]
I came across these typos in the sgml:
--exclude-scheam should be
--exclude-schema
<option>table</option> should be
<option>--table</option>
Yeah, I agree and have made these changes for v8.
I found this connection problem (or perhaps it is as designed):
$ env | grep ^PG
PGPORT=6965
PGPASSFILE=/home/aardvark/.pg_aardvark
PGDATABASE=testdb
PGDATA=/home/aardvark/pg_stuff/pg_installations/pgsql.amcheck/data
-- just to show that psql is connecting (via $PGPASSFILE and $PGPORT and $PGDATABASE):
-- and showing a table t that I made earlier
$ psql
SET
Timing is on.
psql (14devel_amcheck_0612_2f48)
Type "help" for help.testdb=# \dt+ t
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+------+-------+----------+-------------+--------+-------------
public | t | table | aardvark | permanent | 346 MB |
(1 row)
testdb=# \q
I think this should work:
$ pg_amcheck -i -t t
pg_amcheck: error: no matching tables were found
It seems a bug that I have to add '-d testdb':
This works OK:
pg_amcheck -i -t t -d testdb
Is that error as expected?
It was expected, but looking more broadly at other tools, your expectations seem to be more typical. I've changed it in v8. Thanks again for having a look at this patch!
Note that I've merged the two patches (v7-0001 and v7-0002) back into a single patch, since the separation introduced in v7 was only for illustration of changes in v7.
Attachments:
v8-0001-Adding-verify_heapam-and-pg_amcheck.patchapplication/octet-stream; name=v8-0001-Adding-verify_heapam-and-pg_amcheck.patch; x-unix-mode=0644Download
From 0e93328bb4c98d8e517085d0919ca0797ce1e188 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sat, 13 Jun 2020 14:21:39 -0700
Subject: [PATCH v8] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1036 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 750 ++++++------
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 894 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 +
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 428 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
src/backend/access/heap/hio.c | 11 +
26 files changed, 3611 insertions(+), 331 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e716fc8c33
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server and writes 16 zero bytes into the first page of the
+# table, assuming the page size is large enough for offsets 1000..1015
+# to fall within the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+	my $fh;
+	open($fh, '+<', $relpath)
+		or die "could not open $relpath: $!";
+	binmode $fh;
+	sysseek($fh, 1000, 0);
+	syswrite($fh, "\0" x 16);
+	close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all frozen or all visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..f5e68906b2
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1036 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information for the
+ * lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all visible', 'all frozen', "
+ "or NULL")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relfrozenxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relminmxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relminmxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+		/* The prior iteration's buffer was already released; sanity check */
+		Assert(ctx.buffer != InvalidBuffer);
+
+		ctx.offnum = InvalidOffsetNumber;
+		ctx.itemid = NULL;
+		ctx.lp_len = 0;
+		ctx.tuphdr = NULL;
+		ctx.natts = 0;
+
+		/* Perform tuple checks */
+		maxoff = PageGetMaxOffsetNumber(ctx.page);
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+			 ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * convenience routine to check that relation is of a supported relkind.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Record a corruption report in the tuplestore, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	int16		lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+	int16		lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+	int16		lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum < 0);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Given a TransactionId, check that it is neither in the future nor
+ * older than the oldest xid that can validly appear in this relation.
+ *
+ * Returns whether the xid falls within the known valid range.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+	/* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "not LOCKED_ONLY but has no "
+ "valid update xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * can happen because heap_form_tuple may convert small values
+ * to the short varlena header format
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %d "
+ "does not match the expected sequence number %d",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %d "
+ "exceeds the end chunk sequence "
+ "number %d",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %d differs from "
+ "expected size %d",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
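The expected-size arithmetic above is easy to check in isolation. In the sketch below, `CHUNK_SIZE` is an invented stand-in for TOAST_MAX_CHUNK_SIZE, whose real value depends on the build's block size:

```c
#include <assert.h>

/* Invented stand-in for TOAST_MAX_CHUNK_SIZE; illustration only */
#define CHUNK_SIZE 1996

/*
 * Expected size of chunk 'curchunk' (0-based) of a value of 'attrsize'
 * bytes split into 'totalchunks' chunks: every chunk except the last is
 * full, and the last holds whatever remains.
 */
static int
expected_chunk_size(int curchunk, int totalchunks, int attrsize)
{
	return curchunk < totalchunks - 1
		? CHUNK_SIZE
		: attrsize - (totalchunks - 1) * CHUNK_SIZE;
}
```

For a 5000-byte value, `totalchunks = (5000 - 1) / 1996 + 1 = 3`, so the first two chunks are 1996 bytes and the last is 1008.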
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ * The caller should have set ctx->attnum to the attribute currently
+ * being checked.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external; since we read it from disk, it must be marked on-disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!OidIsValid(ctx->rel->rd_rel->reltoastrelid))
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%d vs. %d)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* If xmax is a multixact, check it against relminmxid */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ pstrdup("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ pstrdup("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the
+ * variable-length null bitmap field t_bits does not overflow the space
+ * before t_hoff.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ */
+ if (infomask & HEAP_HASNULL &&
+ SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must be
+ * compatible with our relDesc. The tuple's natts can legitimately be
+ * less than the relation's natts, but it cannot be greater.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody needs
+ * them, and it avoids code paths that might trip an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
+}
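The null-bitmap bound that check_tuple enforces can be illustrated standalone. Here `HDR_SIZE` and `BITMAP_LEN` are invented stand-ins for SizeofHeapTupleHeader and BITMAPLEN, and `bitmap_fits` is the inverse of the corruption test (it returns true when the bitmap fits before t_hoff):

```c
#include <assert.h>

/* Invented stand-ins for illustration; real values come from htup_details.h */
#define HDR_SIZE 23
#define BITMAP_LEN(natts) (((natts) + 7) / 8)

/*
 * A tuple with nulls stores one bit per attribute between the fixed
 * header and t_hoff; the implied bitmap length must not overflow that
 * space, or natts / HEAP_HASNULL is corrupt.
 */
static int
bitmap_fits(int natts, int t_hoff)
{
	return HDR_SIZE + BITMAP_LEN(natts) <= t_hoff;
}
```

With a 24-byte t_hoff there is exactly one byte of bitmap space, enough for up to 8 attributes but not 9.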
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..bf68b554a8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ (ctx == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
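As a standalone illustration of the dual-reporting idea behind econfess, the sketch below records the message when a context is supplied and falls back to an immediate report otherwise. All names here (`DemoCtx`, `REPORT`) are invented for the sketch; they are not the patch's symbols:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Invented context type: collects messages instead of aborting */
typedef struct DemoCtx
{
	char		last_msg[128];
	int			n_errors;
} DemoCtx;

/*
 * With a context, accumulate the formatted message and count it;
 * without one, report immediately (standing in for ereport(ERROR)).
 */
#define REPORT(ctx, ...) \
	do { \
		if (ctx) \
		{ \
			snprintf((ctx)->last_msg, sizeof((ctx)->last_msg), __VA_ARGS__); \
			(ctx)->n_errors++; \
		} \
		else \
			fprintf(stderr, __VA_ARGS__); \
	} while (0)
```

The value of the macro form is that every existing corruption check keeps a single call site while gaining the set-returning behavior when a context is present.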
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
 * right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2910,17 +2964,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3023,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3038,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3066,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3113,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the result tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext *ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore.  At least don't leak the msg argument's memory as well.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
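The econfess() calls throughout the hunks above are not defined in this excerpt; presumably it is a variadic wrapper that formats its printf-style arguments into a single string and hands the result to confess(), which stores it in the tuplestore and pfrees it. A minimal self-contained sketch of just the formatting step, using the C stdlib's two-pass vsnprintf in place of the backend's psprintf (the name format_confession and this exact approach are illustrative assumptions, not the patch's actual code):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the formatting half of econfess(): build a
 * heap-allocated message from printf-style arguments.  The real patch
 * would use psprintf() and then call confess(ctx, blkno, msg), which
 * takes ownership of the string.
 */
static char *
format_confession(const char *fmt, ...)
{
	va_list		ap;
	int			len;
	char	   *msg;

	/* First pass: measure the formatted length, writing nothing */
	va_start(ap, fmt);
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);

	msg = malloc(len + 1);
	if (msg == NULL)
		return NULL;

	/* Second pass: actually format into the buffer */
	va_start(ap, fmt);
	vsnprintf(msg, len + 1, fmt, ap);
	va_end(ap);

	return msg;
}
```

Because confess() frees the message itself, a wrapper in this style keeps each call site to a single statement while still producing one text column per corruption report.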
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..6b57ccf69c
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,894 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
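The usage_text array above follows psql's convention of terminating the help table with a NULL sentinel; the patch's usage() routine (not shown in this hunk) would presumably just walk the array until it hits NULL. A self-contained sketch of that loop, with an abbreviated copy of the table (the helper name print_usage is illustrative, not the patch's):

```c
#include <assert.h>
#include <stdio.h>

/* Abbreviated copy of the NULL-terminated help text table */
static const char *usage_text[] = {
	"pg_amcheck is the PostgreSQL command line database corruption checker.",
	"",
	"Usage:",
	"  pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
	NULL						/* sentinel */
};

/* Walk the table, printing one help line per entry, stopping at NULL */
static int
print_usage(FILE *out)
{
	int			nlines = 0;

	for (const char **line = usage_text; *line != NULL; line++)
	{
		fprintf(out, "%s\n", *line);
		nlines++;
	}
	return nlines;
}
```

The sentinel convention lets new options be appended to the table without touching the printing code.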
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns must
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out the command line usage and option help
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBufferStr(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..de3ce54e8e
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..b2c1f36928
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,428 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 42;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set
+0|14|7296|1|58|||tuple xmax = 0 precedes relation relminmxid = 1
+0|14|7296|1|58|||HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
	qr/t_hoff \+ offset > lp_len/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+ qr/HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set/,
+ qr/HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+    For each corruption detected, returns one row containing the fields
+    listed below.  If <literal>on_error_stop</literal> is true, only
+    corruption in the first corrupt block is reported:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+       A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &oid2name;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+  corruption checks from any remote host that has access to the database,
+  connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+   <option>--port</option>.  The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+   Checking an entire database that contains one corrupt table, "corrupted",
+   along with the output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+   Each detected corruption is reported on two lines: the first shows the
+   location and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
--
2.21.1 (Apple Git-122.3)
On Sat, Jun 13, 2020 at 2:36 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jun 11, 2020, at 11:35 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jun 11, 2020, at 9:14 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have just browsed through the patch and the idea is quite
interesting. I think we can expand it to check whether the flags
set in the infomask are sane or not w.r.t. the other flags and the xid status.
Some examples are:
- If HEAP_XMAX_LOCK_ONLY is set in the infomask, then HEAP_KEYS_UPDATED
should not be set in new_infomask2.
- If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask, then can we
actually cross-verify the transaction status from the CLOG and check
whether it matches the hint bit or not?
While browsing through the code I could not find that we are doing
this kind of check; ignore this if we are already checking it.
Thanks for taking a look!
Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.
Ok
It doesn't violate any assertion in the backend, nor does it cause
the code to crash. (At least, I don't immediately see how it does
either of those things.) At first glance it appears invalid to have
those bits both set simultaneously, but I'm hesitant to enforce that
without good reason. If it is a good thing to enforce, should we also
change the backend code to Assert?
Yeah, it may not hit an assert or crash, but it could lead to a wrong
result. But I agree that it could be an assertion in the backend
code.
For v7, I've added an assertion for this. Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set." I added an assertion for that, too. Both new assertions are in RelationPutHeapTuple(). I'm not sure if that is the best place to put the assertion, but I am confident that the assertion needs to only check tuples destined for disk, as in-memory tuples can and do violate the assertion.
Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.
What about the other check, where the hint bit says the
transaction is committed but, as per the clog, the status is
something else? I think in general processing it is hard to check
such things in the backend, no? Because if the hint bit is set saying that
the transaction is committed, then we will directly check its
visibility with the snapshot. I think a corruption checker may be a
good tool for catching such anomalies.
I already made some design changes to this patch to avoid taking the CLogTruncationLock too often. I'm happy to incorporate this idea, but perhaps you could provide a design on how to do it without all the extra locking? If not, I can try to get this into v8 as an optional check, so users can turn it on at their discretion. Having the check enabled by default is probably a non-starter.
Okay, even I can't think of a way to do it without extra locking.
I have looked into 0001 patch and I have a few comments.
1.
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+ ItemIdIsDead(ctx.itemid) ||
+ ItemIdIsRedirected(ctx.itemid))
+ continue;
Isn't it a good idea to verify the redirected ItemId? Because we
will still access the redirected item id to find the
actual tuple from the index scan. Maybe not exactly at this level,
but we can verify whether the linked itemid stored in it
is within the itemid range of the page or not.
2.
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
I think we can also check that if there are no NULL attributes
(i.e. if !(t_infomask & HEAP_HASNULL)), then
ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
3.
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
So we are first setting ctx->offset to 0, then inside
check_tuple_attribute we keep updating the offset as we process
the attributes, and after the loop is over we set ctx->offset to -1. I
did not understand why we need to reset it to -1; do we ever
check for that? We don't even initialize ctx->offset to -1 while
initializing the context for the tuple, so I do not understand
the meaning of the random value -1.
4.
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("toast chunk is neither short nor extended"));
+ return;
+ }
I think the error message "toast chunk is neither short nor extended"
is misleading, because ideally the toast chunk should not be further toasted.
So I think the check is correct, but the error message is not.
5.
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
....
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+ "oldest valid xid %u ",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ PG_RETURN_NULL();
+ }
Don't we need to close the relation/toastrel/toastindexrel in such
a return path, which is without an abort? IIRC, we
will get a relcache leak WARNING on commit if we leave them open in the commit path.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Jun 21, 2020, at 2:54 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have looked into 0001 patch and I have a few comments.
1.
+
+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+     ItemIdIsDead(ctx.itemid) ||
+     ItemIdIsRedirected(ctx.itemid))
+     continue;
Isn't it a good idea to verify the redirected ItemId? Because we
will still access the redirected item id to find the
actual tuple from the index scan. Maybe not exactly at this level,
but we can verify whether the linked itemid stored in it
is within the itemid range of the page or not.
Good idea. I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.
2.
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+     confess(ctx,
+             psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+                      ctx->tuphdr->t_hoff,
+                      (unsigned) SizeofHeapTupleHeader));
+     fatal = true;
+ }
I think we can also check that if there are no NULL attributes
(i.e. if !(t_infomask & HEAP_HASNULL)), then
ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.
You have to take alignment padding into account, but otherwise yes, and I've added a check for that.
3.
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+     if (!check_tuple_attribute(ctx))
+         break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;
So we are first setting ctx->offset to 0, then inside
check_tuple_attribute we keep updating the offset as we process
the attributes, and after the loop is over we set ctx->offset to -1. I
did not understand why we need to reset it to -1; do we ever
check for that? We don't even initialize ctx->offset to -1 while
initializing the context for the tuple, so I do not understand
the meaning of the random value -1.
Ahh, right, those are left over from a previous design of the code. Thanks for pointing them out. They are now removed.
4.
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+     chunksize = VARSIZE(chunk) - VARHDRSZ;
+     chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+     /*
+      * could happen due to heap_form_tuple doing its thing
+      */
+     chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+     chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+     /* should never happen */
+     confess(ctx,
+             pstrdup("toast chunk is neither short nor extended"));
+     return;
+ }
I think the error message "toast chunk is neither short nor extended"
is misleading, because ideally the toast chunk should not be further toasted.
So I think the check is correct, but the error message is not.
I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".
5.
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+  * Open the toast relation, if any, also protected from concurrent
+  * vacuums.
+  */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+     int offset;
+
+     /* Main relation has associated toast relation */
+     ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+                               ShareUpdateExclusiveLock);
+     offset = toast_open_indexes(ctx.toastrel,
....
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+     TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+     confess(&ctx, psprintf("relfrozenxid %u precedes global "
+                            "oldest valid xid %u ",
+                            ctx.relfrozenxid, ctx.oldestValidXid));
+     PG_RETURN_NULL();
+ }
Don't we need to close the relation/toastrel/toastindexrel in such
a return path, which is without an abort? IIRC, we
will get a relcache leak WARNING on commit if we leave them open in the commit path.
Ok, I've added logic to close them.
All changes inspired by your review are included in the v9-0001 patch. The differences since v8 are pulled out into v9_diffs for easier review.
Attachments:
v9-0001-Adding-verify_heapam-and-pg_amcheck.patchapplication/octet-stream; name=v9-0001-Adding-verify_heapam-and-pg_amcheck.patch; x-unix-mode=0644Download
From 1202079cbb5fad678d9dd714c80faf4b54f9c385 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 21 Jun 2020 15:43:35 -0700
Subject: [PATCH v9] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1082 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 750 +++++++-----
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 894 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 +
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 431 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
src/backend/access/heap/hio.c | 11 +
26 files changed, 3660 insertions(+), 331 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e716fc8c33
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check that verify_heapam runs to completion (without aborting) on the test
+# table under various option combinations. For uncorrupted tables, there isn't
+# anything to check except that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server, overwrites bytes 1000..1015 of the table's first
+# page with zeros, and restarts the server. Assumes the page size is
+# large enough for that range to fall within the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+ seek($fh, 1000, 0);
+ syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all frozen or all visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..b7ea745964
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1082 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information over the
+ * lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool fatal = false;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("Please choose from 'all visible', 'all frozen', "
+ "or NULL.")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toastrel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+ "oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relminmxid %u precedes global "
+ "oldest valid xid %u",
+ ctx.relminmxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+ /* We should now hold a pin and share lock on the current page */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it redirects
+ * to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+ if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
+ {
+ confess(&ctx, psprintf(
+ "Invalid redirect line pointer offset %u out of bounds",
+ (unsigned) redirect));
+ continue;
+ }
+ if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))
+ {
+ confess(&ctx, psprintf(
+ "Invalid redirect line pointer offset %u bad alignment",
+ (unsigned) redirect));
+ }
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Convenience routine to check that the relation has a supported relkind
+ * and uses the heap table access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" does not use the heap table access method",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+ int16 lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+ int16 lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+ int16 lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given TransactionId falls within the range of
+ * xids valid for this relation: neither in the future nor older than
+ * the oldest valid xid for the relation.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "neither LOCKED_ONLY nor has a "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("corrupt toast chunk va_header"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %u "
+ "exceeds the end chunk sequence "
+ "number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %u differs from "
+ "expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ *
+ * The caller is expected to iterate ctx->attnum over the tuple's
+ * attributes, having initialized ctx->offset to zero before the first call.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ struct varatt_indirect redirect;
+
+ VARATT_EXTERNAL_GET_POINTER(redirect, attr);
+ attr = (struct varlena *) redirect.pointer;
+
+ /* nested indirect Datums aren't allowed */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
+ confess(ctx, pstrdup("attribute has nested external "
+ "indirect toast pointer"));
+ return true;
+ }
+ }
+
+ if (VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%u vs. %u)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+ }
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found via confess().
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been
+ * reserved for the null bitmap.
+ */
+ if (infomask & HEAP_HASNULL)
+ {
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ }
+ else if (MAXALIGN(ctx->tuphdr->t_hoff) != MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx,
+ psprintf("t_hoff = %u in tuple without nulls (expected %u)",
+ (unsigned) MAXALIGN(ctx->tuphdr->t_hoff),
+ (unsigned) MAXALIGN(SizeofHeapTupleHeader)));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+}
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..bf68b554a8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ ((ctx) == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
 * right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2910,17 +2964,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3023,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3038,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3066,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3113,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the context's tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext *ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore.  At a minimum, don't leak the msg argument's memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
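The hunks above all follow one pattern: each `ereport(ERROR, ...)` that would abort verification at the first problem becomes an `econfess()` call that records the complaint in the context's tuplestore and lets the scan continue, with `CONTINUE_CHECKING()` bounding the loops over a badly corrupted relation. That accumulate-rather-than-abort idea can be sketched as follows (illustrative Python, not the patch's C; `CheckContext`, `confess`, `continue_checking`, and the `MAX_REPORTS` cap are stand-ins for the patch's actual structures and stopping condition):

```python
MAX_REPORTS = 4  # hypothetical cap on reports, playing the role of CONTINUE_CHECKING()

class CheckContext:
    """Stand-in for the patch's check context (tuplestore + is_corrupt flag)."""
    def __init__(self):
        self.messages = []      # accumulated (blkno, msg) rows
        self.is_corrupt = False

def confess(ctx, blkno, msg):
    """Analogous to econfess(): record the complaint and keep scanning."""
    ctx.messages.append((blkno, msg))
    ctx.is_corrupt = True

def continue_checking(ctx):
    """Analogous to CONTINUE_CHECKING(): scan loops test this each iteration."""
    return len(ctx.messages) < MAX_REPORTS

def scan_relation(blocks):
    """Scan every block, collecting reports instead of aborting on the first."""
    ctx = CheckContext()
    for blkno, ok in enumerate(blocks):
        if not continue_checking(ctx):
            break               # too much corruption already; stop early
        if not ok:
            confess(ctx, blkno, "block looks corrupt")
    return ctx

ctx = scan_relation([True, False, True, False, False, False, False])
print(len(ctx.messages))  # prints 4: the scan reported several blocks, then hit the cap
```

Like the patch's `confess()`, the sketch marks the relation corrupt but keeps gathering reports, so a single pass can return many rows of corruption information rather than stopping at the first bad page.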
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..6b57ccf69c
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,894 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
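The interplay of the four include/exclude lists can be sketched as follows. This is an illustration only, in Python rather than the patch's C, and the helper name is hypothetical; in the patch itself the filtering is done server-side by a single SQL query in get_table_check_list(). A table is checked if its schema passes the schema filters and the table itself passes the table filters:

```python
# Sketch of the include/exclude semantics (hypothetical helper, not
# part of the patch).  Empty include sets mean "include everything".
def compile_checklist(tables, include_nsp, exclude_nsp,
                      include_tbl, exclude_tbl):
    """tables: iterable of (table_oid, schema_oid) pairs."""
    checklist = []
    for tbl_oid, nsp_oid in tables:
        if include_nsp and nsp_oid not in include_nsp:
            continue                      # not in any included schema
        if nsp_oid in exclude_nsp:
            continue                      # schema explicitly excluded
        if include_tbl and tbl_oid not in include_tbl:
            continue                      # not an included table
        if tbl_oid in exclude_tbl:
            continue                      # table explicitly excluded
        checklist.append(tbl_oid)
    return checklist
```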
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
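For reviewers, the shape of the per-table query that check_table() assembles can be sketched in Python string form. This is illustrative only, not the authoritative SQL; the arguments arrive pre-formatted as SQL literals (e.g. `NULL` or `'all frozen'`), matching how the C code splices them into the buffer:

```python
def build_check_table_query(tbloid, on_error_stop="false",
                            skip="NULL", startblock="NULL", endblock="NULL"):
    # Sketch of the SQL text check_table() sends for one relation.
    return (
        "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
        "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg\n"
        f"FROM verify_heapam(rel := {tbloid}, on_error_stop := {on_error_stop}, "
        f"skip := {skip}, startblock := {startblock}, endblock := {endblock}) v, "
        "pg_class c\n"
        f"WHERE c.oid = {tbloid}"
    )
```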
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * if we still have arguments, use them as the database name and username
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out command line arguments
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result
+ * in duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
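The pattern matching above relies on fe_utils' processSQLNamePattern(), which turns psql-style patterns into anchored POSIX regexes. A simplified model of that conversion, as a Python sketch (it ignores double-quoting and dotted schema.table patterns, which the real code also handles):

```python
def pattern_to_regex(pattern):
    # Simplified model of processSQLNamePattern(): '*' and '?' become
    # '.*' and '.', other regex metacharacters are escaped, and
    # unquoted letters are folded to lower case.
    out = "^("
    for ch in pattern:
        if ch == "*":
            out += ".*"
        elif ch == "?":
            out += "."
        elif ch in r".+\^$()[]{}|":
            out += "\\" + ch
        else:
            out += ch.lower()
    return out + ")$"
```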
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBufferStr(query, "\n])");
+ return true;
+}
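The clause append_filter() emits is easiest to see from an example. A Python sketch of the same logic (illustrative only; returns an empty string when the OID list is empty, matching the C function's early return):

```python
def append_filter(lval, operator, oids):
    # Sketch of the SQL fragment append_filter() appends to the query.
    if not oids:
        return ""
    return "\nAND %s %s ANY(array[\n%s\n])" % (
        lval, operator, ", ".join(str(o) for o in oids))
```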
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..de3ce54e8e
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..08bce6e68e
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,431 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 42;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
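As an independent cross-check of those pack codes, the same layout can be sized with Python's struct module. This is a sketch only: heap files are native-endian (assumed little-endian here, with `<` disabling padding just as the offsets above assume), and signedness differs from the Perl codes in places, which does not affect the sizes:

```python
import struct

# 3 x uint32 header words, 5 x uint16 (ctid + infomasks), t_hoff and
# t_bits bytes, the int64 column 'a', the 1-byte varlena header plus
# 7 body bytes of 'b', and 9 uint16 words covering the toast pointer
# of 'c'.
HEAPTUPLE_STRUCT = "<3L5H2Bq8B9H"

assert struct.calcsize(HEAPTUPLE_STRUCT) == 58  # matches HEAPTUPLE_PACK_LENGTH
```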
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relfrozenxid
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|4|7936|1|58|||t_hoff = 152 in tuple without nulls (expected 24)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|5|7872|1|58|||t_hoff = 32 in tuple without nulls (expected 24)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|6|7808|1|58|||t_hoff = 16 in tuple without nulls (expected 24)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set
+0|14|7296|1|58|||tuple xmax = 0 precedes relation relminmxid = 1
+0|14|7296|1|58|||HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
+ qr/t_hoff \+ offset > lp_len/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+ qr/HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set/,
+ qr/HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ For each corruption detected, returns one row containing the following
+ fields. If on_error_stop is true, only corruptions found on the first
+ corrupt block are reported:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The block number of the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables that are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Checking a database named test, which contains one corrupt table named
+ "corrupted", along with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
--
2.21.1 (Apple Git-122.3)
Attachment: v9_diffs (application/octet-stream; x-unix-mode=0644)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index f5e68906b2..b7ea745964 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -113,6 +113,7 @@ verify_heapam(PG_FUNCTION_ARGS)
FullTransactionId nextFullXid;
Buffer vmbuffer = InvalidBuffer;
Oid relid;
+ bool fatal = false;
bool on_error_stop;
bool skip_all_frozen = false;
bool skip_all_visible = false;
@@ -208,6 +209,7 @@ verify_heapam(PG_FUNCTION_ARGS)
else
{
/* Main relation has no associated toast relation */
+ ctx.toastrel = NULL;
ctx.toast_indexes = NULL;
ctx.num_toast_indexes = 0;
}
@@ -234,15 +236,25 @@ verify_heapam(PG_FUNCTION_ARGS)
confess(&ctx, psprintf("relfrozenxid %u precedes global "
"oldest valid xid %u ",
ctx.relfrozenxid, ctx.oldestValidXid));
- PG_RETURN_NULL();
+ fatal = true;
}
-
- if (TransactionIdIsNormal(ctx.relminmxid) &&
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
{
+ confess(&ctx, psprintf("relminmxid %u precedes global "
+ "oldest valid xid %u ",
+ ctx.relminmxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
PG_RETURN_NULL();
}
@@ -264,6 +276,7 @@ verify_heapam(PG_FUNCTION_ARGS)
{
int32 mapbits;
OffsetNumber maxoff;
+ PageHeader ph;
/* Optionally skip over all-frozen or all-visible blocks */
if (skip_all_frozen || skip_all_visible)
@@ -281,6 +294,7 @@ verify_heapam(PG_FUNCTION_ARGS)
RBM_NORMAL, ctx.bstrategy);
LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
/* We must unlock the page from the prior iteration, if any */
Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
@@ -302,12 +316,33 @@ verify_heapam(PG_FUNCTION_ARGS)
{
ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
- /* Skip over unused/dead/redirected line pointers */
- if (!ItemIdIsUsed(ctx.itemid) ||
- ItemIdIsDead(ctx.itemid) ||
- ItemIdIsRedirected(ctx.itemid))
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
continue;
+ /*
+ * If this line pointer has been redirected, check that it redirects
+ * to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+ if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
+ {
+ confess(&ctx, psprintf(
+ "Invalid redirect line pointer offset %u out of bounds",
+ (unsigned) redirect));
+ continue;
+ }
+ if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))
+ {
+ confess(&ctx, psprintf(
+ "Invalid redirect line pointer offset %u bad alignment",
+ (unsigned) redirect));
+ }
+ continue;
+ }
+
/* Set up context information about this next tuple */
ctx.lp_len = ItemIdGetLength(ctx.itemid);
ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
@@ -637,7 +672,7 @@ check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
{
/* should never happen */
confess(ctx,
- pstrdup("toast chunk is neither short nor extended"));
+ pstrdup("corrupt toast chunk va_header"));
return;
}
@@ -972,16 +1007,29 @@ check_tuple(HeapCheckContext * ctx)
* length nulls bitmap field t_bits does not overflow the allowed space.
* We don't know if the corruption is in the natts field or the infomask
* bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been
+ * reserved for the null bitmap.
*/
- if (infomask & HEAP_HASNULL &&
- SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ if (infomask & HEAP_HASNULL)
{
- confess(ctx, psprintf("SizeofHeapTupleHeader + "
- "BITMAPLEN(natts) > t_hoff "
- "(%u + %u > %u)",
- (unsigned) SizeofHeapTupleHeader,
- BITMAPLEN(ctx->natts),
- ctx->tuphdr->t_hoff));
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ }
+ else if (MAXALIGN(ctx->tuphdr->t_hoff) != MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx,
+ psprintf("t_hoff = %u in tuple without nulls (expected %u)",
+ (unsigned) MAXALIGN(ctx->tuphdr->t_hoff),
+ (unsigned) MAXALIGN(SizeofHeapTupleHeader)));
fatal = true;
}
@@ -1031,6 +1079,4 @@ check_tuple(HeapCheckContext * ctx)
if (!check_tuple_attribute(ctx))
break;
}
- ctx->offset = -1;
- ctx->attnum = -1;
}
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
index b2c1f36928..08bce6e68e 100644
--- a/contrib/pg_amcheck/t/004_verify_heapam.pl
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -381,8 +381,11 @@ is ($result,
0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|4|7936|1|58|||t_hoff = 152 in tuple without nulls (expected 24)
0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|5|7872|1|58|||t_hoff = 32 in tuple without nulls (expected 24)
0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|6|7808|1|58|||t_hoff = 16 in tuple without nulls (expected 24)
0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
0|7|7744|1|58|||t_hoff not max-aligned (21)
0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jun 21, 2020, at 2:54 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have looked into 0001 patch and I have a few comments.
1.

+ /* Skip over unused/dead/redirected line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) ||
+     ItemIdIsDead(ctx.itemid) ||
+     ItemIdIsRedirected(ctx.itemid))
+     continue;

Isn't it a good idea to verify the redirected ItemId? Because we will still access the redirected item id to find the actual tuple from the index scan. Maybe not exactly at this level, but we can verify whether the itemid stored in the redirect is within the itemid range of the page or not.

Good idea. I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.

2.

+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+     confess(ctx,
+             psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+                      ctx->tuphdr->t_hoff,
+                      (unsigned) SizeofHeapTupleHeader));
+     fatal = true;
+ }

I think we can also check that if there are no NULL attributes (if (!(t_infomask & HEAP_HASNULL))) then ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.

You have to take alignment padding into account, but otherwise yes, and I've added a check for that.

3.

+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+     if (!check_tuple_attribute(ctx))
+         break;
+ }
+ ctx->offset = -1;
+ ctx->attnum = -1;

So we are first setting ctx->offset to 0, then inside check_tuple_attribute we keep updating the offset as we process the attributes, and after the loop is over we set ctx->offset to -1. I did not understand why we need to reset it to -1; do we ever check for that? We don't even initialize ctx->offset to -1 while initializing the context for the tuple, so I do not understand the meaning of the arbitrary value -1.

Ahh, right, those are left over from a previous design of the code. Thanks for pointing them out. They are now removed.

4.

+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+     chunksize = VARSIZE(chunk) - VARHDRSZ;
+     chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+     /*
+      * could happen due to heap_form_tuple doing its thing
+      */
+     chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+     chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+     /* should never happen */
+     confess(ctx,
+             pstrdup("toast chunk is neither short nor extended"));
+     return;
+ }

I think the error message "toast chunk is neither short nor extended" is misleading, because ideally the toast chunk should not be further toasted. So I think the check is correct, but the error message is not.

I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".

5.

+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+  * Open the toast relation, if any, also protected from concurrent
+  * vacuums.
+  */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+     int offset;
+
+     /* Main relation has associated toast relation */
+     ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+                               ShareUpdateExclusiveLock);
+     offset = toast_open_indexes(ctx.toastrel, ....

+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+     TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+     confess(&ctx, psprintf("relfrozenxid %u precedes global "
+                            "oldest valid xid %u ",
+                            ctx.relfrozenxid, ctx.oldestValidXid));
+     PG_RETURN_NULL();
+ }

Don't we need to close the relation/toastrel/toastindexrel in such a return, which is without an abort? IIRC, we will get a relcache leak WARNING on commit if we leave them open in the commit path.

Ok, I've added logic to close them.
All changes inspired by your review are included in the v9-0001 patch. The differences since v8 are pulled out into v9_diffs for easier review.
I have reviewed the changes in v9_diffs and they look fine to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Sun, Jun 28, 2020 at 8:59 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Mon, Jun 22, 2020 at 5:44 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:On Jun 21, 2020, at 2:54 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
I have looked into 0001 patch and I have a few comments.
1. + + /* Skip over unused/dead/redirected line pointers */ + if (!ItemIdIsUsed(ctx.itemid) || + ItemIdIsDead(ctx.itemid) || + ItemIdIsRedirected(ctx.itemid)) + continue;Isn't it a good idea to verify the Redirected Itemtid? Because we
will still access the redirected item id to find the
actual tuple from the index scan. Maybe not exactly at this level,
but we can verify that the link itemid store in that
is within the itemid range of the page or not.Good idea. I've added checks that the redirection is valid, both in terms of being within bounds and in terms of alignment.
2.
+ /* Check for tuple header corruption */ + if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader) + { + confess(ctx, + psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)", + ctx->tuphdr->t_hoff, + (unsigned) SizeofHeapTupleHeader)); + fatal = true; + }I think we can also check that if there is no NULL attributes (if
(!(t_infomask & HEAP_HASNULL)) then
ctx->tuphdr->t_hoff should be equal to SizeofHeapTupleHeader.You have to take alignment padding into account, but otherwise yes, and I've added a check for that.
3. + ctx->offset = 0; + for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++) + { + if (!check_tuple_attribute(ctx)) + break; + } + ctx->offset = -1; + ctx->attnum = -1;So we are first setting ctx->offset to 0, then inside
check_tuple_attribute, we will keep updating the offset as we process
the attributes and after the loop is over we set ctx->offset to -1, I
did not understand that why we need to reset it to -1, do we ever
check for that. We don't even initialize the ctx->offset to -1 while
initializing the context for the tuple so I do not understand what is
the meaning of the random value -1.Ahh, right, those are left over from a previous design of the code. Thanks for pointing them out. They are now removed.
4. + if (!VARATT_IS_EXTENDED(chunk)) + { + chunksize = VARSIZE(chunk) - VARHDRSZ; + chunkdata = VARDATA(chunk); + } + else if (VARATT_IS_SHORT(chunk)) + { + /* + * could happen due to heap_form_tuple doing its thing + */ + chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT; + chunkdata = VARDATA_SHORT(chunk); + } + else + { + /* should never happen */ + confess(ctx, + pstrdup("toast chunk is neither short nor extended")); + return; + }I think the error message "toast chunk is neither short nor extended".
Because ideally, the toast chunk should not be further toasted.
So I think the check is correct, but the error message is not correct.I agree the error message was wrongly stated, and I've changed it, but you might suggest a better wording than what I came up with, "corrupt toast chunk va_header".
5.
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+  * Open the toast relation, if any, also protected from concurrent
+  * vacuums.
+  */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+     int offset;
+
+     /* Main relation has associated toast relation */
+     ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+                               ShareUpdateExclusiveLock);
+     offset = toast_open_indexes(ctx.toastrel,
....
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+     TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+     confess(&ctx, psprintf("relfrozenxid %u precedes global "
+                            "oldest valid xid %u ",
+                            ctx.relfrozenxid, ctx.oldestValidXid));
+     PG_RETURN_NULL();
+ }

Don't we need to close the relation/toastrel/toastindexrel in such a
return, which is without an abort? IIRC, we will get a relcache leak
WARNING on commit if we leave them open in the commit path.

Ok, I've added logic to close them.
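The shape of that close-before-early-return fix can be sketched in standalone C. The types and names here are hypothetical stand-ins; the real code closes Relation pointers with relation_close()/table_close() using the lock mode it holds:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an opened relation handle. */
typedef struct Rel
{
    bool open;
} Rel;

static void
rel_close(Rel *rel)
{
    if (rel != NULL && rel->open)
        rel->open = false;
}

/*
 * Every return path, including the early corruption-found return,
 * closes whatever was opened, so a non-abort exit does not leave a
 * dangling reference (the analogue of the relcache leak WARNING at
 * commit).
 */
static bool
check_with_cleanup(Rel *rel, Rel *toastrel, bool found_fatal_corruption)
{
    if (found_fatal_corruption)
    {
        rel_close(toastrel);
        rel_close(rel);
        return false;           /* early return with nothing left open */
    }

    /* ... normal per-page checking would happen here ... */

    rel_close(toastrel);
    rel_close(rel);
    return true;
}
```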
All changes inspired by your review are included in the v9-0001 patch. The differences since v8 are pulled out into v9_diffs for easier review.
I have reviewed the changes in v9_diffs and they look fine to me.
Some more comments on v9_0001.
1.
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
...
...
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
I might be missing something, but it appears that first we get the
nextFullXid and after that we scan block by block. So while we are
scanning, if nextXid advances and updates some tuple in the heap pages,
then it seems the current logic will complain about an out-of-range
xid. I did not test this behavior, so please point me to the logic
which protects against this.
2.
/*
* Helper function to construct the TupleDesc needed by verify_heapam.
*/
static TupleDesc
verify_heapam_tupdesc(void)
From the function name, it appears that it verifies a tuple descriptor,
but it is just creating the tuple descriptor.
3.
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
Here, do we want to test the case where the all-visible bit is set in
the VM but not on the page? That can lead to a wrong result in an
index-only scan.
4. One cosmetic comment
+ /* Skip non-varlena values, but update offset first */
..
+
+ /* Ok, we're looking at a varlena attribute. */
Throughout the patch, I have noticed that some of your single-line
comments have a "full stop" whereas others don't. Can we keep them
consistent?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Jun 28, 2020, at 9:05 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Some more comments on v9_0001.

1.
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
...
...
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+     int32 mapbits;
+     OffsetNumber maxoff;
+     PageHeader ph;
+
+     /* Optionally skip over all-frozen or all-visible blocks */
+     if (skip_all_frozen || skip_all_visible)
+     {
+         mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+                                                    &vmbuffer);
+         if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+             continue;
+         if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+             continue;
+     }
+
+     /* Read and lock the next page. */
+     ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+                                     RBM_NORMAL, ctx.bstrategy);
+     LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);

I might be missing something, but it appears that first we get the
nextFullXid and after that we scan block by block. So while we are
scanning, if nextXid advances and updates some tuple in the heap pages,
then it seems the current logic will complain about an out-of-range
xid. I did not test this behavior, so please point me to the logic
which protects against this.
We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so. We cannot know that the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its validity, and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check. That logic is in TransactionIdValidInRel.
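That check-and-advance behavior might be sketched as follows. This is a hypothetical simplification, not the patch's TransactionIdValidInRel: real PostgreSQL compares xids with TransactionIdPrecedes (modular arithmetic), while plain `<` here is an illustration only, and `live_next_xid` stands in for a re-read of the shared nextFullXid:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * The lower bound is pinned by the lock held on the relation, so it
 * cannot move during the scan; the upper bound may be widened when an
 * xid beyond it turns out to be genuinely valid.
 */
typedef struct XidRange
{
    TransactionId oldest_valid;     /* cannot move while we hold our lock */
    TransactionId next_known_valid; /* may advance while we scan */
} XidRange;

static bool
xid_in_valid_range(XidRange *range, TransactionId xid,
                   TransactionId live_next_xid)
{
    if (xid < range->oldest_valid)
        return false;               /* precedes the frozen horizon: corrupt */
    if (xid < range->next_known_valid)
        return true;                /* inside the known-valid window */
    if (xid < live_next_xid)        /* re-check against the live nextXid */
    {
        range->next_known_valid = live_next_xid;    /* widen the window */
        return true;
    }
    return false;                   /* beyond even the live nextXid */
}
```

An xid beyond the cached window is therefore not reported as corrupt until it has also been rechecked against the live counter.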
2.
/*
* Helper function to construct the TupleDesc needed by verify_heapam.
*/
static TupleDesc
verify_heapam_tupdesc(void)

From the function name, it appears that it verifies a tuple descriptor,
but it is just creating the tuple descriptor.
In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records. This is the tuple descriptor for that function. I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as (verify_heapam)_tupdesc. Do you have a name you would prefer?
3.
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+     mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+                                                &vmbuffer);
+     if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+         continue;
+     if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+         continue;
+ }

Here, do we want to test the case where the all-visible bit is set in
the VM but not on the page? That can lead to a wrong result in an
index-only scan.
If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot load the page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having specified these options. Without loading the page, we cannot check the page's header bits.
When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page in order to compare the two, being careful not to hold a pin on the one while performing I/O on the other. See for example the logic in heap_delete(). But I'm not sure what guarantees the system makes about agreement between these two bits. Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is all-visible will always have its all-visible bit set? I don't know if (possibly transient) disagreement between these two bits constitutes corruption. Perhaps others following this thread can advise?
4. One cosmetic comment
+ /* Skip non-varlena values, but update offset first */
..
+
+ /* Ok, we're looking at a varlena attribute. */

Throughout the patch, I have noticed that some of your single-line
comments have a "full stop" whereas others don't. Can we keep them
consistent?
I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments. To me, a "full stop" means that a sentence has reached its conclusion. I don't intentionally use one at the end of a fragment, unless the fragment precedes a full sentence, in which case the "full stop" is needed to separate the two. Of course, I may have violated my own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can agree on what the rule is? (This has probably been discussed before and agreed before. A link to the appropriate email thread would be sufficient.)
For example:
/* red, green, or blue */
/* set to pink */
/* set to blue. We have not closed the file. */
/* At this point, we have chosen the color. */
The first comment is not a sentence, but the fourth is. The third comment is a fragment followed by a full sentence, and a "full stop" separates the two. As for the second comment, as I recall, verb phrases can be interpreted as a full sentence, as in "Close the door!", when they are meant as commands to the listener, but not otherwise. "set to pink" is not a command to the reader, but rather a description of what the code is doing at that point, so I think of it as a mere verb phrase and not a full sentence.
Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that would ereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in unmodified form. That means that my normal commenting rules don't apply, as I'm not the comment author in such cases.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
I think there are two very large patches here. One adds checking of
heapam tables to amcheck, and the other adds a binary that eases calling
amcheck from the command line. I think these should be two separate
patches.
I don't know what to think of a module contrib/pg_amcheck. I kinda lean
towards fitting it in src/bin/scripts rather than as a contrib module.
However, it seems a bit weird that it depends on a contrib module.
Maybe amcheck should not be a contrib module at all but rather a new
extension in src/extensions/ that is compiled and installed (in the
filesystem, not in databases) by default.
I strongly agree with hardening backend code so that all the crashes
that Mark has found can be repaired. (We discussed this topic
before[1]: we'd repair all crashes when run with production code, not
all assertion crashes.)
[1]: /messages/by-id/20200513221051.GA26592@alvherre.pgsql
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Jun 30, 2020, at 11:44 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I think there are two very large patches here. One adds checking of
heapam tables to amcheck, and the other adds a binary that eases calling
amcheck from the command line. I think these should be two separate
patches.
contrib/amcheck has pretty limited regression test coverage. I wrote pg_amcheck in large part because the infrastructure I was writing for testing contrib/amcheck was starting to look like a stand-alone tool, so I made it one. I can split contrib/pg_amcheck into a separate patch, but I would expect reviewers to use it to review contrib/amcheck. Say the word, and I'll resubmit as two separate patches.
I don't know what to think of a module contrib/pg_amcheck. I kinda lean
towards fitting it in src/bin/scripts rather than as a contrib module.
However, it seems a bit weird that it depends on a contrib module.
Agreed.
Maybe amcheck should not be a contrib module at all but rather a new
extension in src/extensions/ that is compiled and installed (in the
filesystem, not in databases) by default.
Fine with me, but I'll have to see what others think about that.
I strongly agree with hardening backend code so that all the crashes
that Mark has found can be repaired. (We discussed this topic
before[1]: we'd repair all crashes when run with production code, not
all assertion crashes.)
I'm guessing that hardening the backend would be a separate patch? Or did you want that as part of this one?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2020-Jun-30, Mark Dilger wrote:
I'm guessing that hardening the backend would be a separate patch? Or
did you want that as part of this one?
Lately, to me the foremost criterion to determine what is a separate
patch and what isn't is the way the commit message is structured. If it
looks too much like a bullet list of unrelated things, that suggests
that the commit should be split into one commit per bullet point; of
course, there are counterexamples. But when I have a commit message
that says "I do A, and I also do B because I need it for A", then it
makes more sense to do B first standalone and then A on top. OTOH if
two things are done because they're heavily intermixed (e.g. commit
850196b610d2, bullet points galore), that suggests that one commit is a
decent approach.
Just my opinion, of course.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jun 28, 2020 at 11:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jun 28, 2020, at 9:05 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Some more comments on v9_0001.

1.
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
...
...
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+     int32 mapbits;
+     OffsetNumber maxoff;
+     PageHeader ph;
+
+     /* Optionally skip over all-frozen or all-visible blocks */
+     if (skip_all_frozen || skip_all_visible)
+     {
+         mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+                                                    &vmbuffer);
+         if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+             continue;
+         if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+             continue;
+     }
+
+     /* Read and lock the next page. */
+     ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+                                     RBM_NORMAL, ctx.bstrategy);
+     LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);

I might be missing something, but it appears that first we get the
nextFullXid and after that we scan block by block. So while we are
scanning, if nextXid advances and updates some tuple in the heap pages,
then it seems the current logic will complain about an out-of-range
xid. I did not test this behavior, so please point me to the logic
which protects against this.

We know the oldest valid Xid cannot advance, because we hold a lock that would prevent it from doing so. We cannot know that the newest Xid will not advance, but when we see an Xid beyond the end of the known valid range, we check its validity, and either report it as a corruption or advance our idea of the newest valid Xid, depending on that check. That logic is in TransactionIdValidInRel.
That makes sense to me.
2.
/*
* Helper function to construct the TupleDesc needed by verify_heapam.
*/
static TupleDesc
verify_heapam_tupdesc(void)

From the function name, it appears that it verifies a tuple descriptor,
but it is just creating the tuple descriptor.

In amcheck--1.2--1.3.sql we define a function named verify_heapam which returns a set of records. This is the tuple descriptor for that function. I understand that the name can be parsed as verify_(heapam_tupdesc), but it is meant as (verify_heapam)_tupdesc. Do you have a name you would prefer?
I am not very particular, but a name like
verify_heapam_get_tupdesc might be clearer. It is just a suggestion,
so if you prefer the current name I have no objection.
3.
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+     mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+                                                &vmbuffer);
+     if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+         continue;
+     if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+         continue;
+ }

Here, do we want to test the case where the all-visible bit is set in
the VM but not on the page? That can lead to a wrong result in an
index-only scan.

If the caller has specified that the corruption check should skip over all-frozen or all-visible data, then we cannot load the page that the VM claims is all-frozen or all-visible without defeating the purpose of the caller having specified these options. Without loading the page, we cannot check the page's header bits.
When not skipping all-visible or all-frozen blocks, we might like to pin both the heap page and the visibility map page in order to compare the two, being careful not to hold a pin on the one while performing I/O on the other. See for example the logic in heap_delete(). But I'm not sure what guarantees the system makes about agreement between these two bits. Certainly, the VM should not claim a page is all visible when it isn't, but are we guaranteed that a page that is all-visible will always have its all-visible bit set? I don't know if (possibly transient) disagreement between these two bits constitutes corruption. Perhaps others following this thread can advise?
Right, the VM should not claim a page is all-visible when it actually
is not. But IIRC, it is not guaranteed that if the page is all-visible
then the VM must have the all-visible flag set.
4. One cosmetic comment
+ /* Skip non-varlena values, but update offset first */
..
+
+ /* Ok, we're looking at a varlena attribute. */

Throughout the patch, I have noticed that some of your single-line
comments have a "full stop" whereas others don't. Can we keep them
consistent?

I try to use a "full stop" at the end of sentences, but not at the end of sentence fragments. To me, a "full stop" means that a sentence has reached its conclusion. I don't intentionally use one at the end of a fragment, unless the fragment precedes a full sentence, in which case the "full stop" is needed to separate the two. Of course, I may have violated my own rule in a few places, but before I submit a v10 patch with comment punctuation changes, perhaps we can agree on what the rule is? (This has probably been discussed before and agreed before. A link to the appropriate email thread would be sufficient.)
I can see in different files we have followed different rules. I am
fine as far as those are consistent across the file.
For example:
/* red, green, or blue */
/* set to pink */
/* set to blue. We have not closed the file. */
/* At this point, we have chosen the color. */

The first comment is not a sentence, but the fourth is. The third comment is a fragment followed by a full sentence, and a "full stop" separates the two. As for the second comment, as I recall, verb phrases can be interpreted as a full sentence, as in "Close the door!", when they are meant as commands to the listener, but not otherwise. "set to pink" is not a command to the reader, but rather a description of what the code is doing at that point, so I think of it as a mere verb phrase and not a full sentence.
Making matters even more complicated, portions of the logic in verify_heapam were taken from sections of code that would ereport(), elog(), or Assert() on corruption, and when I took such code, I sometimes also took the comments in unmodified form. That means that my normal commenting rules don't apply, as I'm not the comment author in such cases.
I agree.
A few more comments.
1.
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
+
....
+
+ /*
+ * Must dereference indirect toast pointers before we can check them
+ */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {
So first we are checking that if the varatt is not
VARATT_IS_EXTERNAL_ONDISK then we are returning, but just a
few statements down we are checking if the varatt is
VARATT_IS_EXTERNAL_INDIRECT, so it seems like unreachable code.
2. Another point related to the same code is that toast_save_datum
always sets the VARTAG_ONDISK tag. IIUC, we use
VARTAG_INDIRECT in the reorderbuffer for generating a temp tuple, so
ideally, while scanning the heap, we should never get a
VARATT_IS_EXTERNAL_INDIRECT tuple. Am I missing something here?
3.
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+ uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but not marked as on disk"));
+ return true;
+ }
First, we check VARATT_IS_1B_E, and if so we check whether its tag
is VARTAG_ONDISK or not. But just after that, we get the actual
attribute pointer and again check the same thing with two different
checks. Can you explain why this is necessary?
4.
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
Maybe we can further expand these checks, like if the tuple is
HEAP_XMAX_LOCK_ONLY then HEAP_UPDATED or HEAP_HOT_UPDATED should not
be set.
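The mutually exclusive infomask pairs being checked here can be illustrated in standalone C. The bit values below are copied from access/htup_details.h as of this writing; treat the exact values as assumptions, and note that HEAP_KEYS_UPDATED lives in t_infomask2 while the others are in t_infomask:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, mirroring access/htup_details.h. */
#define HEAP_XMAX_LOCK_ONLY  0x0080     /* t_infomask */
#define HEAP_XMAX_COMMITTED  0x0400     /* t_infomask */
#define HEAP_XMAX_IS_MULTI   0x1000     /* t_infomask */
#define HEAP_KEYS_UPDATED    0x2000     /* t_infomask2 */

/* Returns true if a mutually exclusive pair of hint bits is set. */
static bool
infomask_bits_conflict(uint16_t infomask, uint16_t infomask2)
{
    if ((infomask & HEAP_XMAX_LOCK_ONLY) && (infomask2 & HEAP_KEYS_UPDATED))
        return true;
    if ((infomask & HEAP_XMAX_COMMITTED) && (infomask & HEAP_XMAX_IS_MULTI))
        return true;
    return false;
}
```

Dilip's suggested extension would add further pairs to this table, which is also where the thread later finds that LOCK_ONLY plus HEAP_UPDATED does legitimately occur.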
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Jul 4, 2020, at 6:04 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
A few more comments.
Your comments all pertain to function check_tuple_attribute(), which follows the logic of heap_deform_tuple() and detoast_external_attr(). The idea is that any error that could result in an assertion or crash in those functions should be checked carefully by check_tuple_attribute(), and checked *before* any such asserts or crashes might be triggered.
I obviously did not explain this thinking in the function comment. That is rectified in the v10 patch, attached.
1.
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+     confess(ctx,
+             pstrdup("attribute is external but not marked as on disk"));
+     return true;
+ }
+
....
+
+ /*
+  * Must dereference indirect toast pointers before we can check them
+  */
+ if (VARATT_IS_EXTERNAL_INDIRECT(attr))
+ {

So first we are checking that if the varatt is not
VARATT_IS_EXTERNAL_ONDISK then we are returning, but just a
few statements down we are checking if the varatt is
VARATT_IS_EXTERNAL_INDIRECT, so it seems like unreachable code.
True. I've removed the VARATT_IS_EXTERNAL_INDIRECT check.
2. Another point related to the same code is that toast_save_datum
always set the VARTAG_ONDISK tag. IIUC, we use
VARTAG_INDIRECT in reorderbuffer for generating temp tuple so ideally
while scanning the heap we should never get
VARATT_IS_EXTERNAL_INDIRECT tuple. Am I missing something here?
I think you are right that we cannot get a VARATT_IS_EXTERNAL_INDIRECT tuple. That check is removed in v10.
3.
+ if (VARATT_IS_1B_E(tp + ctx->offset))
+ {
+     uint8 va_tag = va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+     if (va_tag != VARTAG_ONDISK)
+     {
+         confess(ctx, psprintf("unexpected TOAST vartag %u for "
+                               "attribute #%u at t_hoff = %u, "
+                               "offset = %u",
+                               va_tag, ctx->attnum,
+                               ctx->tuphdr->t_hoff, ctx->offset));
+         return false;  /* We can't know where the next attribute
+                         * begins */
+     }
+ }

+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+     return true;
+
+ /* It is external, and we're looking at a page on disk */
+ if (!VARATT_IS_EXTERNAL_ONDISK(attr))
+ {
+     confess(ctx,
+             pstrdup("attribute is external but not marked as on disk"));
+     return true;
+ }

First, we check VARATT_IS_1B_E, and if so we check whether its tag
is VARTAG_ONDISK or not. But just after that, we get the actual
attribute pointer and again check the same thing with two different
checks. Can you explain why this is necessary?
The code that calls check_tuple_attribute() expects it to check the current attribute, but also to safely advance the ctx->offset value to the next attribute, as the caller is iterating over all attributes. The first check verifies that it is safe to call att_addlength_pointer, as we must not call att_addlength_pointer on a corrupt datum. The second check simply returns on non-external attributes; having advanced ctx->offset, there is nothing left to do. The third check validates the external attribute, now that we know that it is external. You are right that the third check cannot fail, as the first check would already have confess()ed and returned false. The third check is removed in v10, attached.
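The invariant behind the first check, never advance the attribute offset by a length you have not validated, can be sketched in standalone C. This is a hypothetical simplification of what callers of att_addlength_pointer must guarantee, not the patch's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Validate that advancing the current offset by the attribute's
 * declared length stays inside the tuple. A corrupt length would
 * otherwise walk the scan off the end of the tuple and crash or
 * misread subsequent attributes.
 */
static bool
advance_offset_checked(size_t *offset, size_t attlen, size_t tuple_len)
{
    /* Guard the subtraction against unsigned underflow, then bound-check. */
    if (*offset > tuple_len || attlen > tuple_len - *offset)
        return false;           /* would run past the tuple: report, stop */
    *offset += attlen;
    return true;
}
```

In the real check, failing this kind of test is exactly why check_tuple_attribute() returns false: it cannot know where the next attribute begins.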
4.
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+     (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+     confess(ctx,
+             psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+     (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+     confess(ctx,
+             psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }

Maybe we can further expand these checks, like if the tuple is
HEAP_XMAX_LOCK_ONLY then HEAP_UPDATED or HEAP_HOT_UPDATED should not
be set.
Adding Asserts in src/backend/access/heap/hio.c against those two conditions, the regression tests fail in quite a lot of places where HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED are both true. I'm leaving this idea out for v10, since it doesn't work, but in case you want to tell me what I did wrong, here are the changes I made on top of v10:
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index 00de10b7c9..76d23e141a 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -57,6 +57,10 @@ RelationPutHeapTuple(Relation relation,
(tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
(tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask & HEAP_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_HOT_UPDATED)));
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 49d3d5618a..60e4ad5be0 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -969,12 +969,19 @@ check_tuple(HeapCheckContext * ctx)
ctx->tuphdr->t_hoff));
fatal = true;
}
- if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
- (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ if (ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY)
{
- confess(ctx,
- psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask & HEAP_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_UPDATED both set"));
+ if (ctx->tuphdr->t_infomask2 & HEAP_HOT_UPDATED)
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_HOT_UPDATED both set"));
}
+
if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
(ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
{
The v10 patch without these ideas is here:
Attachments:
v10-0001-Adding-verify_heapam-and-pg_amcheck.patchapplication/octet-stream; name=v10-0001-Adding-verify_heapam-and-pg_amcheck.patch; x-unix-mode=0644Download
From b2c8e2f2cb6108e0215adc928d1274e64577cea3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sat, 4 Jul 2020 09:35:57 -0700
Subject: [PATCH v10] Adding verify_heapam and pg_amcheck
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
---
contrib/Makefile | 1 +
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_btree.out | 31 +
contrib/amcheck/expected/check_heap.out | 58 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_btree.sql | 10 +
contrib/amcheck/sql/check_heap.sql | 34 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/skipping.pl | 101 ++
contrib/amcheck/verify_heapam.c | 1062 +++++++++++++++++
contrib/amcheck/verify_nbtree.c | 750 +++++++-----
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 894 ++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 +
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 431 +++++++
doc/src/sgml/amcheck.sgml | 106 +-
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 +++
src/backend/access/heap/hio.c | 11 +
26 files changed, 3640 insertions(+), 331 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/skipping.pl
create mode 100644 contrib/amcheck/verify_heapam.c
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..2ab7d8b0d2
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..c1acf238d7 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,23 +45,31 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
@@ -67,6 +78,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +109,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +139,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +160,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..6d30ca8023
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,58 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..f5d0f8c1f6 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,33 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +68,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +82,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +90,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..5759d5526e
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,34 @@
+CREATE TABLE heaptest (a integer, b text);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all frozen',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := 'all frozen',
+ startblock := 5,
+ endblock := NULL);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := false,
+ skip := 'all visible',
+ startblock := NULL,
+ endblock := 10);
+SELECT * FROM verify_heapam(rel := 'heaptest',
+ on_error_stop := true,
+ skip := NULL,
+ startblock := 5,
+ endblock := 10);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/skipping.pl b/contrib/amcheck/t/skipping.pl
new file mode 100644
index 0000000000..e716fc8c33
--- /dev/null
+++ b/contrib/amcheck/t/skipping.pl
@@ -0,0 +1,101 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 183;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("NULL", "'all frozen'", "'all visible'")
+ {
+ for my $startblock (qw(NULL 5))
+ {
+ for my $endblock (qw(NULL 10))
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip, " .
+ "$startblock, $endblock)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+ }
+ }
+}
+
+# Stops the server and writes nulls in the first page of the table,
+# assuming page size is large enough for offset 1000..1016 to be
+# in the midst of the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 1000, 0);
+	syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all frozen or all visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all visible first page');
+
+# Check table with corruption, skipping all frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..49d3d5618a
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1062 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/*
+ * Struct holding the running context information during
+ * a lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Values for returning tuples */
+ bool is_corrupt; /* have we encountered any corruption? */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+/*
+ * verify_heapam
+ *
+ * Scan a heap relation and report corruption in it or in its
+ * associated toast relation, if any.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+#define HEAPCHECK_RELATION_COLS 8
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool fatal = false;
+ bool on_error_stop;
+ bool skip_all_frozen = false;
+ bool skip_all_visible = false;
+ int64 startblock = -1;
+ int64 endblock = -1;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all visible") == 0)
+ {
+ skip_all_visible = true;
+ }
+ else if (pg_strcasecmp(skip, "all frozen") == 0)
+ {
+ skip_all_visible = true;
+ skip_all_frozen = true;
+ }
+ else
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+					 errhint("Valid skip values are 'all visible', "
+							 "'all frozen', and NULL.")));
+ }
+ }
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toastrel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance. We keep a cached copy of the oldest valid xid that we may
+ * encounter in the table, which is relfrozenxid if valid, and oldestXid
+ * otherwise.
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global "
+ "oldest valid xid %u ",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+		confess(&ctx, psprintf("relminmxid %u precedes global "
+							   "oldest valid xid %u",
+							   ctx.relminmxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+ /* check all blocks of the relation */
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_all_frozen || skip_all_visible)
+ {
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_all_visible && (mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ if (skip_all_frozen && (mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+		/* We should now hold a pin and share lock on a valid buffer */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
+
+		ctx.offnum = InvalidOffsetNumber;
+		ctx.itemid = NULL;
+		ctx.lp_len = 0;
+		ctx.tuphdr = NULL;
+		ctx.natts = 0;
+
+		/* Perform tuple checks */
+		maxoff = PageGetMaxOffsetNumber(ctx.page);
+		for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it redirects
+ * to a valid offset within the line pointer array.
+ */
+			if (ItemIdIsRedirected(ctx.itemid))
+			{
+				OffsetNumber redirect = ItemIdGetRedirect(ctx.itemid);
+
+				/*
+				 * ItemIdGetRedirect yields the OffsetNumber of the target
+				 * line pointer, not a byte offset into the page.
+				 */
+				if (redirect < FirstOffsetNumber || redirect > maxoff)
+					confess(&ctx, psprintf(
+								"redirect line pointer target offset %u out of bounds",
+								(unsigned) redirect));
+				continue;
+			}
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * convenience routine to check that relation is of a supported relkind.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, "
+ "or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap AM",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	/* ctx->itemid may be NULL when reporting relation-level corruption */
+	int16		lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+	int16		lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+	int16		lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+	nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using up to work_mem building up the tuplestore.
+ * Don't leak the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given xid is plausible for this relation: neither
+ * in the future nor so far in the past that its clog entries may already
+ * have been truncated away.  Accepts the bootstrap and frozen special
+ * xids; rejects InvalidTransactionId.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("tuple xvac = %u invalid", xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return false; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return false; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is "
+ "not LOCKED_ONLY and has no "
+ "valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return false; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found in ctx->tupstore.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ confess(ctx,
+ pstrdup("corrupt toast chunk va_header"));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %d "
+ "not the expected sequence number %d",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf("toast chunk sequence number %d "
+ "exceeds the end chunk sequence "
+ "number %d",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("chunk size %d differs from "
+ "expected size %d",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in
+ * the case of a toasted value, continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The Asserts
+ * present in those two functions are preserved here, but in cases where
+ * those two functions are a bit cavalier in their assumptions about data
+ * being correct, we add here additional checks. The presence of duplicate
+ * checks seems a reasonable price to pay for keeping this code tightly
+ * coupled with the code it protects.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask = ctx->tuphdr->t_infomask;
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+ ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+ ctx->tuphdr->t_hoff, ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf("unexpected TOAST vartag %u for "
+ "attribute #%u at t_hoff = %u, "
+ "offset = %u",
+ va_tag, ctx->attnum,
+ ctx->tuphdr->t_hoff, ctx->offset));
+ return false; /* We can't know where the next attribute
+ * begins */
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup("attribute is external but tuple header "
+ "flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup("attribute is external but relation has "
+ "no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching
+ * va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast
+ * table, accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf("final chunk number differs from "
+ "expected (%d vs. %d)",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from "
+ "toast table"));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relminmxid = %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmin = %u precedes relation "
+ "relfrozenxid = %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf("tuple xmin = %u is in the future",
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+ "relfrozenxid = %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("t_hoff < SizeofHeapTupleHeader (%u < %u)",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf("t_hoff > lp_len (%u > %u)",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf("t_hoff not max-aligned (%u)",
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ psprintf("HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the natts field or the infomask
+ * bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been
+ * reserved for the null bitmap.
+ */
+ if (infomask & HEAP_HASNULL)
+ {
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) > ctx->tuphdr->t_hoff)
+ {
+ confess(ctx, psprintf("SizeofHeapTupleHeader + "
+ "BITMAPLEN(natts) > t_hoff "
+ "(%u + %u > %u)",
+ (unsigned) SizeofHeapTupleHeader,
+ BITMAPLEN(ctx->natts),
+ ctx->tuphdr->t_hoff));
+ fatal = true;
+ }
+ }
+ else if (MAXALIGN(ctx->tuphdr->t_hoff) != MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx,
+ psprintf("t_hoff = %u in tuple without nulls (expected %u)",
+ (unsigned) MAXALIGN(ctx->tuphdr->t_hoff),
+ (unsigned) MAXALIGN(SizeofHeapTupleHeader)));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!check_tuphdr_xids(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could legitimately be
+ * smaller than the relation's natts, but it cannot be larger.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("relation natts < tuple natts (%u < %u)",
+ RelationGetDescr(ctx->rel)->natts,
+ ctx->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+}
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..bf68b554a8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ ((ctx) == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,16 +156,14 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
BtreeLevel level);
static void bt_target_page_check(BtreeCheckState *state);
@@ -185,6 +206,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +244,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +270,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -535,7 +628,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
@@ -544,10 +637,9 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current = bt_check_level_from_leftmost(state, current);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,21 +868,19 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
bt_target_page_check(state);
@@ -803,10 +889,9 @@ nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1049,14 +1131,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1160,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1214,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1321,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1367,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1354,14 +1431,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1386,7 +1462,8 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
NULL, topaque->btpo.level);
@@ -1708,7 +1785,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,11 +1800,10 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
@@ -1739,30 +1815,27 @@ bt_child_highkey_check(BtreeCheckState *state,
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
@@ -1825,14 +1898,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1928,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -2014,17 +2084,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2125,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2150,14 +2218,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,7 +2234,7 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
@@ -2179,13 +2246,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2283,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2309,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2391,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2455,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2475,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2810,10 +2869,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2880,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2902,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2910,17 +2964,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3023,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3038,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3066,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3113,52 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_btreeam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the result tuplestore.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from accumulating work_mem worth of memory building up
+ * the tuplestore, so at least avoid leaking the msg argument memory.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..6b57ccf69c
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,894 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out the help message
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result
+ * in duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBuffer(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
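As an aside for reviewers: the clause construction done by append_csv_oids() and append_filter() above can be sketched in Python (a minimal illustration, not part of the patch; the function name is reused only for clarity). Each non-empty OID list becomes one "AND <lval> <op> ANY(array[...])" clause appended to the base pg_class query; empty lists contribute nothing.

```python
def append_filter(lval, op, oids):
    """Mirror of append_filter(): build an AND-clause from an OID list.

    Returns "" for an empty list, matching the early return in the C code.
    """
    if not oids:
        return ""
    # append_csv_oids(): comma-separated unsigned OIDs
    csv = ", ".join("%d" % oid for oid in oids)
    return "\nAND %s %s ANY(array[\n%s\n])" % (lval, op, csv)

# Example: include schemas 2200 and 16384, exclude no tables.
clause = append_filter("n.oid", "OPERATOR(pg_catalog.=)", [2200, 16384])
print(clause)
print(repr(append_filter("c.oid", "OPERATOR(pg_catalog.!=)", [])))
```

This is just to make the generated SQL shape easy to see; the real code appends into a PQExpBuffer instead of returning a string.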
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..de3ce54e8e
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all tables not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..08bce6e68e
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,431 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 42;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# to insert rows, so that they have predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
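For reviewers cross-checking the layout: the Perl pack code above can be mirrored with Python's struct module (a minimal sketch, not part of the patch; the format characters are chosen to match Perl's unaligned pack of native-size fields on a little-endian 64-bit platform: L -> L, S -> H, C -> B, c -> b, q -> q).

```python
import struct

# Equivalent of HEAPTUPLE_PACK_CODE 'LLLSSSSSCCqCcccccccSSSSSSSSS':
# '<' = little-endian, no padding, standard sizes (4/2/1/1/8 bytes).
HEAPTUPLE_FMT = "<3L5H2BqB7b9H"

# Total size must agree with HEAPTUPLE_PACK_LENGTH above.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Field offsets match the layout comment:
assert struct.calcsize("<3L5H2B") == 24      # header through t_bits; 'a' at 24
assert struct.calcsize("<3L5H2Bq") == 32     # ... plus column 'a'; 'b' at 32
assert struct.calcsize("<3L5H2BqB7b") == 40  # ... plus column 'b'; 'c' at 40
print("layout ok")
```

If the asserts hold, the pack code, the declared length, and the offset table in the comment are mutually consistent.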
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ # As in read_tuple(), use sysseek() to match the unbuffered syswrite().
+ sysseek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath) or BAIL_OUT("open failed: $!");
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin = 3 precedes relation relfrozenxid = $relfrozenxid
+0|2|8064|1|58|||tuple xmin = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|3|8000|1|58|||tuple xmax = 4026531839 precedes relation relfrozenxid = $relfrozenxid
+0|4|7936|1|58|||t_hoff > lp_len (152 > 58)
+0|4|7936|1|58|||t_hoff = 152 in tuple without nulls (expected 24)
+0|5|7872|1|58|||t_hoff not max-aligned (27)
+0|5|7872|1|58|||t_hoff = 32 in tuple without nulls (expected 24)
+0|6|7808|1|58|||t_hoff < SizeofHeapTupleHeader (16 < 23)
+0|6|7808|1|58|||t_hoff = 16 in tuple without nulls (expected 24)
+0|7|7744|1|58|||t_hoff < SizeofHeapTupleHeader (21 < 23)
+0|7|7744|1|58|||t_hoff not max-aligned (21)
+0|8|7680|1|58|||relation natts < tuple natts (3 < 2047)
+0|9|7616|1|58|||SizeofHeapTupleHeader + BITMAPLEN(natts) > t_hoff (23 + 256 > 24)
+0|10|7552|1|58|||relation natts < tuple natts (3 < 67)
+0|11|7488|1|58|2||t_hoff + offset > lp_len (24 + 416847976 > 58)
+0|12|7424|1|58|2|0|final chunk number differs from expected (0 vs. 6)
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set
+0|14|7296|1|58|||tuple xmax = 0 precedes relation relminmxid = 1
+0|14|7296|1|58|||HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/tuple xmin = \d+ precedes relation relfrozenxid = \d+/,
+ qr/tuple xmax = \d+ precedes relation relfrozenxid = \d+/,
+ qr/t_hoff > lp_len/,
+ qr/t_hoff not max-aligned/,
+ qr/t_hoff < SizeofHeapTupleHeader/,
+ qr/relation natts < tuple natts/,
+ qr/SizeofHeapTupleHeader \+ BITMAPLEN\(natts\) > t_hoff/,
+ qr/t_hoff \+ offset > lp_len/,
+ qr/final chunk number differs from expected/,
+ qr/toasted value missing from toast table/,
+ qr/HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED both set/,
+ qr/HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI both set/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index 75518a7820..cc36d92f72 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+    For each corruption detected, returns one row containing the fields
+    below.  If on_error_stop is true, only corruption in the first corrupt
+    block is reported.
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
      A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+   <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+    Checking an entire database that contains one corrupt table, "corrupted",
+    produces output such as the following:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+    Each detected corruption is reported on two lines: the first shows the
+    location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
--
2.21.1 (Apple Git-122.3)
On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The v10 patch without these ideas is here:
Along the lines of what Alvaro was saying before, I think this
definitely needs to be split up into a series of patches. The commit
message for v10 describes it doing three pretty separate things, and I
think that argues for splitting it into a series of three patches. I'd
argue for this ordering:
0001 Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.
0002 Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
0003 Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
It's too hard to review things like this when it's all mixed together.
+++ b/contrib/amcheck/t/skipping.pl
The name of this file is inconsistent with the tree's usual
convention, which is all stuff like 001_whatever.pl, except for
src/test/modules/brin, which randomly decided to use two digits
instead of three. There's no precedent for a test file with no leading
numeric digits. Also, what does "skipping" even have to do with what
the test is checking? Maybe it's intended to refer to the new error
handling "skipping" the actual error in favor of just reporting it
without stopping, but that's not really what the word "skipping"
normally means. Finally, it seems a bit over-engineered: do we really
need 183 test cases to check that detecting a problem doesn't lead to
an abort? Like, if that's the purpose of the test, I'd expect it to
check one corrupt relation and one non-corrupt relation, each with and
without the no-error behavior. And that's about it. Or maybe it's
talking about skipping pages during the checks, because those pages
are all-visible or all-frozen? It's not very clear to me what's going
on here.
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
Please add explanatory comments indicating what these are intended to
mean. For most of the structure members, the brief comments
already present seem sufficient; but here, more explanation looks
necessary and less is provided. The "Values for returning tuples"
could possibly also use some more detail.
+#define HEAPCHECK_RELATION_COLS 8
I think this should really be at the top of the file someplace.
Sometimes people have adopted this style when the #define is only used
within the function that contains it, but that's not the case here.
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all visible', 'all frozen', "
+ "or NULL")));
I think it would be better if we had three string values selecting the
different behaviors, and made the parameter NOT NULL but with a
default. It seems like that would be easier to understand. Right now,
I can tell that my options for what to skip are "all visible", "all
frozen", and, uh, some other thing that I don't know what it is. I'm
gonna guess the third option is to skip nothing, but it seems best to
make that explicit. Also, should we maybe consider spelling this
'all-visible' and 'all-frozen' with dashes, instead of using spaces?
Spaces in an option value seem a little icky to me somehow.
+ int64 startblock = -1;
+ int64 endblock = -1;
...
+ if (!PG_ARGISNULL(3))
+ startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+ endblock = PG_GETARG_INT64(4);
...
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)
So, the user can specify a negative value explicitly and it will be
treated as the default, and an endblock value that's larger than the
relation size will be treated as the relation size. The way pg_prewarm
does the corresponding checks seems superior: null indicates the
default value, and any non-null value must be within range or you get
an error. Also, you seem to be treating endblock as the first block
that should not be checked, whereas pg_prewarm takes what seems to me
to be the more natural interpretation: the end block is the last block
that IS checked. If you do it this way, then someone who specifies the
same start and end block will check no blocks -- silently, I think.
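To make the suggestion concrete, here is a minimal standalone sketch of that pg_prewarm-style convention. All names here are hypothetical (not from the patch); the *_isnull flags stand in for PG_ARGISNULL(), and returning false stands in for the ereport(ERROR) the real code would raise:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Resolve optional startblock/endblock arguments against the relation
 * size, pg_prewarm style: NULL means the default, any non-NULL value
 * must be in range, and endblock is the last block that IS checked.
 * On success, [*first, *last] is the inclusive range of blocks to scan.
 */
static bool
resolve_block_range(int64_t startblock, bool start_isnull,
                    int64_t endblock, bool end_isnull,
                    int64_t nblocks,
                    int64_t *first, int64_t *last)
{
    *first = start_isnull ? 0 : startblock;
    *last = end_isnull ? nblocks - 1 : endblock;

    if (*first < 0 || *first >= nblocks)
        return false;           /* starting block number is out of range */
    if (*last < *first || *last >= nblocks)
        return false;           /* ending block number is out of range */
    return true;
}
```

With an inclusive endblock, equal start and end blocks check exactly one block rather than silently checking none.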
+ if (skip_all_frozen || skip_all_visible)
Since you can't skip all frozen without skipping all visible, this
test could be simplified. Or you could introduce a three-valued enum
and test that skip_pages != SKIP_PAGES_NONE, which might be even
better.
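Something along these lines, perhaps; none of these identifiers exist in the patch, and -1 stands in for the ERRCODE_INVALID_PARAMETER_VALUE error the real code would raise:

```c
#include <string.h>

/* Three-valued replacement for the skip_all_frozen/skip_all_visible pair. */
typedef enum SkipPages
{
    SKIP_PAGES_NONE,
    SKIP_PAGES_ALL_VISIBLE,
    SKIP_PAGES_ALL_FROZEN
} SkipPages;

/*
 * Map a NOT NULL text parameter (defaulting to "none") onto the enum.
 * Returns -1 for unrecognized input.
 */
static int
parse_skip_pages(const char *skip)
{
    if (strcmp(skip, "none") == 0)
        return SKIP_PAGES_NONE;
    if (strcmp(skip, "all-visible") == 0)
        return SKIP_PAGES_ALL_VISIBLE;
    if (strcmp(skip, "all-frozen") == 0)
        return SKIP_PAGES_ALL_FROZEN;
    return -1;
}
```

The page-skipping test in the scan loop then collapses to `skip_pages != SKIP_PAGES_NONE`, and the valid option values are self-documenting.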
+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);
I don't understand this assertion, and I don't understand the comment,
either. I think ctx.blkno can never be equal to InvalidBlockNumber
because we never set it to anything outside the range of 0..(endblock
- 1), and I think ctx.buffer must always be unequal to InvalidBuffer
because we just initialized it by calling ReadBufferExtended(). So I
think this assertion would still pass if we wrote && rather than ||.
But even then, I don't know what that has to do with the comment or
why it even makes sense to have an assertion for that in the first
place.
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those. This protection saves us from
+ * having to reacquire the locks and recheck those minimums for every
+ * tuple, which would be expensive.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
I don't think we'd need to recheck for every tuple, would we? Just for
cases where there's an apparent violation of the rules. I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.
It's also unclear to me why this concerns itself with relfrozenxid and
the cluster-wide oldestXid value but not with datfrozenxid. It seems
like if we're going to sanity-check the relfrozenxid against the
cluster-wide value, we ought to also check it against the
database-wide value. Checking neither would also seem like a plausible
choice. But it seems very strange to only check against the
cluster-wide value.
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                  "InvalidOffsetNumber increments to FirstOffsetNumber");
If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+ if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
...
+ if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))
I think that ItemIdGetRedirect() returns an offset, not a byte
position. So the expectation that I would have is that it would be any
integer >= 0 and <= maxoff. Am I confused? BTW, it seems like it might
be good to complain if the item to which it points is LP_UNUSED...
AFAIK that shouldn't happen.
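If I'm reading it right, the check being suggested would look roughly like this standalone sketch (the function name and the target_is_unused flag, which stands in for inspecting the pointed-to line pointer, are mine):

```c
#include <stdbool.h>
#include <stdint.h>

/* FirstOffsetNumber, as defined in access/off.h */
#define FIRST_OFFSET_NUMBER 1

/*
 * A redirect line pointer stores an item offset number, not a byte
 * position, so it must land within the line pointer array on an item
 * that is not LP_UNUSED.
 */
static bool
redirect_is_sane(uint16_t redirect, uint16_t maxoff, bool target_is_unused)
{
    if (redirect < FIRST_OFFSET_NUMBER || redirect > maxoff)
        return false;           /* points outside the line pointer array */
    if (target_is_unused)
        return false;           /* redirect to an LP_UNUSED slot */
    return true;
}
```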
+ errmsg("\"%s\" is not a heap AM",
I think the correct wording would be just "is not a heap." The "heap
AM" is the thing in pg_am, not a specific table.
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using workmem worth of memory building up the
+ * tuplestore. Don't leak the msg argument memory.
+ */
+ pfree(msg);
Maybe change the second sentence to something like: "That should be
OK, else the user can lower work_mem, but we'd better not leak any
additional memory."
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
First, check_tuphdr_xids() doesn't seem like a very good name. If you
have a function with that name and, like this one, it returns Boolean,
what does true mean? What does false mean? Kinda hard to tell. And
also, check the tuple header XIDs *for what*? If you called it, say,
tuple_is_visible(), that would be self-evident.
Second, consider that we hold at least AccessShareLock on the relation
- actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
cannot be a concurrent modification to the tuple descriptor in
progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
potentially using a non-current schema. If the tuple is
HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
inserting transaction, or that transaction committed before we got our
lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
before we got our lock. Or if it's both inserted and deleted in the
same transaction, say, then that transaction committed before we got
our lock or else contains no relevant DDL. IOW, I think you can check
everything but dead tuples here.
Capitalization and punctuation for messages complaining about problems
need to be consistent. verify_heapam() has "Invalid redirect line
pointer offset %u out of bounds" which starts with a capital letter,
but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
case, but in any event it should be the same. Also,
check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
debugging leftover or a very unclear complaint. I think some real work
needs to be put into the phrasing of these messages so that it's more
clear exactly what is going on and why it's bad. For example the first
example in this paragraph is clearly a problem of some kind, but it's
not very clear exactly what is happening: is %u the offset of the
invalid line redirect or the value to which it points? I don't think
the phrasing is very grammatical, which makes it hard to tell which is
meant, and I actually think it would be a good idea to include both
things.
Project policy is generally against splitting a string across multiple
lines to fit within 80 characters. We like to fit within 80
characters, but we like to be able to grep for strings more, and
breaking them up like this makes that harder.
+ confess(ctx,
+ pstrdup("corrupt toast chunk va_header"));
This is another message that I don't think is very clear. There's two
elements to that. One is that the phrasing is not very good, and the
other is that there are no % escapes. What's somebody going to do when
they see this message? First, they're probably going to have to look
at the code to figure out in which circumstances it gets generated;
that's a sign that the message isn't phrased clearly enough. That will
tell them that an unexpected bit pattern has been found, but not what
that unexpected bit pattern actually was. So then, they're going to
have to try to find the relevant va_header by some other means and
fish out the relevant bit so that they can see what actually went
wrong.
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ *
Extra blank line.
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+                                           ctx->attnum);
Maybe you could avoid the line wrap by declaring this without
initializing it, and then initializing it as a separate statement.
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+                       ctx->tuphdr->t_hoff, ctx->offset,
+                       ctx->lp_len));
Uggh! This isn't even remotely an English sentence. I don't think
formulas are the way to go here, but I like the idea of formulas in
some places and written-out messages in others even less. I guess the
complaint here in English is something like "tuple attribute %d should
start at offset %u, but tuple length is only %u" or something of that
sort. Also, it seems like this complaint really ought to have been
reported on the *preceding* loop iteration, either complaining that
(1) the fixed length attribute is more than the number of remaining
bytes in the tuple or (2) the varlena header for the tuple specifies
an excessively high length. It seems like you're blaming the wrong
attribute for the problem.
BTW, the header comments for this function (check_tuple_attribute)
neglect to document the meaning of the return value.
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+                       "relfrozenxid = %u",
This is another example of these messages needing work. The
corresponding message from heap_prepare_freeze_tuple() is "found
update xid %u from before relfrozenxid %u". That's better, because we
don't normally include equals signs in our messages like this, and
also because "relation relfrozenxid" is redundant. I think this should
say something like "tuple xmax %u precedes relfrozenxid %u".
+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+                       xmax));
And then this could be something like "tuple xmax %u follows
last-assigned xid %u". That would be more symmetric and more
informative.
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
+     ctx->tuphdr->t_hoff)
I think we should be able to predict the exact value of t_hoff and
complain if it isn't precisely equal to the expected value. Or is that
not possible for some reason?
Is there some place that's checking that lp_len >=
SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
the header? If not, there should be.
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
I haven't really looked through the btree-checking and pg_amcheck
parts of this much yet, but this caught my eye. Why would the default
be to check tables but not indexes? I think the default ought to be to
check everything we know how to check.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 14, 2020 at 03:50:52PM -0400, Tom Lane wrote:
I think there's definitely value in corrupting data in some predictable
(reproducible) way and verifying that the check code catches it and
responds as expected. Sure, this will not be 100% coverage, but it'll be
a lot better than 0% coverage.
Skimming quickly through the patch, that's what is done in a way
similar to pg_checksums's 002_actions.pl. So it seems fine to me to
use something like that for some basic coverage. We may want to
refactor the test APIs to unify all that though.
--
Michael
On Jul 16, 2020, at 12:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The v10 patch without these ideas is here:
Along the lines of what Alvaro was saying before, I think this
definitely needs to be split up into a series of patches. The commit
message for v10 describes it doing three pretty separate things, and I
think that argues for splitting it into a series of three patches. I'd
argue for this ordering:

0001 Refactoring existing amcheck btree checking functions to optionally
     return corruption information rather than ereport'ing it. This is
     used by the new pg_amcheck command line tool for reporting back to
     the caller.

0002 Adding new function verify_heapam for checking a heap relation and
     associated toast relation, if any, to contrib/amcheck.

0003 Adding new contrib module pg_amcheck, which is a command line
     interface for running amcheck's verifications against tables and
     indexes.

It's too hard to review things like this when it's all mixed together.
The v11 patch series is broken up as you suggest.
+++ b/contrib/amcheck/t/skipping.pl

The name of this file is inconsistent with the tree's usual
convention, which is all stuff like 001_whatever.pl, except for
src/test/modules/brin, which randomly decided to use two digits
instead of three. There's no precedent for a test file with no leading
numeric digits. Also, what does "skipping" even have to do with what
the test is checking? Maybe it's intended to refer to the new error
handling "skipping" the actual error in favor of just reporting it
without stopping, but that's not really what the word "skipping"
normally means. Finally, it seems a bit over-engineered: do we really
need 183 test cases to check that detecting a problem doesn't lead to
an abort? Like, if that's the purpose of the test, I'd expect it to
check one corrupt relation and one non-corrupt relation, each with and
without the no-error behavior. And that's about it. Or maybe it's
talking about skipping pages during the checks, because those pages
are all-visible or all-frozen? It's not very clear to me what's going
on here.
The "skipping" did originally refer to testing verify_heapam()'s option to skip all-visible or all-frozen blocks. I have renamed it 001_verify_heapam.pl, since it tests that function.
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;

Please add explanatory comments indicating what these are intended to
mean.
Done.
For most of the the structure members, the brief comments
already present seem sufficient; but here, more explanation looks
necessary and less is provided. The "Values for returning tuples"
could possibly also use some more detail.
Ok, I've expanded the comments for these.
+#define HEAPCHECK_RELATION_COLS 8
I think this should really be at the top of the file someplace.
Sometimes people have adopted this style when the #define is only used
within the function that contains it, but that's not the case here.
Done.
+ ereport(ERROR,
+         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+          errmsg("unrecognized parameter for 'skip': %s", skip),
+          errhint("please choose from 'all visible', 'all frozen', "
+                  "or NULL")));

I think it would be better if we had three string values selecting the
different behaviors, and made the parameter NOT NULL but with a
default. It seems like that would be easier to understand. Right now,
I can tell that my options for what to skip are "all visible", "all
frozen", and, uh, some other thing that I don't know what it is. I'm
gonna guess the third option is to skip nothing, but it seems best to
make that explicit. Also, should we maybe consider spelling this
'all-visible' and 'all-frozen' with dashes, instead of using spaces?
Spaces in an option value seems a little icky to me somehow.
I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'. I did not mark the function as strict, as I think NULL is a reasonable value (and the default) for startblock and endblock.
+ int64 startblock = -1;
+ int64 endblock = -1;
...
+ if (!PG_ARGISNULL(3))
+     startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+     endblock = PG_GETARG_INT64(4);
...
+ if (startblock < 0)
+     startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+     endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)

So, the user can specify a negative value explicitly and it will be
treated as the default, and an endblock value that's larger than the
relation size will be treated as the relation size. The way pg_prewarm
does the corresponding checks seems superior: null indicates the
default value, and any non-null value must be within range or you get
an error. Also, you seem to be treating endblock as the first block
that should not be checked, whereas pg_prewarm takes what seems to me
to be the more natural interpretation: the end block is the last block
that IS checked. If you do it this way, then someone who specifies the
same start and end block will check no blocks -- silently, I think.
Under that regime, for relations with one block of data, (startblock=0, endblock=0) means "check the zero'th block", and for relations with no blocks of data, specifying any non-null (startblock,endblock) pair raises an exception. I don't like that too much, but I'm happy to defer to precedent. Since you say pg_prewarm works this way (I did not check), I have changed verify_heapam to do likewise.
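The pg_prewarm-style convention described above (null means default, any explicit bound must be in range, and endblock is the last block checked, inclusive) can be sketched as a small standalone function. Everything here is a simplified stand-in: `resolve_block_range` is a hypothetical name, nulls are modeled by the `have_start`/`have_end` flags, and the exact error behavior of pg_prewarm is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* stand-in for PostgreSQL's BlockNumber */

/*
 * Resolve the block range to scan.  Returns false when an explicit bound
 * is out of range (the caller would raise an error); on success sets
 * *first and *nscan, the count of blocks to check.  With no explicit
 * bounds, the default range is 0 .. nblocks - 1, which is empty for a
 * zero-block relation.
 */
static bool
resolve_block_range(BlockNumber nblocks,
                    bool have_start, int64_t start_arg,
                    bool have_end, int64_t end_arg,
                    BlockNumber *first, BlockNumber *nscan)
{
    int64_t start = have_start ? start_arg : 0;
    int64_t end = have_end ? end_arg : (int64_t) nblocks - 1;

    /* explicit bounds must fall inside the relation */
    if (have_start && (start < 0 || start >= (int64_t) nblocks))
        return false;
    if (have_end && (end < 0 || end >= (int64_t) nblocks))
        return false;
    if (end < start)
    {
        /* only the empty-relation default (0 .. -1) is acceptable */
        if (have_start || have_end)
            return false;
        *first = 0;
        *nscan = 0;
        return true;
    }
    *first = (BlockNumber) start;
    *nscan = (BlockNumber) (end - start + 1);
    return true;
}
```

Note how (startblock=0, endblock=0) on a one-block relation resolves to a single-block scan, while any explicit bound on an empty relation fails, matching the behavior described above.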
+ if (skip_all_frozen || skip_all_visible)
Since you can't skip all frozen without skipping all visible, this
test could be simplified. Or you could introduce a three-valued enum
and test that skip_pages != SKIP_PAGES_NONE, which might be even
better.
It works now with a three-valued enum.
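A three-valued enum for the skip option might look like the following sketch; the identifiers (`SkipPages`, `parse_skip_option`) are illustrative, not necessarily the patch's actual names.

```c
#include <assert.h>
#include <string.h>

/* One value per skip behavior, so "skip nothing" is explicit. */
typedef enum SkipPages
{
    SKIP_PAGES_NONE,
    SKIP_PAGES_ALL_VISIBLE,
    SKIP_PAGES_ALL_FROZEN
} SkipPages;

/*
 * Map the text argument ('none', 'all-visible', 'all-frozen') to the
 * enum; returns -1 for an unrecognized value so the caller can raise
 * an error listing the valid choices.
 */
static int
parse_skip_option(const char *skip)
{
    if (strcmp(skip, "none") == 0)
        return SKIP_PAGES_NONE;
    if (strcmp(skip, "all-visible") == 0)
        return SKIP_PAGES_ALL_VISIBLE;
    if (strcmp(skip, "all-frozen") == 0)
        return SKIP_PAGES_ALL_FROZEN;
    return -1;
}
```

The per-page test then becomes a single comparison, `skip_pages != SKIP_PAGES_NONE`, rather than OR-ing two booleans.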
+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);

I don't understand this assertion, and I don't understand the comment,
either. I think ctx.blkno can never be equal to InvalidBlockNumber
because we never set it to anything outside the range of 0..(endblock
- 1), and I think ctx.buffer must always be unequal to InvalidBuffer
because we just initialized it by calling ReadBufferExtended(). So I
think this assertion would still pass if we wrote && rather than ||.
But even then, I don't know what that has to do with the comment or
why it even makes sense to have an assertion for that in the first
place.
Yes, it is vestigial. Removed.
+ /*
+  * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+  * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+  * global oldestXid to be newer than those. This protection saves us from
+  * having to reacquire the locks and recheck those minimums for every
+  * tuple, which would be expensive.
+  */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);

I don't think we'd need to recheck for every tuple, would we? Just for
cases where there's an apparent violation of the rules.
It's a bit fuzzy what an "apparent violation" might be if both ends of the range of valid xids may be moving, and arbitrarily much. It's also not clear how often to recheck, since you'd be dealing with a race condition no matter how often you check. Perhaps the comments shouldn't mention how often you'd have to recheck, since there is no really defensible choice for that. I removed the offending sentence.
I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.
I welcome strategies that would allow for taking a lesser lock.
It's also unclear to me why this concerns itself with relfrozenxid and
the cluster-wide oldestXid value but not with datfrozenxid. It seems
like if we're going to sanity-check the relfrozenxid against the
cluster-wide value, we ought to also check it against the
database-wide value. Checking neither would also seem like a plausible
choice. But it seems very strange to only check against the
cluster-wide value.
If the relation has a normal relfrozenxid, then the oldest valid xid we can encounter in the table is relfrozenxid. Otherwise, each row needs to be compared against some other minimum xid value.
Logically, that other minimum xid value should be the oldest valid xid for the database, which must logically be at least as old as any valid row in the table and no older than the oldest valid xid for the cluster.
Unfortunately, if the comments in commands/vacuum.c circa line 1572 can be believed, and if I am reading them correctly, the stored value for the oldest valid xid in the database has been known to be corrupted by bugs in pg_upgrade. This is awful. If I compare the xid of a row in a table against the oldest xid value for the database, and the xid of the row is older, what can I do? I don't have a principled basis for determining which one of them is wrong.
The logic in verify_heapam is conservative; it makes no guarantees about finding and reporting all corruption, but if it does report a row as corrupt, you can bank on that, bugs in verify_heapam itself notwithstanding. I think this is a good choice; a tool with only false negatives is much more useful than one with both false positives and false negatives.
I have added a comment about my reasoning to verify_heapam.c. I'm happy to be convinced of a better strategy for handling this situation.
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                  "InvalidOffsetNumber increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.
Ah, right you are. Removed.
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+     uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+     if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
...
+     if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))

I think that ItemIdGetRedirect() returns an offset, not a byte
position. So the expectation that I would have is that it would be any
integer >= 0 and <= maxoff. Am I confused?
I think you are right about it returning an offset, which should be between FirstOffsetNumber and maxoff, inclusive. I have updated the checks.
BTW, it seems like it might
be good to complain if the item to which it points is LP_UNUSED...
AFAIK that shouldn't happen.
Thanks for mentioning that. It now checks for that.
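The corrected redirect check can be modeled in isolation: the target must be an offset number between FirstOffsetNumber and maxoff, and the line pointer it names must not be LP_UNUSED. The `ItemIdData` bit-field layout and the lp_flags values below match PostgreSQL's storage/itemid.h and bufpage conventions, but the helper name `redirect_target_ok` is a hypothetical stand-in, not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FirstOffsetNumber 1
#define LP_UNUSED 0             /* lp_flags: slot is unused */
#define LP_NORMAL 1             /* lp_flags: slot holds a tuple */

/* Same bit-field layout as PostgreSQL's ItemIdData */
typedef struct ItemIdData
{
    unsigned    lp_off:15,
                lp_flags:2,
                lp_len:15;
} ItemIdData;

/*
 * A redirect line pointer stores an offset number, not a byte position:
 * it must fall in FirstOffsetNumber .. maxoff, and the item it points at
 * must not be LP_UNUSED.
 */
static bool
redirect_target_ok(uint16_t redirect, uint16_t maxoff, const ItemIdData *items)
{
    if (redirect < FirstOffsetNumber || redirect > maxoff)
        return false;           /* out of bounds for this page */
    if (items[redirect - 1].lp_flags == LP_UNUSED)
        return false;           /* points at an unused slot */
    return true;
}
```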
+ errmsg("\"%s\" is not a heap AM",
I think the correct wording would be just "is not a heap." The "heap
AM" is the thing in pg_am, not a specific table.
Fixed.
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.
Hmm. I don't see the three lines of code you are quoting. Which patch is that from?
+ /*
+  * In principle, there is nothing to prevent a scan over a large, highly
+  * corrupted table from using workmem worth of memory building up the
+  * tuplestore. Don't leak the msg argument memory.
+  */
+ pfree(msg);

Maybe change the second sentence to something like: "That should be
OK, else the user can lower work_mem, but we'd better not leak any
additional memory."
It may be a little wordy, but I went with
/*
* In principle, there is nothing to prevent a scan over a large, highly
* corrupted table from using workmem worth of memory building up the
* tuplestore. That's ok, but if we also leak the msg argument memory
* until the end of the query, we could exceed workmem by more than a
* trivial amount. Therefore, free the msg argument each time we are
* called rather than waiting for our current memory context to be freed.
*/
+/*
+ * check_tuphdr_xids
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ *    to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ *    see the tuple, rather than doing extra work for vacuum-related
+ *    categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */

First, check_tuphdr_xids() doesn't seem like a very good name. If you
have a function with that name and, like this one, it returns Boolean,
what does true mean? What does false mean? Kinda hard to tell. And
also, check the tuple header XIDs *for what*? If you called it, say,
tuple_is_visible(), that would be self-evident.
Changed.
Second, consider that we hold at least AccessShareLock on the relation
- actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
cannot be a concurrent modification to the tuple descriptor in
progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
potentially using a non-current schema. If the tuple is
HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
inserting transaction, or that transaction committed before we got our
lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
before we got our lock. Or if it's both inserted and deleted in the
same transaction, say, then that transaction committed before we got
our lock or else contains no relevant DDL. IOW, I think you can check
everything but dead tuples here.
Ok, I have changed tuple_is_visible to return true rather than false for those other cases.
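The locking argument above reduces to a one-line rule: with at least AccessShareLock held, only a HEAPTUPLE_DEAD tuple can be using a non-current tuple descriptor, so every other status is checkable. A minimal sketch, with the enum values mirroring HTSV_Result from PostgreSQL's heapam.h and `tuple_should_be_checked` as a hypothetical simplification of tuple_is_visible():

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the HTSV_Result categories from access/heapam.h */
typedef enum HTSV_Result
{
    HEAPTUPLE_DEAD,             /* may predate the current tuple descriptor */
    HEAPTUPLE_LIVE,
    HEAPTUPLE_RECENTLY_DEAD,    /* inserter committed before we got our lock */
    HEAPTUPLE_INSERT_IN_PROGRESS,   /* no concurrent DDL possible under our lock */
    HEAPTUPLE_DELETE_IN_PROGRESS
} HTSV_Result;

/* Everything but a dead tuple is safe to check against the current schema. */
static bool
tuple_should_be_checked(HTSV_Result status)
{
    return status != HEAPTUPLE_DEAD;
}
```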
Capitalization and punctuation for messages complaining about problems
need to be consistent. verify_heapam() has "Invalid redirect line
pointer offset %u out of bounds" which starts with a capital letter,
but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
case, but in any event it should be the same.
I standardized on all lowercase text, though I left embedded symbols and constants such as LOCKED_ONLY alone.
Also,
check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
debugging leftover or a very unclear complaint.
Right. That has been changed to "old-style VACUUM FULL transaction ID %u is invalid in this relation".
I think some real work
needs to be put into the phrasing of these messages so that it's more
clear exactly what is going on and why it's bad. For example the first
example in this paragraph is clearly a problem of some kind, but it's
not very clear exactly what is happening: is %u the offset of the
invalid line redirect or the value to which it points? I don't think
the phrasing is very grammatical, which makes it hard to tell which is
meant, and I actually think it would be a good idea to include both
things.
Beware that every row returned from amcheck has more fields than just the error message.
blkno OUT bigint,
offnum OUT integer,
lp_off OUT smallint,
lp_flags OUT smallint,
lp_len OUT smallint,
attnum OUT integer,
chunk OUT integer,
msg OUT text
Rather than including blkno, offnum, lp_off, lp_flags, lp_len, attnum, or chunk in the message, it would be better to remove these things from messages that include them. For the specific message under consideration, I've converted the text to "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u". That avoids duplicating the offset information of the referring item, while reporting the offset of the referred item.
Project policy is generally against splitting a string across multiple
lines to fit within 80 characters. We like to fit within 80
characters, but we like to be able to grep for strings more, and
breaking them up like this makes that harder.
Thanks for clarifying the project policy. I joined these message strings back together.
+ confess(ctx,
+         pstrdup("corrupt toast chunk va_header"));

This is another message that I don't think is very clear. There's two
elements to that. One is that the phrasing is not very good, and the
other is that there are no % escapes
Changed to "corrupt extended toast chunk with sequence number %d has invalid varlena header %0x". I think all the other information about where the corruption was found is already present in the other returned columns.
What's somebody going to do when
they see this message? First, they're probably going to have to look
at the code to figure out in which circumstances it gets generated;
that's a sign that the message isn't phrased clearly enough. That will
tell them that an unexpected bit pattern has been found, but not what
that unexpected bit pattern actually was. So then, they're going to
have to try to find the relevant va_header by some other means and
fish out the relevant bit so that they can see what actually went
wrong.
Right.
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->corruption.
+ *
+ *

Extra blank line.
Fixed.
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+                                           ctx->attnum);

Maybe you could avoid the line wrap by declaring this without
initializing it, and then initializing it as a separate statement.
Yes, I like that better. I did not need to do the same with infomask, but it looks better to me to break the declaration and initialization for both, so I did that.
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+                       ctx->tuphdr->t_hoff, ctx->offset,
+                       ctx->lp_len));

Uggh! This isn't even remotely an English sentence. I don't think
formulas are the way to go here, but I like the idea of formulas in
some places and written-out messages in others even less. I guess the
complaint here in English is something like "tuple attribute %d should
start at offset %u, but tuple length is only %u" or something of that
sort. Also, it seems like this complaint really ought to have been
reported on the *preceding* loop iteration, either complaining that
(1) the fixed length attribute is more than the number of remaining
bytes in the tuple or (2) the varlena header for the tuple specifies
an excessively high length. It seems like you're blaming the wrong
attribute for the problem.
Yeah, and it wouldn't complain if the final attribute of a tuple was overlong, as there wouldn't be a next attribute to blame it on. I've changed it to report as you suggest, although it also still complains if the first attribute starts outside the bounds of the tuple. The two error messages now read as "tuple attribute should start at offset %u, but tuple length is only %u" and "tuple attribute of length %u ends at offset %u, but tuple length is only %u".
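The "blame the right attribute" logic can be modeled as a simple scan: walk the attributes, and report the first one whose declared length runs past the end of the tuple. The function below is a hypothetical stand-in (attribute byte lengths are taken as already resolved, so varlena handling and alignment are elided), not the patch's check_tuple_attribute():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the 0-based index of the first attribute that does not fit
 * within lp_len, or -1 if every attribute fits.  If the data area itself
 * starts beyond lp_len, the first attribute is blamed.
 */
static int
first_overlong_attribute(uint16_t t_hoff, uint16_t lp_len,
                         const uint16_t *attlen, int natts)
{
    uint32_t offset = t_hoff;   /* attributes start after the header */

    if (offset > lp_len)
        return 0;               /* tuple data starts outside the tuple */
    for (int attnum = 0; attnum < natts; attnum++)
    {
        if (offset + attlen[attnum] > lp_len)
            return attnum;      /* this attribute ends past lp_len */
        offset += attlen[attnum];
    }
    return -1;
}
```

This shape naturally produces both complaints from the text: "starts at offset %u, but tuple length is only %u" when the start is already out of bounds, and "of length %u ends at offset %u, but tuple length is only %u" otherwise, including for an overlong final attribute.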
BTW, the header comments for this function (check_tuple_attribute)
neglect to document the meaning of the return value.
Fixed.
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+                       "relfrozenxid = %u",

This is another example of these messages needing work. The
corresponding message from heap_prepare_freeze_tuple() is "found
update xid %u from before relfrozenxid %u". That's better, because we
don't normally include equals signs in our messages like this, and
also because "relation relfrozenxid" is redundant. I think this should
say something like "tuple xmax %u precedes relfrozenxid %u".

+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+                       xmax));

And then this could be something like "tuple xmax %u follows
last-assigned xid %u". That would be more symmetric and more
informative.
Both of these have been changed.
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
+     ctx->tuphdr->t_hoff)

I think we should be able to predict the exact value of t_hoff and
complain if it isn't precisely equal to the expected value. Or is that
not possible for some reason?
That is possible, and I've updated the error message to match. There are cases where you can't know if the HEAP_HASNULL bit is wrong or if the t_hoff value is wrong, but I've changed the code to just compute the length based on the HEAP_HASNULL setting and use that as the expected value, and complain when the actual value does not match the expected. That sidesteps the problem of not knowing exactly which value to blame.
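The expected-value computation described above is short enough to sketch: the fixed header, plus a null bitmap only when HEAP_HASNULL is set, rounded up to maximum alignment. The macro values below match access/htup_details.h and c.h for a typical 64-bit build (and assume a post-OID-removal, PostgreSQL 12+ tuple layout), but treat this as an illustrative stand-in rather than the patch's code:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL macros involved (64-bit build values) */
#define SizeofHeapTupleHeader 23
#define BITMAPLEN(NATTS) (((int) (NATTS) + 7) / 8)
#define MAXIMUM_ALIGNOF 8
#define MAXALIGN(LEN) \
    (((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & \
     ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

/*
 * Predict the exact t_hoff for a tuple: fixed header, plus the null
 * bitmap when HEAP_HASNULL is set, rounded up to alignment.  The checker
 * can then complain whenever the stored t_hoff differs from this.
 */
static int
expected_t_hoff(int natts, int has_nulls)
{
    int hoff = SizeofHeapTupleHeader;

    if (has_nulls)
        hoff += BITMAPLEN(natts);
    return (int) MAXALIGN(hoff);
}
```

As noted above, when the stored value disagrees, one can't tell whether HEAP_HASNULL or t_hoff is the corrupted field; computing a single expected value from HEAP_HASNULL sidesteps having to assign blame.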
Is there some place that's checking that lp_len >=
SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
the header? If not, there should be.
Good catch. check_tuple() now does that before reading the header.
+$node->command_ok(
+   [
+       'pg_amcheck', '-p', $port, 'postgres'
+   ],
+   'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+   [
+       'pg_amcheck', '-i', '-p', $port, 'postgres'
+   ],
+   'pg_amcheck all schemas, tables and indexes');

I haven't really looked through the btree-checking and pg_amcheck
parts of this much yet, but this caught my eye. Why would the default
be to check tables but not indexes? I think the default ought to be to
check everything we know how to check.
I have changed the default to match your expectations.
Attachments:
v11-0001-Adding-function-verify_btreeam-and-bumping-modul.patchapplication/octet-stream; name=v11-0001-Adding-function-verify_btreeam-and-bumping-modul.patch; x-unix-mode=0644Download
From 40315f9708a75d458cdd6bd4db5a733eb52d9b9a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jul 2020 12:20:44 -0700
Subject: [PATCH v11 1/3] Adding function verify_btreeam and bumping module
version.
For most errors found while verifying a btree index, the new
function verify_btreeam returns one row per error containing the
block number where the error was discovered and an error message
describing the problem. The pre-existing behavior for functions
bt_index_parent_check and bt_index_check is unchanged.
---
contrib/amcheck/Makefile | 2 +-
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_btree.out | 35 +
contrib/amcheck/sql/check_btree.sql | 13 +
contrib/amcheck/verify_nbtree.c | 836 +++++++++++++----------
5 files changed, 517 insertions(+), 371 deletions(-)
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b288c28fa0 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -6,7 +6,7 @@ OBJS = \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
REGRESS = check check_btree
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..7297abb577 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,17 +45,23 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
@@ -60,6 +69,12 @@ SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: only B-Tree indexes are supported as targets for verification
+DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -67,6 +82,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +113,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +143,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +164,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..816ca9d033 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,36 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +71,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +85,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +93,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..ee7c8124b8 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+	((ctx) == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,28 +156,28 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
- BtreeLevel level);
-static void bt_target_page_check(BtreeCheckState *state);
-static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state);
+ BtreeLevel level, BtreeCheckContext * ctx);
+static void bt_target_page_check(BtreeCheckState *state, BtreeCheckContext * ctx);
+static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext * ctx);
static void bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum);
+ OffsetNumber downlinkoffnum, BtreeCheckContext * ctx);
static void bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level);
+ uint32 target_level,
+ BtreeCheckContext * ctx);
static void bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber targetblock, Page target);
+ BlockNumber targetblock, Page target,
+ BtreeCheckContext * ctx);
static void bt_tuple_present_callback(Relation index, ItemPointer tid,
Datum *values, bool *isnull,
bool tupleIsAlive, void *checkstate);
@@ -176,7 +199,7 @@ static inline bool invariant_l_nontarget_offset(BtreeCheckState *state,
BlockNumber nontargetblock,
Page nontarget,
OffsetNumber upperbound);
-static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum);
+static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext * ctx);
static inline BTScanInsert bt_mkscankey_pivotsearch(Relation rel,
IndexTuple itup);
static ItemId PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block,
@@ -185,6 +208,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +246,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +272,66 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot "
+ "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed "
+ "in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +392,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +495,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +528,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -506,7 +601,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
/* Get true root block from meta-page */
- metapage = palloc_btree_page(state, BTREE_METAPAGE);
+ metapage = palloc_btree_page(state, BTREE_METAPAGE, ctx);
metad = BTPageGetMeta(metapage);
/*
@@ -535,19 +630,18 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
* not at leaf level
*/
- current = bt_check_level_from_leftmost(state, current);
+ current = bt_check_level_from_leftmost(state, current, ctx);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +649,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -639,7 +733,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
* each call to bt_target_page_check().
*/
static BtreeLevel
-bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
+bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level, BtreeCheckContext * ctx)
{
/* State to establish early, concerning entire level */
BTPageOpaque opaque;
@@ -672,7 +766,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
/* Initialize state for this iteration */
state->targetblock = current;
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
state->targetlsn = PageGetLSN(state->target);
opaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -691,18 +785,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +814,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,33 +870,30 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
- bt_target_page_check(state);
+ bt_target_page_check(state, ctx);
nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +937,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -902,7 +989,7 @@ nextpage:
* resetting state->targetcontext.
*/
static void
-bt_target_page_check(BtreeCheckState *state)
+bt_target_page_check(BtreeCheckState *state, BtreeCheckContext * ctx)
{
OffsetNumber offset;
OffsetNumber max;
@@ -930,16 +1017,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1035,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1059,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1083,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1027,7 +1111,8 @@ bt_target_page_check(BtreeCheckState *state)
bt_child_highkey_check(state,
offset,
NULL,
- topaque->btpo.level);
+ topaque->btpo.level,
+ ctx);
}
continue;
}
@@ -1049,14 +1134,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1163,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1217,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1324,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1370,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1328,7 +1408,7 @@ bt_target_page_check(BtreeCheckState *state)
BTScanInsert rightkey;
/* Get item in next/right page */
- rightkey = bt_right_page_check_scankey(state);
+ rightkey = bt_right_page_check_scankey(state, ctx);
if (rightkey &&
!invariant_g_offset(state, rightkey, max))
@@ -1343,7 +1423,7 @@ bt_target_page_check(BtreeCheckState *state)
if (!state->readonly)
{
/* Get fresh copy of target page */
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
/* Note that we deliberately do not update target LSN */
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -1354,14 +1434,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1374,7 +1453,7 @@ bt_target_page_check(BtreeCheckState *state)
* because it has no useful value to compare).
*/
if (!P_ISLEAF(topaque) && state->readonly)
- bt_child_check(state, skey, offset);
+ bt_child_check(state, skey, offset, ctx);
}
/*
@@ -1386,10 +1465,11 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
- NULL, topaque->btpo.level);
+ NULL, topaque->btpo.level, ctx);
}
}
@@ -1410,7 +1490,7 @@ bt_target_page_check(BtreeCheckState *state)
* been concurrently deleted.
*/
static BTScanInsert
-bt_right_page_check_scankey(BtreeCheckState *state)
+bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext * ctx)
{
BTPageOpaque opaque;
ItemId rightitem;
@@ -1455,7 +1535,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
{
CHECK_FOR_INTERRUPTS();
- rightpage = palloc_btree_page(state, targetnext);
+ rightpage = palloc_btree_page(state, targetnext, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(rightpage);
if (!P_IGNORE(opaque) || P_RIGHTMOST(opaque))
@@ -1666,7 +1746,8 @@ static void
bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level)
+ uint32 target_level,
+ BtreeCheckContext * ctx)
{
BlockNumber blkno = state->prevrightlink;
Page page;
@@ -1708,7 +1789,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,51 +1804,47 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
page = loaded_child;
else
- page = palloc_btree_page(state, blkno);
+ page = palloc_btree_page(state, blkno, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
/* blkno probably has missing parent downlink */
- bt_downlink_missing_check(state, rightsplit, blkno, page);
+ bt_downlink_missing_check(state, rightsplit, blkno, page, ctx);
}
rightsplit = P_INCOMPLETE_SPLIT(opaque);
@@ -1825,14 +1902,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1932,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1913,7 +1987,7 @@ bt_child_highkey_check(BtreeCheckState *state,
*/
static void
bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum)
+ OffsetNumber downlinkoffnum, BtreeCheckContext * ctx)
{
ItemId itemid;
IndexTuple itup;
@@ -1978,7 +2052,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* the operator class obeys the transitive law.
*/
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
- child = palloc_btree_page(state, childblock);
+ child = palloc_btree_page(state, childblock, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
maxoffset = PageGetMaxOffsetNumber(child);
@@ -1987,7 +2061,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* check for downlink connectivity.
*/
bt_child_highkey_check(state, downlinkoffnum,
- child, topaque->btpo.level);
+ child, topaque->btpo.level, ctx);
/*
* Since there cannot be a concurrent VACUUM operation in readonly mode,
@@ -2014,17 +2088,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2129,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2084,7 +2156,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
*/
static void
bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber blkno, Page page)
+ BlockNumber blkno, Page page, BtreeCheckContext * ctx)
{
BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
ItemId itemid;
@@ -2150,14 +2222,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,11 +2238,11 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
- child = palloc_btree_page(state, childblk);
+ child = palloc_btree_page(state, childblk, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
if (P_ISLEAF(copaque))
@@ -2179,13 +2250,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2287,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2313,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2395,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2459,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2479,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2780,7 +2843,7 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
* misbehaves.
*/
static Page
-palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
+palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext * ctx)
{
Buffer buffer;
Page page;
@@ -2810,10 +2873,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2884,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2906,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2882,23 +2940,20 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
*/
maxoffset = PageGetMaxOffsetNumber(page);
if (maxoffset > MaxIndexTuplesPerPage)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
- blocknum, RelationGetRelationName(state->rel),
- MaxIndexTuplesPerPage)));
+ econfess(ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
+ blocknum, RelationGetRelationName(state->rel),
+ MaxIndexTuplesPerPage);
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) && maxoffset < P_FIRSTDATAKEY(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal block %u in index \"%s\" lacks high key and/or at least one downlink",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal block %u in index \"%s\" lacks high key and/or at least one downlink",
+ blocknum, RelationGetRelationName(state->rel));
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && !P_RIGHTMOST(opaque) && maxoffset < P_HIKEY)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("non-rightmost leaf block %u in index \"%s\" lacks high key item",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "non-rightmost leaf block %u in index \"%s\" lacks high key item",
+ blocknum, RelationGetRelationName(state->rel));
/*
* In general, internal pages are never marked half-dead, except on
@@ -2910,17 +2965,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3024,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3039,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3067,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3114,53 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That is OK, but we should still avoid leaking the msg
+ * arguments allocated during the scan.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
--
2.21.1 (Apple Git-122.3)
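For reviewers trying the patch, the intended calling convention is that each corruption finding comes back as a result row rather than aborting with an ERROR, which is what the confess/econfess machinery above builds toward. A rough usage sketch follows; the relation name is a placeholder and any rows returned are hypothetical, but the output columns are the `blkno` and `msg` OUT parameters declared in the extension script:

```sql
-- Sketch only: assumes a patched cluster with this amcheck version installed.
CREATE EXTENSION IF NOT EXISTS amcheck;
ALTER EXTENSION amcheck UPDATE TO '1.3';

-- Each corruption finding is one row; a clean index returns zero rows.
SELECT blkno, msg
FROM verify_btreeam('some_index'::regclass);
```

With on_error_stop style behavior (second signature above), the scan would instead stop after the first reported problem, which is what the CONTINUE_CHECKING() tests sprinkled through the loops implement.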
Attachment: v11-0002-Adding-function-verify_heapam-to-amcheck-module.patch
From a92f063a0deac08503d7f605207fca60542b325b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jul 2020 12:37:48 -0700
Subject: [PATCH v11 2/3] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
---
contrib/amcheck/Makefile | 5 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_heap.out | 67 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_heap.sql | 19 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/001_verify_heapam.pl | 94 ++
contrib/amcheck/verify_heapam.c | 1153 +++++++++++++++++
doc/src/sgml/amcheck.sgml | 106 +-
src/backend/access/heap/hio.c | 11 +
11 files changed, 1608 insertions(+), 2 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index b288c28fa0..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..df418a850b
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean default false,
+ skip cstring default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..4175bb2d37
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,67 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+ERROR: unrecognized parameter for 'skip': rope
+HINT: please choose from 'all-visible', 'all-frozen', or 'none'
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ERROR: starting block 0 is out of bounds for relation with no blocks
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+ERROR: block range 100000 .. 200000 is out of bounds for relation with block count 370
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c75f5ff869
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,19 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..c2d890bcd9
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,94 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+}
+
+# Stops the server and writes nulls in the first page of the table,
+# assuming the page size is large enough for offsets 1000..1015 to
+# fall within the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+	open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+ seek($fh, 1000, 0);
+	syswrite($fh, "\0" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all-frozen or all-visible pages
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all-visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-visible first page');
+
+# Check table with corruption, skipping all-frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-frozen first page');
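As an editorial aside, the byte-overwriting step in corrupt_first_page above boils down to zeroing a fixed range of an existing file. A standalone C sketch of the same idea (operating on a scratch file, not a real relation segment; `zero_bytes_at` is a name invented here, not part of the patch):

```c
#include <stdio.h>
#include <string.h>

/* Overwrite 'len' bytes with zeros at 'offset' in an existing file. */
static int
zero_bytes_at(const char *path, long offset, size_t len)
{
	char		zeros[64] = {0};
	FILE	   *fh = fopen(path, "r+b");

	if (fh == NULL || len > sizeof(zeros))
		return -1;
	if (fseek(fh, offset, SEEK_SET) != 0 ||
		fwrite(zeros, 1, len, fh) != len)
	{
		fclose(fh);
		return -1;
	}
	fclose(fh);
	return 0;
}
```

Like the Perl helper, this assumes the target offsets land inside the first data page; a smaller file would simply fail the seek/write.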
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..007241d333
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1153 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 8
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * While verifying a table, we check whether any xid we encounter is
+ * either too old or too new. We could naively check that by taking the
+ * XidGenLock each time and reading ShmemVariableCache. We instead cache
+ * the values and rely on the fact that we have the table locked
+ * sufficiently that the oldest xid in the table cannot change
+ * mid-verification, and although the newest xid in the table may advance,
+ * it cannot retreat. As such, whenever we encounter an xid older than
+ * our cached oldest xid, we know it is invalid, and when we encounter an
+ * xid newer than our cached newest xid, we recheck the
+ * ShmemVariableCache.
+ */
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext * ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx);
+static bool tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext * ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx);
+static bool check_tuple_attribute(HeapCheckContext * ctx);
+static void check_tuple(HeapCheckContext * ctx);
+
+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool fatal = false;
+ bool on_error_stop;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ int64 startblock;
+ int64 endblock;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_ALL_VISIBLE_PAGES;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_ALL_FROZEN_PAGES;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));
+ }
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ if (!ctx.nblocks)
+ {
+ /*
+ * For consistency, we need to enforce that the startblock and
+ * endblock are within the valid range if the user specified them.
+ * Yet, for an empty table with no blocks, no specified block can be
+ * in range.
+ */
+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* If we get this far, we know the relation has at least one block */
+ startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
+ " is out of bounds for relation with block count %u",
+ startblock, endblock, ctx.nblocks)));
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toastrel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance.
+ *
+ * If relfrozenxid is normal, it contains the oldest valid xid we may
+ * encounter in the table. If not, the oldest xid for our database is the
+ * oldest we should encounter.
+ *
+ * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572)
+ * to have sometimes rendered the oldest xid value for a database invalid.
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt. We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.
+ *
+ * If neither the value for the database nor the xids for any row are
+ * corrupt, then this gives the right answer. If the rows disagree with
+ * the value for the database, how can we know which one is wrong?
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relminmxid %u precedes global oldest valid xid %u",
+ ctx.relminmxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+	/* Defensive only: the block range was already validated above */
+	if (startblock < 0)
+		startblock = 0;
+	if (endblock >= (int64) ctx.nblocks)
+		endblock = (int64) ctx.nblocks - 1;
+
+ for (ctx.blkno = startblock; ctx.blkno <= endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ bool all_frozen,
+ all_visible;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+			all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+			all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_ALL_FROZEN_PAGES) ||
+ (all_visible && skip_option == SKIP_ALL_VISIBLE_PAGES))
+ {
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ confess(&ctx, psprintf(
+ "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u",
+ (unsigned) rdoffnum, (unsigned) FirstOffsetNumber,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ confess(&ctx, psprintf(
+ "line pointer redirection to unused item at offset %u",
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Convenience routine to check that the relation is of a supported relkind
+ * and uses the heap table access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext * ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	int16		lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+	int16		lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+	int16		lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
+	values[0] = Int64GetDatum(ctx->blkno);
+	values[1] = Int32GetDatum(ctx->offnum);
+	nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+	 * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext * ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Determine whether the given TransactionId is plausible for this relation:
+ * neither in the future (at or beyond the next transaction ID known to have
+ * been assigned) nor older than the oldest xid we consider valid here.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+{
+	/* Quick return for permanent, special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
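As an editorial aside, the caching strategy above (trust the locked-in range, refresh only the upper bound on a miss) can be illustrated with a self-contained sketch. This is a toy model, not the server's code: `XidCache` and `read_next_xid()` are hypothetical stand-ins for the fields in HeapCheckContext and for ReadNextFullTransactionId(), and plain 64-bit integers replace the real xid types:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the cached bounds kept in HeapCheckContext. */
typedef struct
{
	uint64_t	oldest_valid_xid;		/* cannot move while the table is locked */
	uint64_t	next_known_valid_xid;	/* may lag behind the real value */
} XidCache;

/* Hypothetical substitute for ReadNextFullTransactionId(). */
static uint64_t
read_next_xid(void)
{
	return 1000;
}

static bool
xid_in_valid_range(uint64_t xid, const XidCache *cache)
{
	return xid >= cache->oldest_valid_xid && xid < cache->next_known_valid_xid;
}

/*
 * Mirrors TransactionIdValidInRel's flow: trust the cached range first,
 * and only on a miss refresh the upper bound and retry once.
 */
static bool
xid_valid_in_rel(uint64_t xid, XidCache *cache)
{
	if (xid_in_valid_range(xid, cache))
		return true;
	cache->next_known_valid_xid = read_next_xid();
	return xid_in_valid_range(xid, cache);
}
```

The refresh is cheap relative to taking XidGenLock on every tuple, and because the newest xid can only advance, a cached hit never needs rechecking.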
+
+/*
+ * tuple_is_visible
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext * ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("old-style VACUUM FULL transaction ID %u is invalid in this relation",
+ xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx, pstrdup(
+ "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has a valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext * ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ confess(ctx, psprintf(
+							  "corrupt extended toast chunk with sequence number %d has invalid varlena header 0x%08x",
+ curchunk, header));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf(
+							  "toast chunk sequence number %d does not match the expected sequence number %d",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf(
+							  "toast chunk sequence number %d exceeds the end chunk sequence number %d",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+		confess(ctx, psprintf("toast chunk size %d differs from expected size %d",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
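As an editorial aside, the expected-size rule above (every chunk full-sized except the last, which carries the remainder) is easy to check in isolation. A minimal sketch, with an assumed 1996-byte chunk size standing in for TOAST_MAX_CHUNK_SIZE (the real value is build-dependent):

```c
#include <stdint.h>

/* Assumed stand-in for TOAST_MAX_CHUNK_SIZE; the real value is build-dependent. */
#define TOY_MAX_CHUNK_SIZE 1996

/*
 * Every chunk except the last is full-sized; the last carries whatever
 * remains of the attribute, mirroring the check in check_toast_tuple().
 */
static int32_t
expected_chunk_size(int32_t curchunk, int32_t totalchunks, int32_t attrsize)
{
	if (curchunk < totalchunks - 1)
		return TOY_MAX_CHUNK_SIZE;
	return attrsize - (totalchunks - 1) * TOY_MAX_CHUNK_SIZE;
}
```

For a 5000-byte attribute split into three chunks, the first two are 1996 bytes and the last is 1008; a stored chunk of any other size is reported as corruption.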
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found via confess().
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in
+ * the case of a toasted value, continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed
+ * here. In cases where those two functions are a bit cavalier in their
+ * assumptions about data being correct, we perform additional checks not
+ * present in either of those two functions. Where some condition is checked
+ * in both of those functions, we perform it here twice, as we parallel the
+ * logical flow of those two functions. The presence of duplicate checks
+ * seems a reasonable price to pay for keeping this code tightly coupled with
+ * the code it protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext * ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+ "tuple attribute should start at offset %u, but tuple length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf(
+ "%s toast at offset %u is unexpected",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unexpected",
+ ctx->tuphdr->t_hoff + ctx->offset));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+ "tuple attribute of length %u ends at offset %u, but tuple length is only %u",
+ thisatt->attlen, ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup(
+ "attribute is external but tuple header flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup(
+ "attribute is external but relation has no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf(
+ "final chunk number %u differs from expected value %u",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from toast table"));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->corruption.
+ */
+static void
+check_tuple(HeapCheckContext * ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx, psprintf(
+ "tuple's %u byte line pointer length is less than the %u byte minimum tuple header size",
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf(
+ "tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u precedes relfrozenxid %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u follows last assigned xid %u",
+ xmax, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+ "tuple's %u byte header size exceeds the %u byte length of the entire tuple",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf(
+ "tuple's user data offset %u not maximally aligned to %u",
+ ctx->tuphdr->t_hoff, (uint32) MAXALIGN(ctx->tuphdr->t_hoff)));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ pstrdup("tuple xmax marked incompatibly as keys updated and locked only"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ pstrdup("tuple xmax marked incompatibly as committed and as a multitransaction ID"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the t_hoff field or the infomask
+ * bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been reserved
+ * for the null bitmap.
+ */
+ if ((infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))))
+ {
+ confess(ctx, psprintf(
+ "tuple with null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))));
+ fatal = true;
+ }
+ else if (!(infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader)))
+ {
+ confess(ctx, psprintf(
+ "tuple without null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!tuple_is_visible(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The tuple's natts field can legitimately
+ * be smaller than the relation's natts, but it cannot be larger.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx, psprintf(
+ "tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..b8170bbfdf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected. If on_error_stop is
+ true, checking stops after the first block on which corruption is
+ found. Each row contains the following fields:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
--
2.21.1 (Apple Git-122.3)
Attachment: v11-0003-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 97126faecea883e072150f0bbecdd323d8a0a646 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 20 Jul 2020 13:05:26 -0700
Subject: [PATCH v11 3/3] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 900 ++++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 ++
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 434 +++++++++++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 ++++
11 files changed, 1653 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..45cd50c217
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,900 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -n, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " -s, --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -o, --on-error-stop stop checking a table after the first "
+ "corrupt page is found",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns must
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions * connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /* Default behaviors */
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+ settings.check_indexes = true;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions * connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * if we still have arguments, use them as the database name and username
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out command line arguments
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result
+ * in duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBuffer(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..01531e5c77
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all objects not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..58d5ab88cb
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,434 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin 3 precedes relfrozenxid $relfrozenxid
+0|2|8064|1|58|||tuple xmin 4026531839 precedes relfrozenxid $relfrozenxid
+0|3|8000|1|58|||tuple xmax 4026531839 precedes relfrozenxid $relfrozenxid
+0|4|7936|1|58|||tuple's 152 byte header size exceeds the 58 byte length of the entire tuple
+0|4|7936|1|58|||tuple without null values has user data offset 152 rather than the expected offset 24
+0|5|7872|1|58|||tuple's user data offset 27 not maximally aligned to 32
+0|5|7872|1|58|||tuple without null values has user data offset 27 rather than the expected offset 24
+0|6|7808|1|58|||tuple's header size is 16 bytes which is less than the 23 byte minimum valid header size
+0|6|7808|1|58|||tuple without null values has user data offset 16 rather than the expected offset 24
+0|7|7744|1|58|||tuple's header size is 21 bytes which is less than the 23 byte minimum valid header size
+0|7|7744|1|58|||tuple's user data offset 21 not maximally aligned to 24
+0|7|7744|1|58|||tuple without null values has user data offset 21 rather than the expected offset 24
+0|8|7680|1|58|||tuple has 2047 attributes in relation with only 3 attributes
+0|9|7616|1|58|||tuple with null values has user data offset 24 rather than the expected offset 280
+0|10|7552|1|58|||tuple has 67 attributes in relation with only 3 attributes
+0|11|7488|1|58|1||tuple attribute of length 4294967295 ends at offset 416848000, but tuple length is only 58
+0|12|7424|1|58|2|0|final chunk number 0 differs from expected value 6
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||tuple xmax marked incompatibly as keys updated and locked only
+0|14|7296|1|58|||tuple xmax 0 precedes relminmxid 1
+0|14|7296|1|58|||tuple xmax marked incompatibly as committed and as a multitransaction ID",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/final chunk number \d+ differs from expected value \d+/,
+ qr/toasted value missing from toast table/,
+ qr/tuple attribute of length \d+ ends at offset \d+, but tuple length is only \d+/,
+ qr/tuple has \d+ attributes in relation with only \d+ attributes/,
+ qr/tuple with null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple without null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple xmax \d+ precedes relfrozenxid \d+/,
+ qr/tuple xmax \d+ precedes relminmxid \d+/,
+ qr/tuple xmax marked incompatibly as committed and as a multitransaction ID/,
+ qr/tuple xmax marked incompatibly as keys updated and locked only/,
+ qr/tuple xmin \d+ precedes relfrozenxid \d+/,
+ qr/tuple's \d+ byte header size exceeds the \d+ byte length of the entire tuple/,
+ qr/tuple's header size is \d+ bytes which is less than the \d+ byte minimum valid header size/,
+ qr/tuple's user data offset \d+ not maximally aligned to \d+/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+   Checking an entire database that contains one corrupt table, named
+   "corrupted", with a sample of the output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+   Each detected corruption is reported on two lines: the first shows the
+   location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
Hi Mark,
I think new structures should be listed in src/tools/pgindent/typedefs.list;
otherwise, pgindent might disturb their indentation.
Regards,
Amul
On Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jul 16, 2020, at 12:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The v10 patch without these ideas is here:
Along the lines of what Alvaro was saying before, I think this
definitely needs to be split up into a series of patches. The commit
message for v10 describes it doing three pretty separate things, and I
think that argues for splitting it into a series of three patches. I'd
argue for this ordering:

0001 Refactoring existing amcheck btree checking functions to optionally
     return corruption information rather than ereport'ing it. This is
     used by the new pg_amcheck command line tool for reporting back to
     the caller.

0002 Adding new function verify_heapam for checking a heap relation and
     associated toast relation, if any, to contrib/amcheck.

0003 Adding new contrib module pg_amcheck, which is a command line
     interface for running amcheck's verifications against tables and
     indexes.

It's too hard to review things like this when it's all mixed together.
The v11 patch series is broken up as you suggest.
+++ b/contrib/amcheck/t/skipping.pl

The name of this file is inconsistent with the tree's usual
convention, which is all stuff like 001_whatever.pl, except for
src/test/modules/brin, which randomly decided to use two digits
instead of three. There's no precedent for a test file with no leading
numeric digits. Also, what does "skipping" even have to do with what
the test is checking? Maybe it's intended to refer to the new error
handling "skipping" the actual error in favor of just reporting it
without stopping, but that's not really what the word "skipping"
normally means. Finally, it seems a bit over-engineered: do we really
need 183 test cases to check that detecting a problem doesn't lead to
an abort? Like, if that's the purpose of the test, I'd expect it to
check one corrupt relation and one non-corrupt relation, each with and
without the no-error behavior. And that's about it. Or maybe it's
talking about skipping pages during the checks, because those pages
are all-visible or all-frozen? It's not very clear to me what's going
on here.

The "skipping" did originally refer to testing verify_heapam()'s option to skip all-visible or all-frozen blocks. I have renamed it 001_verify_heapam.pl, since it tests that function.
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;

Please add explanatory comments indicating what these are intended to
mean.

Done.
For most of the structure members, the brief comments
already present seem sufficient; but here, more explanation looks
necessary and less is provided. The "Values for returning tuples"
could possibly also use some more detail.

Ok, I've expanded the comments for these.
+#define HEAPCHECK_RELATION_COLS 8
I think this should really be at the top of the file someplace.
Sometimes people have adopted this style when the #define is only used
within the function that contains it, but that's not the case here.

Done.
+ ereport(ERROR,
+         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+          errmsg("unrecognized parameter for 'skip': %s", skip),
+          errhint("please choose from 'all visible', 'all frozen', "
+                  "or NULL")));

I think it would be better if we had three string values selecting the
different behaviors, and made the parameter NOT NULL but with a
default. It seems like that would be easier to understand. Right now,
I can tell that my options for what to skip are "all visible", "all
frozen", and, uh, some other thing that I don't know what it is. I'm
gonna guess the third option is to skip nothing, but it seems best to
make that explicit. Also, should we maybe consider spelling this
'all-visible' and 'all-frozen' with dashes, instead of using spaces?
Spaces in an option value seems a little icky to me somehow.

I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'. I did not mark the function as strict, as I think NULL is a reasonable value (and the default) for startblock and endblock.
+ int64 startblock = -1;
+ int64 endblock = -1;
...
+ if (!PG_ARGISNULL(3))
+     startblock = PG_GETARG_INT64(3);
+ if (!PG_ARGISNULL(4))
+     endblock = PG_GETARG_INT64(4);
...
+ if (startblock < 0)
+     startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+     endblock = ctx.nblocks;
+
+ for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)

So, the user can specify a negative value explicitly and it will be
treated as the default, and an endblock value that's larger than the
relation size will be treated as the relation size. The way pg_prewarm
does the corresponding checks seems superior: null indicates the
default value, and any non-null value must be within range or you get
an error. Also, you seem to be treating endblock as the first block
that should not be checked, whereas pg_prewarm takes what seems to me
to be the more natural interpretation: the end block is the last block
that IS checked. If you do it this way, then someone who specifies the
same start and end block will check no blocks -- silently, I think.

Under that regime, for relations with one block of data, (startblock=0, endblock=0) means "check the zero'th block", and for relations with no blocks of data, specifying any non-null (startblock,endblock) pair raises an exception. I don't like that too much, but I'm happy to defer to precedent. Since you say pg_prewarm works this way (I did not check), I have changed verify_heapam to do likewise.
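The pg_prewarm-style argument handling discussed above can be sketched in a minimal, self-contained form. This is illustrative only, not the patch's actual code: `resolve_block_range`, `RangeStatus`, and the use of a negative value to stand in for a SQL NULL argument are all assumptions for the sake of the example. The key properties are that NULL means "use the default" and that the end block is the last block that IS checked, so equal start and end blocks check exactly one block.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of pg_prewarm-style block-range validation.
 * A negative argument stands in for SQL NULL, meaning "use the default";
 * any non-negative argument must lie within [0, nblocks - 1].
 */
typedef enum
{
	RANGE_OK,
	RANGE_START_OUT_OF_BOUNDS,
	RANGE_END_OUT_OF_BOUNDS
} RangeStatus;

static RangeStatus
resolve_block_range(int64_t arg_start, int64_t arg_end, int64_t nblocks,
					int64_t *first_block, int64_t *last_block)
{
	*first_block = 0;			/* default: start of relation */
	*last_block = nblocks - 1;	/* default: end of relation (inclusive) */

	if (arg_start >= 0)
	{
		if (arg_start >= nblocks)
			return RANGE_START_OUT_OF_BOUNDS;
		*first_block = arg_start;
	}
	if (arg_end >= 0)
	{
		if (arg_end >= nblocks)
			return RANGE_END_OUT_OF_BOUNDS;
		*last_block = arg_end;
	}
	return RANGE_OK;
}
```

Under these rules the scan loop runs `for (blkno = first_block; blkno <= last_block; blkno++)`, so (0, 0) against a one-block relation checks that block, and any explicit range against an empty relation is rejected rather than silently checking nothing.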
+ if (skip_all_frozen || skip_all_visible)
Since you can't skip all frozen without skipping all visible, this
test could be simplified. Or you could introduce a three-valued enum
and test that skip_pages != SKIP_PAGES_NONE, which might be even
better.

It works now with a three-valued enum.
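The three-valued enum resolution might look something like the following self-contained sketch; the names `SkipPages` and `should_skip_block` are illustrative, not necessarily those in the patch. Because a page marked all-frozen is also marked all-visible in the visibility map, skipping all-visible pages subsumes skipping all-frozen ones.

```c
#include <stdbool.h>

/* Illustrative three-valued skip mode, per the discussion above. */
typedef enum SkipPages
{
	SKIP_PAGES_NONE,
	SKIP_PAGES_ALL_VISIBLE,
	SKIP_PAGES_ALL_FROZEN
} SkipPages;

/*
 * Decide whether a block may be skipped, given its visibility-map bits.
 * The common fast-path test is simply (mode != SKIP_PAGES_NONE).
 */
static bool
should_skip_block(SkipPages mode, bool all_visible, bool all_frozen)
{
	if (mode == SKIP_PAGES_NONE)
		return false;
	if (mode == SKIP_PAGES_ALL_FROZEN)
		return all_frozen;
	return all_visible;			/* SKIP_PAGES_ALL_VISIBLE */
}
```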
+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);

I don't understand this assertion, and I don't understand the comment,
either. I think ctx.blkno can never be equal to InvalidBlockNumber
because we never set it to anything outside the range of 0..(endblock
- 1), and I think ctx.buffer must always be unequal to InvalidBuffer
because we just initialized it by calling ReadBufferExtended(). So I
think this assertion would still pass if we wrote && rather than ||.
But even then, I don't know what that has to do with the comment or
why it even makes sense to have an assertion for that in the first
place.

Yes, it is vestigial. Removed.
+ /*
+  * Open the relation.  We use ShareUpdateExclusive to prevent concurrent
+  * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+  * global oldestXid to be newer than those.  This protection saves us from
+  * having to reacquire the locks and recheck those minimums for every
+  * tuple, which would be expensive.
+  */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);

I don't think we'd need to recheck for every tuple, would we? Just for
cases where there's an apparent violation of the rules.

It's a bit fuzzy what an "apparent violation" might be if both ends of the range of valid xids may be moving, and arbitrarily much. It's also not clear how often to recheck, since you'd be dealing with a race condition no matter how often you check. Perhaps the comments shouldn't mention how often you'd have to recheck, since there is no really defensible choice for that. I removed the offending sentence.
I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.

I welcome strategies that would allow for taking a lesser lock.
It's also unclear to me why this concerns itself with relfrozenxid and
the cluster-wide oldestXid value but not with datfrozenxid. It seems
like if we're going to sanity-check the relfrozenxid against the
cluster-wide value, we ought to also check it against the
database-wide value. Checking neither would also seem like a plausible
choice. But it seems very strange to only check against the
cluster-wide value.

If the relation has a normal relfrozenxid, then the oldest valid xid we can encounter in the table is relfrozenxid. Otherwise, each row needs to be compared against some other minimum xid value.
Logically, that other minimum xid value should be the oldest valid xid for the database, which must logically be at least as old as any valid row in the table and no older than the oldest valid xid for the cluster.
Unfortunately, if the comments in commands/vacuum.c circa line 1572 can be believed, and if I am reading them correctly, the stored value for the oldest valid xid in the database has been known to be corrupted by bugs in pg_upgrade. This is awful. If I compare the xid of a row in a table against the oldest xid value for the database, and the xid of the row is older, what can I do? I don't have a principled basis for determining which one of them is wrong.
The logic in verify_heapam is conservative; it makes no guarantees about finding and reporting all corruption, but if it does report a row as corrupt, you can bank on that, bugs in verify_heapam itself notwithstanding. I think this is a good choice; a tool with only false negatives is much more useful than one with both false positives and false negatives.
I have added a comment about my reasoning to verify_heapam.c. I'm happy to be convinced of a better strategy for handling this situation.
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                  "InvalidOffsetNumber increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.

Ah, right you are. Removed.
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+     uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+     if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
...
+     if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))

I think that ItemIdGetRedirect() returns an offset, not a byte
position. So the expectation that I would have is that it would be any
integer >= 0 and <= maxoff. Am I confused?

I think you are right about it returning an offset, which should be between FirstOffsetNumber and maxoff, inclusive. I have updated the checks.
BTW, it seems like it might
be good to complain if the item to which it points is LP_UNUSED...
AFAIK that shouldn't happen.

Thanks for mentioning that. It now checks for that.
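The corrected redirect check can be summarized with a small self-contained sketch. This is not the patch's code: `redirect_is_valid` and the `lp_unused` array (standing in for testing `ItemIdIsUsed()` on the target line pointer) are hypothetical. The point is that `ItemIdGetRedirect()` yields a 1-based offset number into the line pointer array, not a byte position, so the valid range is FirstOffsetNumber .. maxoff, and the target slot must additionally not be LP_UNUSED.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIRST_OFFSET_NUMBER 1	/* mirrors PostgreSQL's FirstOffsetNumber */

/*
 * Illustrative validity check for a redirect line pointer.  lp_unused[i]
 * is true when line pointer (i + 1) is LP_UNUSED; in the real code this
 * would be !ItemIdIsUsed(PageGetItemId(page, redirect)).
 */
static bool
redirect_is_valid(uint16_t redirect, uint16_t maxoff, const bool *lp_unused)
{
	if (redirect < FIRST_OFFSET_NUMBER || redirect > maxoff)
		return false;			/* points outside the line pointer array */
	if (lp_unused[redirect - 1])
		return false;			/* points at an LP_UNUSED slot */
	return true;
}
```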
+ errmsg("\"%s\" is not a heap AM",

I think the correct wording would be just "is not a heap." The "heap
AM" is the thing in pg_am, not a specific table.

Fixed.
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.

Hmm. I don't see the three lines of code you are quoting. Which patch is that from?
+ /*
+  * In principle, there is nothing to prevent a scan over a large, highly
+  * corrupted table from using workmem worth of memory building up the
+  * tuplestore.  Don't leak the msg argument memory.
+  */
+ pfree(msg);

Maybe change the second sentence to something like: "That should be
OK, else the user can lower work_mem, but we'd better not leak any
additional memory."

It may be a little wordy, but I went with
/*
* In principle, there is nothing to prevent a scan over a large, highly
* corrupted table from using workmem worth of memory building up the
* tuplestore. That's ok, but if we also leak the msg argument memory
* until the end of the query, we could exceed workmem by more than a
* trivial amount. Therefore, free the msg argument each time we are
* called rather than waiting for our current memory context to be freed.
*/+/* + * check_tuphdr_xids + * + * Determine whether tuples are visible for verification. Similar to + * HeapTupleSatisfiesVacuum, but with critical differences. + * + * 1) Does not touch hint bits. It seems imprudent to write hint bits + * to a table during a corruption check. + * 2) Only makes a boolean determination of whether verification should + * see the tuple, rather than doing extra work for vacuum-related + * categorization. + * + * The caller should already have checked that xmin and xmax are not out of + * bounds for the relation. + */First, check_tuphdr_xids() doesn't seem like a very good name. If you
have a function with that name and, like this one, it returns Boolean,
what does true mean? What does false mean? Kinda hard to tell. And
also, check the tuple header XIDs *for what*? If you called it, say,
tuple_is_visible(), that would be self-evident.

Changed.
Second, consider that we hold at least AccessShareLock on the relation
- actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
cannot be a concurrent modification to the tuple descriptor in
progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
potentially using a non-current schema. If the tuple is
HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
inserting transaction, or that transaction committed before we got our
lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
before we got our lock. Or if it's both inserted and deleted in the
same transaction, say, then that transaction committed before we got
our lock or else contains no relevant DDL. IOW, I think you can check
everything but dead tuples here.

Ok, I have changed tuple_is_visible to return true rather than false for those other cases.
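The resolved behavior reduces to a simple mapping, sketched below in self-contained form. The enum mirrors HTSV_Result but is local to the example; the real tuple_is_visible must of course derive the status from the tuple header without setting hint bits. Only HEAPTUPLE_DEAD tuples are skipped, because with at least AccessShareLock held, every other status implies the inserting transaction's DDL (if any) is already visible, so the tuple can safely be checked against the current tuple descriptor.

```c
#include <stdbool.h>

/* Local stand-in for HTSV_Result, for illustration only. */
typedef enum
{
	TUPLE_DEAD,
	TUPLE_LIVE,
	TUPLE_RECENTLY_DEAD,
	TUPLE_INSERT_IN_PROGRESS,
	TUPLE_DELETE_IN_PROGRESS
} TupleStatus;

/*
 * Per the discussion above: everything except a dead tuple is visible
 * for verification purposes.
 */
static bool
tuple_is_visible(TupleStatus status)
{
	return status != TUPLE_DEAD;
}
```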
Capitalization and punctuation for messages complaining about problems
need to be consistent. verify_heapam() has "Invalid redirect line
pointer offset %u out of bounds" which starts with a capital letter,
but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
case, but in any event it should be the same.

I standardized on all lowercase text, though I left embedded symbols and constants such as LOCKED_ONLY alone.
Also,
check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
debugging leftover or a very unclear complaint.

Right. That has been changed to "old-style VACUUM FULL transaction ID %u is invalid in this relation".
I think some real work
needs to be put into the phrasing of these messages so that it's more
clear exactly what is going on and why it's bad. For example the first
example in this paragraph is clearly a problem of some kind, but it's
not very clear exactly what is happening: is %u the offset of the
invalid line redirect or the value to which it points? I don't think
the phrasing is very grammatical, which makes it hard to tell which is
meant, and I actually think it would be a good idea to include both
things.

Beware that every row returned from amcheck has more fields than just the error message.
blkno OUT bigint,
offnum OUT integer,
lp_off OUT smallint,
lp_flags OUT smallint,
lp_len OUT smallint,
attnum OUT integer,
chunk OUT integer,
msg OUT text

Rather than including blkno, offnum, lp_off, lp_flags, lp_len, attnum, or chunk in the message, it would be better to remove these things from messages that include them. For the specific message under consideration, I've converted the text to "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u". That avoids duplicating the offset information of the referring item, while reporting the offset of the referred item.
Project policy is generally against splitting a string across multiple
lines to fit within 80 characters. We like to fit within 80
characters, but we like to be able to grep for strings more, and
breaking them up like this makes that harder.

Thanks for clarifying the project policy. I joined these message strings back together.
+ confess(ctx,
+         pstrdup("corrupt toast chunk va_header"));

This is another message that I don't think is very clear. There's two
elements to that. One is that the phrasing is not very good, and the
other is that there are no % escapes.

Changed to "corrupt extended toast chunk with sequence number %d has invalid varlena header %0x". I think all the other information about where the corruption was found is already present in the other returned columns.
What's somebody going to do when
they see this message? First, they're probably going to have to look
at the code to figure out in which circumstances it gets generated;
that's a sign that the message isn't phrased clearly enough. That will
tell them that an unexpected bit pattern has been found, but not what
that unexpected bit pattern actually was. So then, they're going to
have to try to find the relevant va_header by some other means and
fish out the relevant bit so that they can see what actually went
wrong.

Right.
+ * Checks the current attribute as tracked in ctx for corruption.  Records
+ * any corruption found in ctx->corruption.
+ *
+ *

Extra blank line.
Fixed.
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+                                           ctx->attnum);

Maybe you could avoid the line wrap by declaring this without
initializing it, and then initializing it as a separate statement.

Yes, I like that better. I did not need to do the same with infomask, but it looks better to me to break the declaration and initialization for both, so I did that.
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+                       ctx->tuphdr->t_hoff, ctx->offset,
+                       ctx->lp_len));

Uggh! This isn't even remotely an English sentence. I don't think
formulas are the way to go here, but I like the idea of formulas in
some places and written-out messages in others even less. I guess the
complaint here in English is something like "tuple attribute %d should
start at offset %u, but tuple length is only %u" or something of that
sort. Also, it seems like this complaint really ought to have been
reported on the *preceding* loop iteration, either complaining that
(1) the fixed length attribute is more than the number of remaining
bytes in the tuple or (2) the varlena header for the tuple specifies
an excessively high length. It seems like you're blaming the wrong
attribute for the problem.

Yeah, and it wouldn't complain if the final attribute of a tuple was overlong, as there wouldn't be a next attribute to blame it on. I've changed it to report as you suggest, although it also still complains if the first attribute starts outside the bounds of the tuple. The two error messages now read as "tuple attribute should start at offset %u, but tuple length is only %u" and "tuple attribute of length %u ends at offset %u, but tuple length is only %u".
BTW, the header comments for this function (check_tuple_attribute)
neglect to document the meaning of the return value.

Fixed.
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+                       "relfrozenxid = %u",

This is another example of these messages needing work. The
corresponding message from heap_prepare_freeze_tuple() is "found
update xid %u from before relfrozenxid %u". That's better, because we
don't normally include equals signs in our messages like this, and
also because "relation relfrozenxid" is redundant. I think this should
say something like "tuple xmax %u precedes relfrozenxid %u".

+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+                       xmax));

And then this could be something like "tuple xmax %u follows
last-assigned xid %u". That would be more symmetric and more
informative.

Both of these have been changed.
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
+     ctx->tuphdr->t_hoff)

I think we should be able to predict the exact value of t_hoff and
complain if it isn't precisely equal to the expected value. Or is that
not possible for some reason?

That is possible, and I've updated the error message to match. There are cases where you can't know if the HEAP_HASNULL bit is wrong or if the t_hoff value is wrong, but I've changed the code to just compute the length based on the HEAP_HASNULL setting and use that as the expected value, and complain when the actual value does not match the expected. That sidesteps the problem of not knowing exactly which value to blame.
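The exact t_hoff prediction can be sketched with local copies of the relevant macros. This is a self-contained illustration, not the patch's code: it assumes the usual 23-byte fixed header (offsetof(HeapTupleHeaderData, t_bits)) and a maximum alignment of 8, both of which are platform-dependent in a real build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for PostgreSQL macros, assuming MAXIMUM_ALIGNOF == 8. */
#define SIZEOF_HEAP_TUPLE_HEADER 23		/* offsetof(HeapTupleHeaderData, t_bits) */
#define BITMAPLEN(natts) (((natts) + 7) / 8)
#define MAXALIGN(len) (((uintptr_t) (len) + 7) & ~(uintptr_t) 7)

/*
 * The header is the fixed part, plus a null bitmap only when HEAP_HASNULL
 * is set, rounded up to the next maximum alignment boundary.  Any other
 * t_hoff value indicates corruption (of t_hoff or of the infomask bit).
 */
static uint16_t
expected_t_hoff(int natts, bool has_nulls)
{
	uintptr_t	len = SIZEOF_HEAP_TUPLE_HEADER;

	if (has_nulls)
		len += BITMAPLEN(natts);
	return (uint16_t) MAXALIGN(len);
}
```

Note that for small attribute counts the null bitmap fits in the alignment padding, so the expected value is the same with or without HEAP_HASNULL; the check only gains discriminating power once BITMAPLEN crosses an alignment boundary.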
Is there some place that's checking that lp_len >=
SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
the header? If not, there should be.

Good catch. check_tuple() now does that before reading the header.
+$node->command_ok(
+  [
+      'pg_amcheck', '-p', $port, 'postgres'
+  ],
+  'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+  [
+      'pg_amcheck', '-i', '-p', $port, 'postgres'
+  ],
+  'pg_amcheck all schemas, tables and indexes');

I haven't really looked through the btree-checking and pg_amcheck
parts of this much yet, but this caught my eye. Why would the default
be to check tables but not indexes? I think the default ought to be to
check everything we know how to check.

I have changed the default to match your expectations.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jul 21, 2020 at 10:58 AM Amul Sul <sulamul@gmail.com> wrote:
Hi Mark,
I think new structures should be listed in src/tools/pgindent/typedefs.list,
otherwise, pgindent might disturb its indentation.Regards,
AmulOn Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:On Jul 16, 2020, at 12:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 6, 2020 at 2:06 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The v10 patch without these ideas is here:
Along the lines of what Alvaro was saying before, I think this
definitely needs to be split up into a series of patches. The commit
message for v10 describes it doing three pretty separate things, and I
think that argues for splitting it into a series of three patches. I'd
argue for this ordering:0001 Refactoring existing amcheck btree checking functions to optionally
return corruption information rather than ereport'ing it. This is
used by the new pg_amcheck command line tool for reporting back to
the caller.0002 Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.0003 Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.It's too hard to review things like this when it's all mixed together.
The v11 patch series is broken up as you suggest.
+++ b/contrib/amcheck/t/skipping.plThe name of this file is inconsistent with the tree's usual
convention, which is all stuff like 001_whatever.pl, except for
src/test/modules/brin, which randomly decided to use two digits
instead of three. There's no precedent for a test file with no leading
numeric digits. Also, what does "skipping" even have to do with what
the test is checking? Maybe it's intended to refer to the new error
handling "skipping" the actual error in favor of just reporting it
without stopping, but that's not really what the word "skipping"
normally means. Finally, it seems a bit over-engineered: do we really
need 183 test cases to check that detecting a problem doesn't lead to
an abort? Like, if that's the purpose of the test, I'd expect it to
check one corrupt relation and one non-corrupt relation, each with and
without the no-error behavior. And that's about it. Or maybe it's
talking about skipping pages during the checks, because those pages
are all-visible or all-frozen? It's not very clear to me what's going
on here.The "skipping" did originally refer to testing verify_heapam()'s option to skip all-visible or all-frozen blocks. I have renamed it 001_verify_heapam.pl, since it tests that function.
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;Please add explanatory comments indicating what these are intended to
mean.Done.
For most of the the structure members, the brief comments
already present seem sufficient; but here, more explanation looks
necessary and less is provided. The "Values for returning tuples"
could possibly also use some more detail.Ok, I've expanded the comments for these.
+#define HEAPCHECK_RELATION_COLS 8
I think this should really be at the top of the file someplace.
Sometimes people have adopted this style when the #define is only used
within the function that contains it, but that's not the case here.Done.
+ ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("unrecognized parameter for 'skip': %s", skip), + errhint("please choose from 'all visible', 'all frozen', " + "or NULL")));I think it would be better if we had three string values selecting the
different behaviors, and made the parameter NOT NULL but with a
default. It seems like that would be easier to understand. Right now,
I can tell that my options for what to skip are "all visible", "all
frozen", and, uh, some other thing that I don't know what it is. I'm
gonna guess the third option is to skip nothing, but it seems best to
make that explicit. Also, should we maybe consider spelling this
'all-visible' and 'all-frozen' with dashes, instead of using spaces?
Spaces in an option value seems a little icky to me somehow.I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'. I did not mark the function as strict, as I think NULL is a reasonable value (and the default) for startblock and endblock.
+ int64 startblock = -1; + int64 endblock = -1; ... + if (!PG_ARGISNULL(3)) + startblock = PG_GETARG_INT64(3); + if (!PG_ARGISNULL(4)) + endblock = PG_GETARG_INT64(4); ... + if (startblock < 0) + startblock = 0; + if (endblock < 0 || endblock > ctx.nblocks) + endblock = ctx.nblocks; + + for (ctx.blkno = startblock; ctx.blkno < endblock; ctx.blkno++)So, the user can specify a negative value explicitly and it will be
treated as the default, and an endblock value that's larger than the
relation size will be treated as the relation size. The way pg_prewarm
does the corresponding checks seems superior: null indicates the
default value, and any non-null value must be within range or you get
an error. Also, you seem to be treating endblock as the first block
that should not be checked, whereas pg_prewarm takes what seems to me
to be the more natural interpretation: the end block is the last block
that IS checked. If you do it this way, then someone who specifies the
same start and end block will check no blocks -- silently, I think.Under that regime, for relations with one block of data, (startblock=0, endblock=0) means "check the zero'th block", and for relations with no blocks of data, specifying any non-null (startblock,endblock) pair raises an exception. I don't like that too much, but I'm happy to defer to precedent. Since you say pg_prewarm works this way (I did not check), I have changed verify_heapam to do likewise.
+ if (skip_all_frozen || skip_all_visible)
Since you can't skip all frozen without skipping all visible, this
test could be simplified. Or you could introduce a three-valued enum
and test that skip_pages != SKIP_PAGES_NONE, which might be even
better.

It works now with a three-valued enum.
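For illustration, the three-valued option might look like the sketch below; the enum and function names are assumptions, not necessarily what verify_heapam uses, and the real code would raise an error on an unrecognized option rather than falling back to the default.

```c
#include <assert.h>
#include <string.h>

/* Three-valued page-skipping behavior, replacing two booleans. */
typedef enum SkipPages
{
    SKIP_PAGES_NONE,
    SKIP_PAGES_ALL_VISIBLE,
    SKIP_PAGES_ALL_FROZEN
} SkipPages;

static SkipPages
parse_skip_option(const char *opt)
{
    if (opt == NULL || strcmp(opt, "none") == 0)
        return SKIP_PAGES_NONE;
    if (strcmp(opt, "all-visible") == 0)
        return SKIP_PAGES_ALL_VISIBLE;
    if (strcmp(opt, "all-frozen") == 0)
        return SKIP_PAGES_ALL_FROZEN;
    return SKIP_PAGES_NONE;     /* the real code would report an error */
}
```

With an enum, the per-page test simplifies to `skip_pages != SKIP_PAGES_NONE`, as suggested.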
+ /* We must unlock the page from the prior iteration, if any */
+ Assert(ctx.blkno == InvalidBlockNumber || ctx.buffer != InvalidBuffer);

I don't understand this assertion, and I don't understand the comment,
either. I think ctx.blkno can never be equal to InvalidBlockNumber
because we never set it to anything outside the range of 0..(endblock
- 1), and I think ctx.buffer must always be unequal to InvalidBuffer
because we just initialized it by calling ReadBufferExtended(). So I
think this assertion would still pass if we wrote && rather than ||.
But even then, I don't know what that has to do with the comment or
why it even makes sense to have an assertion for that in the first
place.

Yes, it is vestigial. Removed.
+ /*
+  * Open the relation.  We use ShareUpdateExclusive to prevent concurrent
+  * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+  * global oldestXid to be newer than those.  This protection saves us from
+  * having to reacquire the locks and recheck those minimums for every
+  * tuple, which would be expensive.
+  */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);

I don't think we'd need to recheck for every tuple, would we? Just for
cases where there's an apparent violation of the rules.

It's a bit fuzzy what an "apparent violation" might be if both ends of the range of valid xids may be moving, and arbitrarily much. It's also not clear how often to recheck, since you'd be dealing with a race condition no matter how often you check. Perhaps the comments shouldn't mention how often you'd have to recheck, since there is no really defensible choice for that. I removed the offending sentence.
I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.

I welcome strategies that would allow for taking a lesser lock.
It's also unclear to me why this concerns itself with relfrozenxid and
the cluster-wide oldestXid value but not with datfrozenxid. It seems
like if we're going to sanity-check the relfrozenxid against the
cluster-wide value, we ought to also check it against the
database-wide value. Checking neither would also seem like a plausible
choice. But it seems very strange to only check against the
cluster-wide value.

If the relation has a normal relfrozenxid, then the oldest valid xid we can encounter in the table is relfrozenxid. Otherwise, each row needs to be compared against some other minimum xid value.
Logically, that other minimum xid value should be the oldest valid xid for the database, which must logically be at least as old as any valid row in the table and no older than the oldest valid xid for the cluster.
Unfortunately, if the comments in commands/vacuum.c circa line 1572 can be believed, and if I am reading them correctly, the stored value for the oldest valid xid in the database has been known to be corrupted by bugs in pg_upgrade. This is awful. If I compare the xid of a row in a table against the oldest xid value for the database, and the xid of the row is older, what can I do? I don't have a principled basis for determining which one of them is wrong.
The logic in verify_heapam is conservative; it makes no guarantees about finding and reporting all corruption, but if it does report a row as corrupt, you can bank on that, bugs in verify_heapam itself notwithstanding. I think this is a good choice; a tool with only false negatives is much more useful than one with both false positives and false negatives.
I have added a comment about my reasoning to verify_heapam.c. I'm happy to be convinced of a better strategy for handling this situation.
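The conservative bound check described above can be sketched roughly as follows. This is an illustration only: it uses simplified stand-ins for PostgreSQL's transaction ID machinery, the function name is invented, and it deliberately ignores xid wraparound and multixacts, which the real verify_heapam must also handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's transaction ID types. */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Report a normal xid as corrupt only when it provably precedes
 * relfrozenxid or has not been assigned yet.  Anything in between is
 * given the benefit of the doubt: false negatives are acceptable,
 * false positives are not.
 */
static bool
xid_within_bounds(TransactionId xid, TransactionId relfrozenxid,
                  TransactionId next_xid)
{
    if (!TransactionIdIsNormal(xid))
        return true;        /* special xids are validated elsewhere */
    if (TransactionIdIsNormal(relfrozenxid) && xid < relfrozenxid)
        return false;       /* precedes the oldest valid xid */
    if (xid >= next_xid)
        return false;       /* in the future */
    return true;
}
```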
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                  "InvalidOffsetNumber increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.

Ah, right you are. Removed.
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+     uint16 redirect = ItemIdGetRedirect(ctx.itemid);
+     if (redirect <= SizeOfPageHeaderData || redirect >= ph->pd_lower)
...
+     if ((redirect - SizeOfPageHeaderData) % sizeof(uint16))

I think that ItemIdGetRedirect() returns an offset, not a byte
position. So the expectation that I would have is that it would be any
integer >= 0 and <= maxoff. Am I confused?

I think you are right about it returning an offset, which should be between FirstOffsetNumber and maxoff, inclusive. I have updated the checks.
BTW, it seems like it might
be good to complain if the item to which it points is LP_UNUSED...
AFAIK that shouldn't happen.

Thanks for mentioning that. It now checks for that.
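Taken together, the redirect checks just discussed amount to something like the sketch below. The function name is invented, and the `lp_flags` array is a stand-in for inspecting the real line pointers on a heap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's line pointer machinery. */
typedef uint16_t OffsetNumber;
#define FirstOffsetNumber ((OffsetNumber) 1)
#define LP_UNUSED 0

/*
 * The value stored in a redirect line pointer is an offset number, so
 * it must fall within FirstOffsetNumber..maxoff, and the item it
 * points at must not be LP_UNUSED.
 */
static bool
redirect_is_sane(OffsetNumber redirect, OffsetNumber maxoff,
                 const int *lp_flags)
{
    if (redirect < FirstOffsetNumber || redirect > maxoff)
        return false;       /* target offset out of bounds */
    if (lp_flags[redirect - 1] == LP_UNUSED)
        return false;       /* redirect points at an unused slot */
    return true;
}
```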
+ errmsg("\"%s\" is not a heap AM",
I think the correct wording would be just "is not a heap." The "heap
AM" is the thing in pg_am, not a specific table.

Fixed.
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.

Hmm. I don't see the three lines of code you are quoting. Which patch is that from?
+ /*
+  * In principle, there is nothing to prevent a scan over a large, highly
+  * corrupted table from using workmem worth of memory building up the
+  * tuplestore.  Don't leak the msg argument memory.
+  */
+ pfree(msg);

Maybe change the second sentence to something like: "That should be
OK, else the user can lower work_mem, but we'd better not leak any
additional memory."

It may be a little wordy, but I went with
/*
* In principle, there is nothing to prevent a scan over a large, highly
* corrupted table from using workmem worth of memory building up the
* tuplestore. That's ok, but if we also leak the msg argument memory
* until the end of the query, we could exceed workmem by more than a
* trivial amount. Therefore, free the msg argument each time we are
* called rather than waiting for our current memory context to be freed.
*/+/* + * check_tuphdr_xids + * + * Determine whether tuples are visible for verification. Similar to + * HeapTupleSatisfiesVacuum, but with critical differences. + * + * 1) Does not touch hint bits. It seems imprudent to write hint bits + * to a table during a corruption check. + * 2) Only makes a boolean determination of whether verification should + * see the tuple, rather than doing extra work for vacuum-related + * categorization. + * + * The caller should already have checked that xmin and xmax are not out of + * bounds for the relation. + */First, check_tuphdr_xids() doesn't seem like a very good name. If you
have a function with that name and, like this one, it returns Boolean,
what does true mean? What does false mean? Kinda hard to tell. And
also, check the tuple header XIDs *for what*? If you called it, say,
tuple_is_visible(), that would be self-evident.

Changed.
Second, consider that we hold at least AccessShareLock on the relation
- actually, ATM we hold ShareUpdateExclusiveLock. Either way, there
cannot be a concurrent modification to the tuple descriptor in
progress. Therefore, I think that only a HEAPTUPLE_DEAD tuple is
potentially using a non-current schema. If the tuple is
HEAPTUPLE_INSERT_IN_PROGRESS, there's either no ADD COLUMN in the
inserting transaction, or that transaction committed before we got our
lock. Similarly if it's HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD, the original inserter must've committed
before we got our lock. Or if it's both inserted and deleted in the
same transaction, say, then that transaction committed before we got
our lock or else contains no relevant DDL. IOW, I think you can check
everything but dead tuples here.

Ok, I have changed tuple_is_visible to return true rather than false for those other cases.
Capitalization and punctuation for messages complaining about problems
need to be consistent. verify_heapam() has "Invalid redirect line
pointer offset %u out of bounds" which starts with a capital letter,
but check_tuphdr_xids() has "heap tuple with XMAX_IS_MULTI is neither
LOCKED_ONLY nor has a valid xmax" which does not. I vote for lower
case, but in any event it should be the same.

I standardized on all lowercase text, though I left embedded symbols and constants such as LOCKED_ONLY alone.
Also,
check_tuphdr_xids() has "tuple xvac = %u invalid" which is either a
debugging leftover or a very unclear complaint.

Right. That has been changed to "old-style VACUUM FULL transaction ID %u is invalid in this relation".
I think some real work
needs to be put into the phrasing of these messages so that it's more
clear exactly what is going on and why it's bad. For example the first
example in this paragraph is clearly a problem of some kind, but it's
not very clear exactly what is happening: is %u the offset of the
invalid line redirect or the value to which it points? I don't think
the phrasing is very grammatical, which makes it hard to tell which is
meant, and I actually think it would be a good idea to include both
things.

Beware that every row returned from amcheck has more fields than just the error message:
blkno OUT bigint,
offnum OUT integer,
lp_off OUT smallint,
lp_flags OUT smallint,
lp_len OUT smallint,
attnum OUT integer,
chunk OUT integer,
msg OUT text

Rather than including blkno, offnum, lp_off, lp_flags, lp_len, attnum, or chunk in the message, it would be better to remove these things from messages that include them. For the specific message under consideration, I've converted the text to "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u". That avoids duplicating the offset information of the referring item, while reporting the offset of the referred item.
Project policy is generally against splitting a string across multiple
lines to fit within 80 characters. We like to fit within 80
characters, but we like to be able to grep for strings more, and
breaking them up like this makes that harder.

Thanks for clarifying the project policy. I joined these message strings back together.
+ confess(ctx,
+         pstrdup("corrupt toast chunk va_header"));

This is another message that I don't think is very clear. There's two
elements to that. One is that the phrasing is not very good, and the
other is that there are no % escapes.

Changed to "corrupt extended toast chunk with sequence number %d has invalid varlena header %0x". I think all the other information about where the corruption was found is already present in the other returned columns.
What's somebody going to do when
they see this message? First, they're probably going to have to look
at the code to figure out in which circumstances it gets generated;
that's a sign that the message isn't phrased clearly enough. That will
tell them that an unexpected bit pattern has been found, but not what
that unexpected bit pattern actually was. So then, they're going to
have to try to find the relevant va_header by some other means and
fish out the relevant bit so that they can see what actually went
wrong.

Right.
+ * Checks the current attribute as tracked in ctx for corruption.  Records
+ * any corruption found in ctx->corruption.
+ *
+ *

Extra blank line.
Fixed.
+ Form_pg_attribute thisatt = TupleDescAttr(RelationGetDescr(ctx->rel),
+                                           ctx->attnum);

Maybe you could avoid the line wrap by declaring this without
initializing it, and then initializing it as a separate statement.Yes, I like that better. I did not need to do the same with infomask, but it looks better to me to break the declaration and initialization for both, so I did that.
+ confess(ctx, psprintf("t_hoff + offset > lp_len (%u + %u > %u)",
+                       ctx->tuphdr->t_hoff, ctx->offset,
+                       ctx->lp_len));

Uggh! This isn't even remotely an English sentence. I don't think
formulas are the way to go here, but I like the idea of formulas in
some places and written-out messages in others even less. I guess the
complaint here in English is something like "tuple attribute %d should
start at offset %u, but tuple length is only %u" or something of that
sort. Also, it seems like this complaint really ought to have been
reported on the *preceding* loop iteration, either complaining that
(1) the fixed length attribute is more than the number of remaining
bytes in the tuple or (2) the varlena header for the tuple specifies
an excessively high length. It seems like you're blaming the wrong
attribute for the problem.

Yeah, and it wouldn't complain if the final attribute of a tuple was overlong, as there wouldn't be a next attribute to blame it on. I've changed it to report as you suggest, although it also still complains if the first attribute starts outside the bounds of the tuple. The two error messages now read as "tuple attribute should start at offset %u, but tuple length is only %u" and "tuple attribute of length %u ends at offset %u, but tuple length is only %u".
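The "blame the right attribute" logic can be illustrated in miniature. This sketch uses an invented function name and handles only fixed-length attributes, omitting varlena headers and alignment, which the real check_tuple_attribute must deal with.

```c
#include <assert.h>

/*
 * Walk the attributes of a tuple and return the index of the first one
 * that overruns the tuple, or -1 if every attribute fits.  Checking
 * each attribute against the remaining bytes *before* advancing means
 * the attribute actually at fault is the one reported.
 */
static int
first_overlong_attr(const int *attlen, int natts, int t_hoff, int lp_len)
{
    int offset = t_hoff;

    for (int i = 0; i < natts; i++)
    {
        if (offset + attlen[i] > lp_len)
            return i;       /* this attribute ends past the tuple */
        offset += attlen[i];
    }
    return -1;              /* every attribute fits */
}
```

Note that this also catches an overlong final attribute, the case the original formulation would have missed.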
BTW, the header comments for this function (check_tuple_attribute)
neglect to document the meaning of the return value.

Fixed.
+ confess(ctx, psprintf("tuple xmax = %u precedes relation "
+                       "relfrozenxid = %u",

This is another example of these messages needing work. The
corresponding message from heap_prepare_freeze_tuple() is "found
update xid %u from before relfrozenxid %u". That's better, because we
don't normally include equals signs in our messages like this, and
also because "relation relfrozenxid" is redundant. I think this should
say something like "tuple xmax %u precedes relfrozenxid %u".

+ confess(ctx, psprintf("tuple xmax = %u is in the future",
+                       xmax));

And then this could be something like "tuple xmax %u follows
last-assigned xid %u". That would be more symmetric and more
informative.

Both of these have been changed.
+ if (SizeofHeapTupleHeader + BITMAPLEN(ctx->natts) >
+     ctx->tuphdr->t_hoff)

I think we should be able to predict the exact value of t_hoff and
complain if it isn't precisely equal to the expected value. Or is that
not possible for some reason?

That is possible, and I've updated the error message to match. There are cases where you can't know if the HEAP_HASNULL bit is wrong or if the t_hoff value is wrong, but I've changed the code to just compute the length based on the HEAP_HASNULL setting and use that as the expected value, and complain when the actual value does not match the expected. That sidesteps the problem of not knowing exactly which value to blame.
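The expected t_hoff computation described above amounts to the following sketch. The function name is invented, and the 23-byte header size and 8-byte MAXALIGN shown here match common 64-bit builds but are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's header-size macros. */
#define SIZEOF_HEAP_TUPLE_HEADER 23 /* offsetof(HeapTupleHeaderData, t_bits) */
#define BITMAPLEN(natts) (((natts) + 7) / 8)
#define MAXALIGN(len) (((uintptr_t) (len) + 7) & ~((uintptr_t) 7))

/*
 * Compute the exact t_hoff a well-formed tuple should have: the fixed
 * header, plus a null bitmap only when HEAP_HASNULL is set, rounded up
 * to the platform's maximum alignment.  Any other stored value can be
 * reported as corruption without deciding whether the bit or the
 * offset is the one to blame.
 */
static int
expected_t_hoff(bool has_nulls, int natts)
{
    int len = SIZEOF_HEAP_TUPLE_HEADER;

    if (has_nulls)
        len += BITMAPLEN(natts);
    return (int) MAXALIGN(len);
}
```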
Is there some place that's checking that lp_len >=
SizeOfHeapTupleHeader before check_tuple() goes and starts poking into
the header? If not, there should be.

Good catch. check_tuple() now does that before reading the header.
+$node->command_ok(
+	[
+		'pg_amcheck', '-p', $port, 'postgres'
+	],
+	'pg_amcheck all schemas and tables implicitly');
+
+$node->command_ok(
+	[
+		'pg_amcheck', '-i', '-p', $port, 'postgres'
+	],
+	'pg_amcheck all schemas, tables and indexes');

I haven't really looked through the btree-checking and pg_amcheck
parts of this much yet, but this caught my eye. Why would the default
be to check tables but not indexes? I think the default ought to be to
check everything we know how to check.

I have changed the default to match your expectations.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 20, 2020, at 11:50 PM, Amul Sul <sulamul@gmail.com> wrote:
On Tue, Jul 21, 2020 at 10:58 AM Amul Sul <sulamul@gmail.com> wrote:
Hi Mark,
I think new structures should be listed in src/tools/pgindent/typedefs.list,
otherwise, pgindent might disturb its indentation.
<snip>
In v11-0001 and v11-0002 patches, there are still a few more errmsg that need to
be joined. e.g.:
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+     ereport(ERROR,
+             (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+              errmsg("set-valued function called in context that cannot "
+                     "accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+     ereport(ERROR,
+             (errcode(ERRCODE_SYNTAX_ERROR),
+              errmsg("materialize mode required, but it is not allowed "
+                     "in this context")));
Thanks for the review!
I believe these v12 patches resolve the two issues you raised.
Attachments:
v12-0001-Adding-function-verify_btreeam-and-bumping-versi.patchapplication/octet-stream; name=v12-0001-Adding-function-verify_btreeam-and-bumping-versi.patch; x-unix-mode=0644Download
From a1204bfad60d3b692b200ef5824b9a7600af075b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jul 2020 08:00:04 -0700
Subject: [PATCH v12 1/3] Adding function verify_btreeam and bumping version
For most errors found while verifying a btree index, the new
function verify_btreeam returns one row per error containing the
block number where the error was discovered and an error message
describing the problem. The pre-existing behavior for functions
bt_index_parent_check and bt_index_check is unchanged.
---
contrib/amcheck/Makefile | 2 +-
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_btree.out | 35 +
contrib/amcheck/sql/check_btree.sql | 13 +
contrib/amcheck/verify_nbtree.c | 834 +++++++++++++----------
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 516 insertions(+), 371 deletions(-)
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b288c28fa0 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -6,7 +6,7 @@ OBJS = \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
REGRESS = check check_btree
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..7297abb577 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,17 +45,23 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
@@ -60,6 +69,12 @@ SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: only B-Tree indexes are supported as targets for verification
+DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -67,6 +82,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +113,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +143,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +164,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..816ca9d033 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,36 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +71,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +85,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +93,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..ea70fc41a9 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ (ctx == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,28 +156,28 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
- BtreeLevel level);
-static void bt_target_page_check(BtreeCheckState *state);
-static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state);
+ BtreeLevel level, BtreeCheckContext * ctx);
+static void bt_target_page_check(BtreeCheckState *state, BtreeCheckContext * ctx);
+static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext * ctx);
static void bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum);
+ OffsetNumber downlinkoffnum, BtreeCheckContext * ctx);
static void bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level);
+ uint32 target_level,
+ BtreeCheckContext * ctx);
static void bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber targetblock, Page target);
+ BlockNumber targetblock, Page target,
+ BtreeCheckContext * ctx);
static void bt_tuple_present_callback(Relation index, ItemPointer tid,
Datum *values, bool *isnull,
bool tupleIsAlive, void *checkstate);
@@ -176,7 +199,7 @@ static inline bool invariant_l_nontarget_offset(BtreeCheckState *state,
BlockNumber nontargetblock,
Page nontarget,
OffsetNumber upperbound);
-static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum);
+static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext * ctx);
static inline BTScanInsert bt_mkscankey_pivotsearch(Relation rel,
IndexTuple itup);
static ItemId PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block,
@@ -185,6 +208,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * to do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +246,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +272,64 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for \"rel\"")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext *ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext *ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -506,7 +599,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
/* Get true root block from meta-page */
- metapage = palloc_btree_page(state, BTREE_METAPAGE);
+ metapage = palloc_btree_page(state, BTREE_METAPAGE, ctx);
metad = BTPageGetMeta(metapage);
/*
@@ -535,19 +628,18 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
* not at leaf level
*/
- current = bt_check_level_from_leftmost(state, current);
+ current = bt_check_level_from_leftmost(state, current, ctx);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -639,7 +731,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
* each call to bt_target_page_check().
*/
static BtreeLevel
-bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
+bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level, BtreeCheckContext *ctx)
{
/* State to establish early, concerning entire level */
BTPageOpaque opaque;
@@ -672,7 +764,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
/* Initialize state for this iteration */
state->targetblock = current;
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
state->targetlsn = PageGetLSN(state->target);
opaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,33 +868,30 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
- bt_target_page_check(state);
+ bt_target_page_check(state, ctx);
nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -902,7 +987,7 @@ nextpage:
* resetting state->targetcontext.
*/
static void
-bt_target_page_check(BtreeCheckState *state)
+bt_target_page_check(BtreeCheckState *state, BtreeCheckContext *ctx)
{
OffsetNumber offset;
OffsetNumber max;
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1027,7 +1109,8 @@ bt_target_page_check(BtreeCheckState *state)
bt_child_highkey_check(state,
offset,
NULL,
- topaque->btpo.level);
+ topaque->btpo.level,
+ ctx);
}
continue;
}
@@ -1049,14 +1132,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1161,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1215,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1322,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1368,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1328,7 +1406,7 @@ bt_target_page_check(BtreeCheckState *state)
BTScanInsert rightkey;
/* Get item in next/right page */
- rightkey = bt_right_page_check_scankey(state);
+ rightkey = bt_right_page_check_scankey(state, ctx);
if (rightkey &&
!invariant_g_offset(state, rightkey, max))
@@ -1343,7 +1421,7 @@ bt_target_page_check(BtreeCheckState *state)
if (!state->readonly)
{
/* Get fresh copy of target page */
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
/* Note that we deliberately do not update target LSN */
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -1354,14 +1432,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1374,7 +1451,7 @@ bt_target_page_check(BtreeCheckState *state)
* because it has no useful value to compare).
*/
if (!P_ISLEAF(topaque) && state->readonly)
- bt_child_check(state, skey, offset);
+ bt_child_check(state, skey, offset, ctx);
}
/*
@@ -1386,10 +1463,11 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointed to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
- NULL, topaque->btpo.level);
+ NULL, topaque->btpo.level, ctx);
}
}
@@ -1410,7 +1488,7 @@ bt_target_page_check(BtreeCheckState *state)
* been concurrently deleted.
*/
static BTScanInsert
-bt_right_page_check_scankey(BtreeCheckState *state)
+bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext *ctx)
{
BTPageOpaque opaque;
ItemId rightitem;
@@ -1455,7 +1533,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
{
CHECK_FOR_INTERRUPTS();
- rightpage = palloc_btree_page(state, targetnext);
+ rightpage = palloc_btree_page(state, targetnext, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(rightpage);
if (!P_IGNORE(opaque) || P_RIGHTMOST(opaque))
@@ -1666,7 +1744,8 @@ static void
bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level)
+ uint32 target_level,
+ BtreeCheckContext *ctx)
{
BlockNumber blkno = state->prevrightlink;
Page page;
@@ -1708,7 +1787,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,51 +1802,47 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
page = loaded_child;
else
- page = palloc_btree_page(state, blkno);
+ page = palloc_btree_page(state, blkno, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
/* blkno probably has missing parent downlink */
- bt_downlink_missing_check(state, rightsplit, blkno, page);
+ bt_downlink_missing_check(state, rightsplit, blkno, page, ctx);
}
rightsplit = P_INCOMPLETE_SPLIT(opaque);
@@ -1825,14 +1900,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1930,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1913,7 +1985,7 @@ bt_child_highkey_check(BtreeCheckState *state,
*/
static void
bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum)
+ OffsetNumber downlinkoffnum, BtreeCheckContext *ctx)
{
ItemId itemid;
IndexTuple itup;
@@ -1978,7 +2050,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* the operator class obeys the transitive law.
*/
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
- child = palloc_btree_page(state, childblock);
+ child = palloc_btree_page(state, childblock, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
maxoffset = PageGetMaxOffsetNumber(child);
@@ -1987,7 +2059,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* check for downlink connectivity.
*/
bt_child_highkey_check(state, downlinkoffnum,
- child, topaque->btpo.level);
+ child, topaque->btpo.level, ctx);
/*
* Since there cannot be a concurrent VACUUM operation in readonly mode,
@@ -2014,17 +2086,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2127,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2084,7 +2154,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
*/
static void
bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber blkno, Page page)
+ BlockNumber blkno, Page page, BtreeCheckContext *ctx)
{
BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
ItemId itemid;
@@ -2150,14 +2220,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,11 +2236,11 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
- child = palloc_btree_page(state, childblk);
+ child = palloc_btree_page(state, childblk, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
if (P_ISLEAF(copaque))
@@ -2179,13 +2248,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2285,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2311,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2393,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel));
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2457,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2477,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2780,7 +2841,7 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
* misbehaves.
*/
static Page
-palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
+palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext *ctx)
{
Buffer buffer;
Page page;
@@ -2810,10 +2871,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2882,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2904,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2882,23 +2938,20 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
*/
maxoffset = PageGetMaxOffsetNumber(page);
if (maxoffset > MaxIndexTuplesPerPage)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
- blocknum, RelationGetRelationName(state->rel),
- MaxIndexTuplesPerPage)));
+		econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
+ blocknum, RelationGetRelationName(state->rel),
+ MaxIndexTuplesPerPage);
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) && maxoffset < P_FIRSTDATAKEY(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal block %u in index \"%s\" lacks high key and/or at least one downlink",
- blocknum, RelationGetRelationName(state->rel))));
+		econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal block %u in index \"%s\" lacks high key and/or at least one downlink",
+ blocknum, RelationGetRelationName(state->rel));
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && !P_RIGHTMOST(opaque) && maxoffset < P_HIKEY)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("non-rightmost leaf block %u in index \"%s\" lacks high key item",
- blocknum, RelationGetRelationName(state->rel))));
+		econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "non-rightmost leaf block %u in index \"%s\" lacks high key item",
+ blocknum, RelationGetRelationName(state->rel));
/*
* In general, internal pages are never marked half-dead, except on
@@ -2910,17 +2963,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3022,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3037,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3065,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3112,53 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Accumulate a report of index corruption in the context's tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext *ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That is OK, but leaves no room for leaking all the msg
+ * arguments that are allocated during the scan.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1e14..467120f1d0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -275,6 +275,7 @@ BrinSpecialSpace
BrinStatsData
BrinTuple
BrinValues
+BtreeCheckContext
BtreeCheckState
BtreeLevel
Bucket
--
2.21.1 (Apple Git-122.3)
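The econfess() calls introduced above all funnel into confess(), which appends a (blkno, msg) row to a tuplestore and marks the context corrupt rather than raising ereport(ERROR); CONTINUE_CHECKING() then decides whether the scan keeps going, honoring an on_error_stop flag. That control flow can be sketched in standalone C — every name here is a hypothetical stand-in, with a fixed array playing the role of the tuplestore and report_corruption() playing the role of econfess():

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_REPORTS 64

/* Hypothetical stand-in for the patch's check context + tuplestore. */
typedef struct CheckContext
{
	bool		is_corrupt;		/* any corruption reported so far? */
	bool		on_error_stop;	/* stop after the first report? */
	int			nreports;
	unsigned	blknos[MAX_REPORTS];
	char		messages[MAX_REPORTS][128];
} CheckContext;

/* Keep scanning unless the caller asked to stop at the first report. */
#define CONTINUE_CHECKING(ctx) (!(ctx)->is_corrupt || !(ctx)->on_error_stop)

/*
 * Analogue of econfess(): record a (blkno, msg) row and mark the
 * relation corrupt, instead of aborting the whole scan with an error.
 */
static void
report_corruption(CheckContext *ctx, unsigned blkno, const char *fmt, ...)
{
	va_list		ap;

	if (ctx->nreports >= MAX_REPORTS)
		return;
	va_start(ap, fmt);
	vsnprintf(ctx->messages[ctx->nreports], 128, fmt, ap);
	va_end(ap);
	ctx->blknos[ctx->nreports] = blkno;
	ctx->nreports++;
	ctx->is_corrupt = true;
}

/* Scan fake "pages"; a zero value stands in for a corrupt page. */
static int
scan_pages(CheckContext *ctx, const int *pages, int npages)
{
	int			scanned = 0;

	for (int i = 0; CONTINUE_CHECKING(ctx) && i < npages; i++)
	{
		scanned++;
		if (pages[i] == 0)
			report_corruption(ctx, (unsigned) i, "page %d is corrupt", i);
	}
	return scanned;
}
```

The point of the shape is that a single pass can return every corruption found, while on_error_stop still gives callers the old abort-early behavior.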
v12-0002-Adding-function-verify_heapam-to-amcheck-module.patch (application/octet-stream)
From 997c6acfb6b57c862251cc00b1a888ad70e4358b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jul 2020 08:02:16 -0700
Subject: [PATCH v12 2/3] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
---
contrib/amcheck/Makefile | 5 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_heap.out | 67 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_heap.sql | 19 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/001_verify_heapam.pl | 94 ++
contrib/amcheck/verify_heapam.c | 1151 +++++++++++++++++
doc/src/sgml/amcheck.sgml | 106 +-
src/backend/access/heap/hio.c | 11 +
src/tools/pgindent/typedefs.list | 1 +
12 files changed, 1607 insertions(+), 2 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index b288c28fa0..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..df418a850b
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean default false,
+ skip cstring default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..4175bb2d37
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,67 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+ERROR: unrecognized parameter for 'skip': rope
+HINT: please choose from 'all-visible', 'all-frozen', or 'none'
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ERROR: starting block 0 is out of bounds for relation with no blocks
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+ERROR: block range 100000 .. 200000 is out of bounds for relation with block count 370
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c75f5ff869
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,19 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..c2d890bcd9
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,94 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+}
+
+# Stops the server and writes nulls in the first page of the table,
+# assuming page size is large enough for offset 1000..1016 to be
+# in the midst of the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+	open($fh, '+<', $relpath) or die "open: $!";
+	binmode $fh;
+	seek($fh, 1000, 0);
+	syswrite($fh, "\x00" x 16, 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page's worth) but
+# without any all-frozen or all-visible pages
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all-visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-visible first page');
+
+# Check table with corruption, skipping all-frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-frozen first page');
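The corrupt_first_page() helper above relies on nothing more than plain file I/O: with the server stopped, it seeks into the relation's first page and overwrites a small byte range with zeros, then restarts the server so verify_heapam can find the damage. The same surgery sketched in standalone C (the path and offsets here are illustrative, not taken from the test):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Overwrite nbytes at the given offset with zero bytes, the way the
 * TAP test clobbers offsets 1000..1016 of the relation's first page
 * while the server is stopped.
 */
static int
zero_range(const char *path, long offset, size_t nbytes)
{
	FILE	   *fp = fopen(path, "rb+");
	char		zeros[16] = {0};

	if (fp == NULL)
		return -1;
	if (nbytes > sizeof(zeros) ||
		fseek(fp, offset, SEEK_SET) != 0 ||
		fwrite(zeros, 1, nbytes, fp) != nbytes)
	{
		fclose(fp);
		return -1;
	}
	return fclose(fp);
}
```

Because page checksums (if enabled) are computed over the whole page, this kind of mid-page damage is exactly what checksum verification catches, whereas the logical checks in these patches target damage that leaves the page header and checksum intact.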
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..0fd0082f8b
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1151 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+#include "amcheck.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 8
+
+/*
+ * Struct holding running context information over the
+ * lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * While verifying a table, we check whether any xid we encounter is
+ * either too old or too new. We could naively check that by taking the
+ * XidGenLock each time and reading ShmemVariableCache. We instead cache
+ * the values and rely on the fact that we have the table locked
+ * sufficiently that the oldest xid in the table cannot change
+ * mid-verification, and although the newest xid in the table may advance,
+ * it cannot retreat. As such, whenever we encounter an xid older than
+ * our cached oldest xid, we know it is invalid, and when we encounter an
+ * xid newer than our cached newest xid, we recheck the
+ * ShmemVariableCache.
+ */
+ TransactionId nextKnownValidXid;
+ TransactionId oldestValidXid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toastrel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool TransactionIdValidInRel(TransactionId xid, HeapCheckContext *ctx);
+static bool tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx);
+
+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages or in associated toast relation.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ bool randomAccess;
+ HeapCheckContext ctx;
+ FullTransactionId nextFullXid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool fatal = false;
+ bool on_error_stop;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ int64 startblock;
+ int64 endblock;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_ALL_VISIBLE_PAGES;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_ALL_FROZEN_PAGES;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+					 errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+ }
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ if (!ctx.nblocks)
+ {
+ /*
+ * For consistency, we need to enforce that the startblock and
+ * endblock are within the valid range if the user specified them.
+ * Yet, for an empty table with no blocks, no specified block can be
+ * in range.
+ */
+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* If we get this far, we know the relation has at least one block */
+ startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
+ " is out of bounds for relation with block count %u",
+ startblock, endblock, ctx.nblocks)));
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toastrel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toastrel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toastrel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance.
+ *
+ * If relfrozenxid is normal, it contains the oldest valid xid we may
+ * encounter in the table. If not, the oldest xid for our database is the
+ * oldest we should encounter.
+ *
+	 * Bugs in pg_upgrade (see commands/vacuum.c circa line 1572) are reported
+	 * to have sometimes rendered the oldest xid value for a database invalid.
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt. We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.
+ *
+ * If neither the value for the database nor the xids for any row are
+ * corrupt, then this gives the right answer. If the rows disagree with
+ * the value for the database, how can we know which one is wrong?
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ nextFullXid = ShmemVariableCache->nextFullXid;
+ ctx.oldestValidXid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.nextKnownValidXid = XidFromFullTransactionId(nextFullXid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relfrozenxid %u precedes global oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldestValidXid))
+ {
+ confess(&ctx, psprintf("relminmxid %u precedes global oldest valid xid %u",
+ ctx.relminmxid, ctx.oldestValidXid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldestValidXid = ctx.relfrozenxid;
+
+	/* The block range was validated above; clamp defensively anyway */
+	if (startblock < 0)
+		startblock = 0;
+	if (endblock < 0 || endblock >= ctx.nblocks)
+		endblock = ctx.nblocks - 1;
+
+ for (ctx.blkno = startblock; ctx.blkno <= endblock; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ bool all_frozen,
+ all_visible;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+			all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+			all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_ALL_FROZEN_PAGES) ||
+ (all_visible && skip_option == SKIP_ALL_VISIBLE_PAGES))
+ {
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+ /* We rely on this math property for the first iteration */
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");
+
+ ctx.offnum = InvalidOffsetNumber;
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ ctx.tuphdr = NULL;
+ ctx.natts = 0;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ confess(&ctx, psprintf(
+ "line pointer redirection to item at offset number %u is outside valid bounds %u .. %u",
+ (unsigned) rdoffnum, (unsigned) FirstOffsetNumber,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ confess(&ctx, psprintf(
+ "line pointer redirection to unused item at offset %u",
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ ctx.itemid = NULL;
+ ctx.lp_len = 0;
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
+ }
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * Convenience routine to check that the relation is of a supported relkind
+ * and uses the heap table access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap",
+ RelationGetRelationName(rel))));
+}
+
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+	int16		lp_off = ctx->itemid ? ItemIdGetOffset(ctx->itemid) : -1;
+	int16		lp_flags = ctx->itemid ? ItemIdGetFlags(ctx->itemid) : -1;
+	int16		lp_len = ctx->itemid ? ItemIdGetLength(ctx->itemid) : -1;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+	nulls[1] = (ctx->offnum == InvalidOffsetNumber);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+	 * corrupted table from using up to work_mem worth of memory building up
+	 * the tuplestore.  That's ok, but if we also leak the msg argument memory
+	 * until the end of the query, we could exceed work_mem by more than a
+	 * trivial amount.  Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext *ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldestValidXid, xid) &&
+ TransactionIdPrecedes(xid, ctx->nextKnownValidXid));
+}
+
+/*
+ * TransactionIdValidInRel
+ *
+ * Given a TransactionId, check that it is neither in the future nor so far
+ * in the past that it could not validly appear in this relation.
+ *
+ * Returns true if the xid is within the known-valid range, false otherwise.
+ */
+static bool
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext *ctx)
+{
+	/* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->nextKnownValidXid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * tuple_is_visible
+ *
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+		else if (infomask & HEAP_MOVED)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!TransactionIdValidInRel(xvac, ctx))
+ {
+ confess(ctx, psprintf("old-style VACUUM FULL transaction ID %u is invalid in this relation",
+ xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx, pstrdup(
+ "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has a valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toastrel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx, pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+		confess(ctx, psprintf(
+							  "corrupt extended toast chunk with sequence number %d has invalid varlena header 0x%08X",
+							  curchunk, header));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx, psprintf(
+ "toast chunk sequence number %u not the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx, psprintf(
+ "toast chunk sequence number %u exceeds the end chunk sequence number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ confess(ctx, psprintf("toast chunk size %u differs from expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption.  Records
+ * any corruption found via confess().
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in
+ * the case of a toasted value, continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed
+ * here. In cases where those two functions are a bit cavalier in their
+ * assumptions about data being correct, we perform additional checks not
+ * present in either of those two functions. Where some condition is checked
+ * in both of those functions, we perform it here twice, as we parallel the
+ * logical flow of those two functions. The presence of duplicate checks
+ * seems a reasonable price to pay for keeping this code tightly coupled with
+ * the code it protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+ "tuple attribute should start at offset %u, but tuple length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx, psprintf(
+ "%s toast at offset %u is unexpected",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unexpected",
+ ctx->tuphdr->t_hoff + ctx->offset));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+							   "tuple attribute ends at offset %u, but tuple length is only %u",
+							   ctx->tuphdr->t_hoff + ctx->offset,
+							   ctx->lp_len));
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx, pstrdup(
+ "attribute is external but tuple header flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx, pstrdup(
+ "attribute is external but relation has no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toastrel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx, psprintf(
+ "final chunk number %u differs from expected value %u",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx, pstrdup("toasted value missing from toast table"));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption.  Records any
+ * corruption found via confess().
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx, psprintf(
+ "tuple's %u byte line pointer length is less than the %u byte minimum tuple header size",
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmin, ctx))
+ {
+ confess(ctx, psprintf(
+ "tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u precedes relfrozenxid %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!TransactionIdValidInRel(xmax, ctx))
+ {
+ confess(ctx, psprintf(
+ "tuple xmax %u follows last assigned xid %u",
+ xmax, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx, psprintf(
+ "tuple's %u byte header size exceeds the %u byte length of the entire tuple",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx, psprintf(
+ "tuple's user data offset %u not maximally aligned to %u",
+ ctx->tuphdr->t_hoff, (uint32) MAXALIGN(ctx->tuphdr->t_hoff)));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+		confess(ctx,
+				pstrdup("tuple xmax marked incompatibly as keys updated and locked only"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+		confess(ctx,
+				pstrdup("tuple xmax marked incompatibly as committed and as a multitransaction ID"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the t_hoff field or the infomask
+ * bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been reserved
+ * for the null bitmap.
+ */
+ if ((infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))))
+ {
+ confess(ctx, psprintf(
+ "tuple with null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))));
+ fatal = true;
+ }
+ else if (!(infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader)))
+ {
+ confess(ctx, psprintf(
+ "tuple without null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!tuple_is_visible(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could legitimately be
+ * less than the rel's natts, but it cannot be greater.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx, psprintf(
+ "tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..b8170bbfdf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected. All blocks of the
+ relation are checked unless on_error_stop is true, in which case
+ checking stops at the first block where corruption is found. Each row
+ contains the following fields:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block where the corruption was found.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption found in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 467120f1d0..531710594a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1018,6 +1018,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
--
2.21.1 (Apple Git-122.3)
Attachment: v12-0003-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 4b0d3794048abc5f14a100c8906462382e0cfd0a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 21 Jul 2020 08:05:19 -0700
Subject: [PATCH v12 3/3] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 900 ++++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 ++
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 434 +++++++++++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 ++++
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 1655 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..2bd98076eb
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,900 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -n, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " -s, --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -o, --on-error-stop stop checking a relation after the first corrupt "
+ "page is found",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /* Default behaviors */
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+ settings.check_indexes = true;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, '?'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print the command line usage
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s ANY(array[\n", lval, operator);
+ append_csv_oids(query, oids);
+ appendPQExpBuffer(query, "\n])");
+ return true;
+}
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..01531e5c77
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all objects not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..58d5ab88cb
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,434 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin 3 precedes relfrozenxid $relfrozenxid
+0|2|8064|1|58|||tuple xmin 4026531839 precedes relfrozenxid $relfrozenxid
+0|3|8000|1|58|||tuple xmax 4026531839 precedes relfrozenxid $relfrozenxid
+0|4|7936|1|58|||tuple's 152 byte header size exceeds the 58 byte length of the entire tuple
+0|4|7936|1|58|||tuple without null values has user data offset 152 rather than the expected offset 24
+0|5|7872|1|58|||tuple's user data offset 27 not maximally aligned to 32
+0|5|7872|1|58|||tuple without null values has user data offset 27 rather than the expected offset 24
+0|6|7808|1|58|||tuple's header size is 16 bytes which is less than the 23 byte minimum valid header size
+0|6|7808|1|58|||tuple without null values has user data offset 16 rather than the expected offset 24
+0|7|7744|1|58|||tuple's header size is 21 bytes which is less than the 23 byte minimum valid header size
+0|7|7744|1|58|||tuple's user data offset 21 not maximally aligned to 24
+0|7|7744|1|58|||tuple without null values has user data offset 21 rather than the expected offset 24
+0|8|7680|1|58|||tuple has 2047 attributes in relation with only 3 attributes
+0|9|7616|1|58|||tuple with null values has user data offset 24 rather than the expected offset 280
+0|10|7552|1|58|||tuple has 67 attributes in relation with only 3 attributes
+0|11|7488|1|58|1||tuple attribute of length 4294967295 ends at offset 416848000, but tuple length is only 58
+0|12|7424|1|58|2|0|final chunk number 0 differs from expected value 6
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||tuple xmax marked incompatibly as keys updated and locked only
+0|14|7296|1|58|||tuple xmax 0 precedes relminmxid 1
+0|14|7296|1|58|||tuple xmax marked incompatibly as committed and as a multitransaction ID",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/final chunk number \d+ differs from expected value \d+/,
+ qr/toasted value missing from toast table/,
+ qr/tuple attribute of length \d+ ends at offset \d+, but tuple length is only \d+/,
+ qr/tuple has \d+ attributes in relation with only \d+ attributes/,
+ qr/tuple with null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple without null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple xmax \d+ precedes relfrozenxid \d+/,
+ qr/tuple xmax \d+ precedes relminmxid \d+/,
+ qr/tuple xmax marked incompatibly as committed and as a multitransaction ID/,
+ qr/tuple xmax marked incompatibly as keys updated and locked only/,
+ qr/tuple xmin \d+ precedes relfrozenxid \d+/,
+ qr/tuple's \d+ byte header size exceeds the \d+ byte length of the entire tuple/,
+ qr/tuple's header size is \d+ bytes which is less than the \d+ byte minimum valid header size/,
+ qr/tuple's user data offset \d+ not maximally aligned to \d+/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "corrupted",
+ along with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 531710594a..050c7b87d3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -404,6 +405,7 @@ ConnCacheEntry
ConnCacheKey
ConnStatusType
ConnType
+ConnectOptions
ConnectionStateEnum
ConsiderSplitContext
Const
--
2.21.1 (Apple Git-122.3)
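As an aside for anyone reviewing the tuple layout documented in the comments of t/004_verify_heapam.pl: the pack-code arithmetic can be cross-checked independently. The sketch below (Python rather than Perl, purely illustrative and not part of the patch; the helper name field_offset is made up here) mirrors HEAPTUPLE_PACK_CODE and verifies the documented field offsets and the 58-byte total.

```python
import struct

# Python struct equivalent of the Perl pack code
# 'LLLSSSSSCCqCcccccccSSSSSSSSS' used by t/004_verify_heapam.pl:
#   L -> I (unsigned 32-bit), S -> H (unsigned 16-bit),
#   C -> B (unsigned 8-bit),  c -> b (signed 8-bit),
#   q -> q (signed 64-bit).  '<' disables padding, as Perl's pack() does.
HEAPTUPLE_FMT = "<IIIHHHHHBBqBbbbbbbbHHHHHHHHH"

def field_offset(nfields):
    """Byte offset of the nth field (0-based) in the packed tuple."""
    return struct.calcsize("<" + HEAPTUPLE_FMT[1:1 + nfields])

# Total packed size must match HEAPTUPLE_PACK_LENGTH in the test.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Offsets from the layout table in the test file's comments.
assert field_offset(0) == 0    # t_xmin
assert field_offset(3) == 12   # bi_hi
assert field_offset(8) == 22   # t_hoff
assert field_offset(9) == 23   # t_bits
assert field_offset(10) == 24  # column 'a' (BIGINT)
assert field_offset(11) == 32  # column 'b' 1-byte varlena header
assert field_offset(19) == 40  # start of column 'c' toast pointer
```

If the assertions hold, the Perl pack code and the layout table agree with each other.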
On Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
[....]
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+ "InvalidOffsetNumber increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.

Ah, right you are. Removed.
I can see the same assert and the unnecessary assignment in v12-0002, is that
the same thing that is supposed to be removed, or am I missing something?
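For clarity, the point in the quoted exchange — that the scan loop need not rely on InvalidOffsetNumber + 1 == FirstOffsetNumber — can be sketched roughly as follows (a Python stand-in for the C loop; the constants mirror access/off.h, and scan_offsets is an invented name, not code from the patch):

```python
INVALID_OFFSET_NUMBER = 0   # from access/off.h; never used as a start value
FIRST_OFFSET_NUMBER = 1

def scan_offsets(maxoff):
    """Visit line pointers 1..maxoff, starting directly at
    FirstOffsetNumber rather than incrementing InvalidOffsetNumber,
    so no static assertion about their relationship is needed."""
    visited = []
    offnum = FIRST_OFFSET_NUMBER
    while offnum <= maxoff:
        visited.append(offnum)
        offnum += 1            # OffsetNumberNext()
    return visited
```

Written this way, neither the ctx.offnum = InvalidOffsetNumber initializer nor the StaticAssertStmt is required.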
[....]
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.

Hmm. I don't see the three lines of code you are quoting. Which patch is
that from?
I think this is the same issue as my previous suggestion to add the new
structures to typedefs.list. V12 has listed the new structures, but I think
some more adjustments are still needed in the code, e.g. the space between
HeapCheckContext and the asterisk needs to be fixed. I am not sure whether
pgindent will do that or not.
Here are a few more minor comments for the v12-0002 patch; some of them
apply to the other patches as well:
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
Doesn't seem to be at the correct place -- need to be in sorted order.
+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));
I think these errmsg() strings also should be in one line.
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);
Toast index and rel closing block style is not the same as at the ending of
verify_heapam().
+ /* If we get this far, we know the relation has at least one block */
+ startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
+ " is out of bounds for relation with block count %u",
+ startblock, endblock, ctx.nblocks)));
+
...
...
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;
Other than the endblock < 0 case, do we really need that? I think due to the
above error check, the rest of the cases will not reach this place.
+ confess(ctx, psprintf(
+ "tuple xmax %u follows last assigned xid %u",
+ xmax, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u
byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
confess() call has two different code styles, first one where psprintf()'s only
argument got its own line and second style where psprintf has its own line with
the argument. I think the 2nd style is what we do follow & correct, not the
former.
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap",
+ RelationGetRelationName(rel))));
Like elsewhere, can we have the errmsg as "only heap AM is supported" and the
error code as ERRCODE_FEATURE_NOT_SUPPORTED?
That's all for now; apologies for the multiple review emails.
Regards,
Amul
On Jul 26, 2020, at 9:27 PM, Amul Sul <sulamul@gmail.com> wrote:
On Tue, Jul 21, 2020 at 2:32 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

[....]
+ StaticAssertStmt(InvalidOffsetNumber + 1 == FirstOffsetNumber,
+                  "InvalidOffsetNumber increments to FirstOffsetNumber");

If you are going to rely on this property, I agree that it is good to
check it. But it would be better to NOT rely on this property, and I
suspect the code can be written quite cleanly without relying on it.
And actually, that's what you did, because you first set ctx.offnum =
InvalidOffsetNumber but then just after that you set ctx.offnum = 0 in
the loop initializer. So AFAICS the first initializer, and the static
assert, are pointless.

Ah, right you are. Removed.
I can see the same assert and the unnecessary assignment in v12-0002; is that
the same thing that is supposed to be removed, or am I missing something?
That's the same thing. I removed it, but obviously I somehow removed the removal prior to making the patch. My best guess is that I reverted some set of changes that unintentionally included this one.
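As an aside for readers following this sub-thread, the property in question, and a loop style that avoids relying on it, can be sketched as below. This is a hedged, standalone illustration: OffsetNumber and the two constants are simplified stand-ins for the definitions in PostgreSQL's storage/off.h, and count_offsets is a hypothetical helper, not code from the patch.

```c
typedef unsigned short OffsetNumber;

/* Simplified stand-ins for the constants in storage/off.h. */
#define InvalidOffsetNumber ((OffsetNumber) 0)
#define FirstOffsetNumber   ((OffsetNumber) 1)

/* The removed StaticAssertStmt documented this relationship; C11's
 * _Static_assert expresses the same compile-time check. */
_Static_assert(InvalidOffsetNumber + 1 == FirstOffsetNumber,
               "InvalidOffsetNumber increments to FirstOffsetNumber");

/* A loop whose initializer starts at FirstOffsetNumber directly never
 * depends on the relationship asserted above, which is the point made
 * in the review: written this way, neither the extra initializer nor
 * the static assert is needed. */
static int
count_offsets(OffsetNumber maxoff)
{
    int         n = 0;

    for (OffsetNumber off = FirstOffsetNumber; off <= maxoff; off++)
        n++;
    return n;
}
```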
[....]
+confess(HeapCheckContext * ctx, char *msg)
+TransactionIdValidInRel(TransactionId xid, HeapCheckContext * ctx)
+check_tuphdr_xids(HeapTupleHeader tuphdr, HeapCheckContext * ctx)

This is what happens when you pgindent without adding all the right
things to typedefs.list first ... or when you don't pgindent and have
odd ideas about how to indent things.

Hmm. I don't see the three lines of code you are quoting. Which patch is that from?
I think it was the same thing related to my previous suggestion to list new
structures in typedefs.list. V12 has listed the new structures, but I think
there are still some more adjustments needed in the code, e.g. the space
between HeapCheckContext and * (asterisk) needs to be fixed. I am not sure
whether pgindent will do that or not.
Hmm. I'm not seeing an example of HeapCheckContext with wrong spacing. Can you provide a file and line number? There was a problem with enum SkipPages. I've added that to the typedefs.list and rerun pgindent.
While looking at that, I noticed that the function and variable naming conventions in this patch were irregular, with names like TransactionIdValidInRel (init-caps) and tuple_is_visible (underscores), so I spent some time cleaning that up for v13.
Here are a few more minor comments for the v12-0002 patch & some of them
apply to other patches as well:

#include "utils/snapmgr.h"
-
+#include "amcheck.h"

Doesn't seem to be at the correct place -- need to be in sorted order.
Fixed.
+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT
+ " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));

I think these errmsg() strings also should be in one line.
I chose not to do so, because the INT64_FORMAT bit breaks up the text even if placed all on one line. I don't feel strongly about that, though, so I'll join them for v13.
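To illustrate the tradeoff being discussed, here is a minimal standalone sketch (not code from the patch; the INT64_FORMAT stand-in and format_start_block_error are hypothetical). Adjacent string literals concatenate at compile time, so the compiler sees one format string either way; the macro merely interrupts the visible text.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's INT64_FORMAT macro; on typical platforms
 * it expands to a string literal such as "%ld" or "%lld". */
#define INT64_FORMAT "%" PRId64

/* Even with the message written on one line, INT64_FORMAT visually
 * breaks up the text; splitting at the macro, as in the patch, leaves
 * the same single concatenated format string. */
static int
format_start_block_error(char *buf, size_t buflen, int64_t startblock)
{
    return snprintf(buf, buflen,
                    "starting block " INT64_FORMAT
                    " is out of bounds for relation with no blocks",
                    startblock);
}
```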
+ if (fatal)
+ {
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toastrel)
+ table_close(ctx.toastrel, ShareUpdateExclusiveLock);

Toast index and rel closing block style is not the same as at the ending of
verify_heapam().
I've harmonized the two. Thanks for noticing.
+ /* If we get this far, we know the relation has at least one block */
+ startblock = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ endblock = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (startblock < 0 || endblock >= ctx.nblocks || startblock > endblock)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT
+ " is out of bounds for relation with block count %u",
+ startblock, endblock, ctx.nblocks)));
+
...
...
+ if (startblock < 0)
+ startblock = 0;
+ if (endblock < 0 || endblock > ctx.nblocks)
+ endblock = ctx.nblocks;

Other than the endblock < 0 case
This case does not need special checking, either. The combination of checking that startblock >= 0 and that startblock <= endblock already handles it.
, do we really need that? I think due to the above
error check the rest of the cases will not reach this place.
We don't need any of that. Removed in v13.
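The reasoning can be checked mechanically. The sketch below (hypothetical names, not patch code) mirrors the single range guard: any range that passes it already satisfies 0 <= startblock <= endblock < nblocks, so all three later clamping branches are dead code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical distillation of the range guard under discussion:
 * returns false exactly when the ereport(ERROR) would fire.  When it
 * returns true, startblock >= 0 and endblock < nblocks both already
 * hold (endblock >= startblock >= 0 rules out endblock < 0 too), so
 * the removed clamping code was unreachable. */
static bool
block_range_ok(int64_t startblock, int64_t endblock, uint32_t nblocks)
{
    return !(startblock < 0 ||
             endblock >= (int64_t) nblocks ||
             startblock > endblock);
}
```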
+ confess(ctx, psprintf(
+ "tuple xmax %u follows last assigned xid %u",
+ xmax, ctx->nextKnownValidXid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));

confess() call has two different code styles, first one where psprintf()'s only
argument got its own line and second style where psprintf has its own line with
the argument. I think the 2nd style is what we do follow & correct, not the
former.
Ok, standardized in v13.
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a heap",
+ RelationGetRelationName(rel))));

Like elsewhere, can we have the errmsg as "only heap AM is supported" and the
error code as ERRCODE_FEATURE_NOT_SUPPORTED?
I'm indifferent about that change. Done for v13.
That's all for now; apologies for the multiple review emails.
Not at all! I appreciate all the reviews.
Attachments:
v13-0001-Adding-function-verify_btreeam-and-bumping-versi.patchapplication/octet-stream; name=v13-0001-Adding-function-verify_btreeam-and-bumping-versi.patch; x-unix-mode=0644Download
From 980ba0a318b1d876fd0e9c8b0cf8167a600a5c71 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 27 Jul 2020 08:02:24 -0700
Subject: [PATCH v13 1/3] Adding function verify_btreeam and bumping version
For most errors found while verifying a btree index, the new
function verify_btreeam returns one row per error containing the
block number where the error was discovered and an error message
describing the problem. The pre-existing behavior for functions
bt_index_parent_check and bt_index_check is unchanged.
---
contrib/amcheck/Makefile | 2 +-
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_btree.out | 35 +
contrib/amcheck/sql/check_btree.sql | 13 +
contrib/amcheck/verify_nbtree.c | 834 +++++++++++++----------
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 516 insertions(+), 371 deletions(-)
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b288c28fa0 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -6,7 +6,7 @@ OBJS = \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
REGRESS = check check_btree
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index f82f48d23b..7297abb577 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -21,6 +21,8 @@ SELECT bt_index_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_check
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
ERROR: permission denied for function bt_index_parent_check
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
-- to run this cluster-wide with a restricted account, and as tested
@@ -29,6 +31,7 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -42,17 +45,23 @@ SELECT bt_index_parent_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ERROR: permission denied for function verify_btreeam
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
ERROR: "bttest_a" is not an index
SELECT bt_index_parent_check('bttest_a');
ERROR: "bttest_a" is not an index
+SELECT * FROM verify_btreeam('bttest_a');
+ERROR: "bttest_a" is not an index
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
ERROR: could not open relation with OID 17
SELECT bt_index_parent_check(17);
ERROR: could not open relation with OID 17
+SELECT * FROM verify_btreeam(17);
+ERROR: could not open relation with OID 17
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
@@ -60,6 +69,12 @@ SELECT bt_index_parent_check('bttest_a_brin_idx');
ERROR: only B-Tree indexes are supported as targets for verification
DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ERROR: only B-Tree indexes are supported as targets for verification
+DETAIL: Relation "bttest_a_brin_idx" is not a B-Tree index.
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
bt_index_check
@@ -67,6 +82,11 @@ SELECT bt_index_check('bttest_a_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
bt_index_check
@@ -93,6 +113,11 @@ SELECT bt_index_parent_check('bttest_b_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_a_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -118,6 +143,11 @@ SELECT bt_index_check('bttest_multi_idx');
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
bt_index_parent_check
@@ -134,6 +164,11 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
(1 row)
+SELECT * FROM verify_btreeam('bttest_multi_idx');
+ blkno | msg
+-------+-----
+(0 rows)
+
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
-- checks
diff --git a/contrib/amcheck/sql/check_btree.sql b/contrib/amcheck/sql/check_btree.sql
index a1fef644cb..816ca9d033 100644
--- a/contrib/amcheck/sql/check_btree.sql
+++ b/contrib/amcheck/sql/check_btree.sql
@@ -24,6 +24,7 @@ CREATE ROLE regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx'::regclass);
SELECT bt_index_parent_check('bttest_a_idx'::regclass);
+SELECT * FROM verify_btreeam('bttest_a_idx'::regclass);
RESET ROLE;
-- we, intentionally, don't check relation permissions - it's useful
@@ -33,27 +34,36 @@ GRANT EXECUTE ON FUNCTION bt_index_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_check(regclass, boolean) TO regress_bttest_role;
GRANT EXECUTE ON FUNCTION bt_index_parent_check(regclass, boolean) TO regress_bttest_role;
+GRANT EXECUTE ON FUNCTION verify_btreeam(regclass, boolean) TO regress_bttest_role;
SET ROLE regress_bttest_role;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
RESET ROLE;
-- verify plain tables are rejected (error)
SELECT bt_index_check('bttest_a');
SELECT bt_index_parent_check('bttest_a');
+SELECT * FROM verify_btreeam('bttest_a');
-- verify non-existing indexes are rejected (error)
SELECT bt_index_check(17);
SELECT bt_index_parent_check(17);
+SELECT * FROM verify_btreeam(17);
-- verify wrong index types are rejected (error)
BEGIN;
CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
SELECT bt_index_parent_check('bttest_a_brin_idx');
ROLLBACK;
+BEGIN;
+CREATE INDEX bttest_a_brin_idx ON bttest_a USING brin(id);
+SELECT * FROM verify_btreeam('bttest_a_brin_idx');
+ROLLBACK;
-- normal check outside of xact
SELECT bt_index_check('bttest_a_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- more expansive tests
SELECT bt_index_check('bttest_a_idx', true);
SELECT bt_index_parent_check('bttest_b_idx', true);
@@ -61,6 +71,7 @@ SELECT bt_index_parent_check('bttest_b_idx', true);
BEGIN;
SELECT bt_index_check('bttest_a_idx');
SELECT bt_index_parent_check('bttest_b_idx');
+SELECT * FROM verify_btreeam('bttest_a_idx');
-- make sure we don't have any leftover locks
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
@@ -74,6 +85,7 @@ SELECT bt_index_check('bttest_a_idx', true);
-- normal check outside of xact for index with included columns
SELECT bt_index_check('bttest_multi_idx');
+SELECT * FROM verify_btreeam('bttest_multi_idx');
-- more expansive tests for index with included columns
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
@@ -81,6 +93,7 @@ SELECT bt_index_parent_check('bttest_multi_idx', true, true);
TRUNCATE bttest_multi;
INSERT INTO bttest_multi SELECT i, i%2 FROM generate_series(1, 100000) as i;
SELECT bt_index_parent_check('bttest_multi_idx', true, true);
+SELECT * FROM verify_btreeam('bttest_multi_idx');
--
-- Test for multilevel page deletion/downlink present checks, and rootdescend
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index e4d501a85d..ea70fc41a9 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -32,16 +32,22 @@
#include "catalog/index.h"
#include "catalog/pg_am.h"
#include "commands/tablecmds.h"
+#include "funcapi.h"
#include "lib/bloomfilter.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/smgr.h"
+#include "utils/builtins.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"
-
+#include "amcheck.h"
PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(bt_index_check);
+PG_FUNCTION_INFO_V1(bt_index_parent_check);
+PG_FUNCTION_INFO_V1(verify_btreeam);
+
/*
* A B-Tree cannot possibly have this many levels, since there must be one
* block per level, which is bound by the range of BlockNumber:
@@ -50,6 +56,20 @@ PG_MODULE_MAGIC;
#define BTreeTupleGetNKeyAtts(itup, rel) \
Min(IndexRelationGetNumberOfKeyAttributes(rel), BTreeTupleGetNAtts(itup, rel))
+/*
+ * Context for use within verify_btreeam()
+ */
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
+
+#define CONTINUE_CHECKING(ctx) \
+ (ctx == NULL || !((ctx)->is_corrupt && (ctx)->on_error_stop))
+
/*
* State associated with verifying a B-Tree index
*
@@ -116,6 +136,9 @@ typedef struct BtreeCheckState
bloom_filter *filter;
/* Debug counter */
int64 heaptuplespresent;
+
+ /* Error reporting context */
+ BtreeCheckContext *ctx;
} BtreeCheckState;
/*
@@ -133,28 +156,28 @@ typedef struct BtreeLevel
bool istruerootlevel;
} BtreeLevel;
-PG_FUNCTION_INFO_V1(bt_index_check);
-PG_FUNCTION_INFO_V1(bt_index_parent_check);
-
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
static inline void btree_index_checkable(Relation rel);
static inline bool btree_index_mainfork_expected(Relation rel);
static void bt_check_every_level(Relation rel, Relation heaprel,
bool heapkeyspace, bool readonly, bool heapallindexed,
- bool rootdescend);
+ bool rootdescend, BtreeCheckContext * ctx);
static BtreeLevel bt_check_level_from_leftmost(BtreeCheckState *state,
- BtreeLevel level);
-static void bt_target_page_check(BtreeCheckState *state);
-static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state);
+ BtreeLevel level, BtreeCheckContext * ctx);
+static void bt_target_page_check(BtreeCheckState *state, BtreeCheckContext * ctx);
+static BTScanInsert bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext * ctx);
static void bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum);
+ OffsetNumber downlinkoffnum, BtreeCheckContext * ctx);
static void bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level);
+ uint32 target_level,
+ BtreeCheckContext * ctx);
static void bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber targetblock, Page target);
+ BlockNumber targetblock, Page target,
+ BtreeCheckContext * ctx);
static void bt_tuple_present_callback(Relation index, ItemPointer tid,
Datum *values, bool *isnull,
bool tupleIsAlive, void *checkstate);
@@ -176,7 +199,7 @@ static inline bool invariant_l_nontarget_offset(BtreeCheckState *state,
BlockNumber nontargetblock,
Page nontarget,
OffsetNumber upperbound);
-static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum);
+static Page palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext * ctx);
static inline BTScanInsert bt_mkscankey_pivotsearch(Relation rel,
IndexTuple itup);
static ItemId PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block,
@@ -185,6 +208,26 @@ static inline ItemPointer BTreeTupleGetHeapTIDCareful(BtreeCheckState *state,
IndexTuple itup, bool nonpivot);
static inline ItemPointer BTreeTupleGetPointsToTID(IndexTuple itup);
+static TupleDesc verify_btreeam_tupdesc(void);
+static void confess(BtreeCheckContext * ctx, BlockNumber blkno, char *msg);
+
+/*
+ * Macro for either calling ereport(...) or confess(...) depending on whether
+ * a context for returning the error message exists. Prior to version 1.3,
+ * all functions reported any detected corruption via ereport, but starting in
+ * 1.3, the new function verify_btreeam reports detected corruption back to
+ * the caller as a set of rows, and pre-existing functions continue to report
+ * corruption via ereport. This macro allows the shared implementation to
+ * do the right thing depending on context.
+ */
+#define econfess(ctx, blkno, code, ...) \
+ do { \
+ if (ctx) \
+ confess(ctx, blkno, psprintf(__VA_ARGS__)); \
+ else \
+ ereport(ERROR, (errcode(code), errmsg(__VA_ARGS__))); \
+ } while(0)
+
/*
* bt_index_check(index regclass, heapallindexed boolean)
*
@@ -203,7 +246,7 @@ bt_index_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 2)
heapallindexed = PG_GETARG_BOOL(1);
- bt_index_check_internal(indrelid, false, heapallindexed, false);
+ bt_index_check_internal(indrelid, false, heapallindexed, false, NULL);
PG_RETURN_VOID();
}
@@ -229,17 +272,64 @@ bt_index_parent_check(PG_FUNCTION_ARGS)
if (PG_NARGS() == 3)
rootdescend = PG_GETARG_BOOL(2);
- bt_index_check_internal(indrelid, true, heapallindexed, rootdescend);
+ bt_index_check_internal(indrelid, true, heapallindexed, rootdescend, NULL);
PG_RETURN_VOID();
}
+Datum
+verify_btreeam(PG_FUNCTION_ARGS)
+{
+#define BTREECHECK_RELATION_COLS 2
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext oldcontext;
+ BtreeCheckContext ctx;
+ bool randomAccess;
+ Oid indrelid;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ indrelid = PG_GETARG_OID(0);
+
+ memset(&ctx, 0, sizeof(BtreeCheckContext));
+
+ ctx.on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ oldcontext = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_btreeam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(oldcontext);
+
+ bt_index_check_internal(indrelid, true, true, true, &ctx);
+
+ PG_RETURN_NULL();
+}
+
/*
* Helper for bt_index_[parent_]check, coordinating the bulk of the work.
*/
static void
bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
- bool rootdescend)
+ bool rootdescend, BtreeCheckContext * ctx)
{
Oid heapid;
Relation indrel;
@@ -300,15 +390,16 @@ bt_index_check_internal(Oid indrelid, bool parentcheck, bool heapallindexed,
RelationOpenSmgr(indrel);
if (!smgrexists(indrel->rd_smgr, MAIN_FORKNUM))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" lacks a main relation fork",
- RelationGetRelationName(indrel))));
+ econfess(ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" lacks a main relation fork",
+ RelationGetRelationName(indrel));
/* Check index, possibly against table it is an index on */
- _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
- bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
- heapallindexed, rootdescend);
+ if (CONTINUE_CHECKING(ctx))
+ _bt_metaversion(indrel, &heapkeyspace, &allequalimage);
+ if (CONTINUE_CHECKING(ctx))
+ bt_check_every_level(indrel, heaprel, heapkeyspace, parentcheck,
+ heapallindexed, rootdescend, ctx);
}
/*
@@ -402,7 +493,8 @@ btree_index_mainfork_expected(Relation rel)
*/
static void
bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
- bool readonly, bool heapallindexed, bool rootdescend)
+ bool readonly, bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx)
{
BtreeCheckState *state;
Page metapage;
@@ -434,6 +526,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->readonly = readonly;
state->heapallindexed = heapallindexed;
state->rootdescend = rootdescend;
+ state->ctx = ctx;
if (state->heapallindexed)
{
@@ -506,7 +599,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
/* Get true root block from meta-page */
- metapage = palloc_btree_page(state, BTREE_METAPAGE);
+ metapage = palloc_btree_page(state, BTREE_METAPAGE, ctx);
metad = BTPageGetMeta(metapage);
/*
@@ -535,19 +628,18 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
current.level = metad->btm_level;
current.leftmost = metad->btm_root;
current.istruerootlevel = true;
- while (current.leftmost != P_NONE)
+ while (CONTINUE_CHECKING(state->ctx) && current.leftmost != P_NONE)
{
/*
* Verify this level, and get left most page for next level down, if
* not at leaf level
*/
- current = bt_check_level_from_leftmost(state, current);
+ current = bt_check_level_from_leftmost(state, current, ctx);
if (current.leftmost == InvalidBlockNumber)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" has no valid pages on level below %u or first level",
- RelationGetRelationName(rel), previouslevel)));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" has no valid pages on level below %u or first level",
+ RelationGetRelationName(rel), previouslevel);
previouslevel = current.level;
}
@@ -555,7 +647,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
/*
* * Check whether heap contains unindexed/malformed tuples *
*/
- if (state->heapallindexed)
+ if (CONTINUE_CHECKING(state->ctx) && state->heapallindexed)
{
IndexInfo *indexinfo = BuildIndexInfo(state->rel);
TableScanDesc scan;
@@ -639,7 +731,7 @@ bt_check_every_level(Relation rel, Relation heaprel, bool heapkeyspace,
* each call to bt_target_page_check().
*/
static BtreeLevel
-bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
+bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level, BtreeCheckContext * ctx)
{
/* State to establish early, concerning entire level */
BTPageOpaque opaque;
@@ -672,7 +764,7 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
/* Initialize state for this iteration */
state->targetblock = current;
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
state->targetlsn = PageGetLSN(state->target);
opaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -691,18 +783,16 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* checked.
*/
if (state->readonly && P_ISDELETED(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink or sibling link points to deleted block in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "downlink or sibling link points to deleted block in index \"%s\" "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
if (P_RIGHTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u fell off the end of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u fell off the end of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
else
ereport(DEBUG1,
(errcode(ERRCODE_NO_DATA),
@@ -722,16 +812,14 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
if (state->readonly)
{
if (!P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not leftmost in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not leftmost in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
if (level.istruerootlevel && !P_ISROOT(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u is not true root in index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "block %u is not true root in index \"%s\"",
+ current, RelationGetRelationName(state->rel));
}
/*
@@ -780,33 +868,30 @@ bt_check_level_from_leftmost(BtreeCheckState *state, BtreeLevel level)
* so sibling pointers should always be in mutual agreement
*/
if (state->readonly && opaque->btpo_prev != leftcurrent)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("left link/right link pair in index \"%s\" not in agreement",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u left block=%u left link from block=%u.",
- current, leftcurrent, opaque->btpo_prev)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "left link/right link pair in index \"%s\" not in agreement "
+ "(Block=%u left block=%u left link from block=%u)",
+ RelationGetRelationName(state->rel),
+ current, leftcurrent, opaque->btpo_prev);
/* Check level, which must be valid for non-ignorable page */
if (level.level != opaque->btpo.level)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leftmost down link for level points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- current, level.level, opaque->btpo.level)));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "leftmost down link for level points to block in index \"%s\" whose level is not one level down "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ current, level.level, opaque->btpo.level);
/* Verify invariants for page */
- bt_target_page_check(state);
+ bt_target_page_check(state, ctx);
nextpage:
/* Try to detect circular links */
if (current == leftcurrent || current == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- current, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, current, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ current, RelationGetRelationName(state->rel));
leftcurrent = current;
current = opaque->btpo_next;
@@ -850,7 +935,7 @@ nextpage:
/* Free page and associated memory for this iteration */
MemoryContextReset(state->targetcontext);
}
- while (current != P_NONE);
+ while (CONTINUE_CHECKING(state->ctx) && current != P_NONE);
if (state->lowkey)
{
@@ -902,7 +987,7 @@ nextpage:
* resetting state->targetcontext.
*/
static void
-bt_target_page_check(BtreeCheckState *state)
+bt_target_page_check(BtreeCheckState *state, BtreeCheckContext * ctx)
{
OffsetNumber offset;
OffsetNumber max;
@@ -930,16 +1015,15 @@ bt_target_page_check(BtreeCheckState *state)
P_HIKEY))
{
itup = (IndexTuple) PageGetItem(state->target, itemid);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of high key index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index block=%u natts=%u block type=%s page lsn=%X/%X.",
- state->targetblock,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of high key index tuple attributes in index \"%s\" "
+ "(Index block=%u natts=%u block type=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -949,7 +1033,7 @@ bt_target_page_check(BtreeCheckState *state)
* real item (if any).
*/
for (offset = P_FIRSTDATAKEY(topaque);
- offset <= max;
+ offset <= max && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
ItemId itemid;
@@ -973,16 +1057,15 @@ bt_target_page_check(BtreeCheckState *state)
* frequently, and is surprisingly tolerant of corrupt lp_len fields.
*/
if (tupsize != ItemIdGetLength(itemid))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index tuple size does not equal lp_len in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X.",
- state->targetblock, offset,
- tupsize, ItemIdGetLength(itemid),
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn),
- errhint("This could be a torn page problem.")));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index tuple size does not equal lp_len in index \"%s\" "
+ "(Index tid=(%u,%u) tuple size=%zu lp_len=%u page lsn=%X/%X) "
+ "(This could be a torn page problem)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ tupsize, ItemIdGetLength(itemid),
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check the number of index tuple attributes */
if (!_bt_check_natts(state->rel, state->heapkeyspace, state->target,
@@ -998,17 +1081,16 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("wrong number of index tuple attributes in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X.",
- itid,
- BTreeTupleGetNAtts(itup, state->rel),
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "wrong number of index tuple attributes in index \"%s\" "
+ "(Index tid=%s natts=%u points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ BTreeTupleGetNAtts(itup, state->rel),
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1027,7 +1109,8 @@ bt_target_page_check(BtreeCheckState *state)
bt_child_highkey_check(state,
offset,
NULL,
- topaque->btpo.level);
+ topaque->btpo.level,
+ ctx);
}
continue;
}
@@ -1049,14 +1132,13 @@ bt_target_page_check(BtreeCheckState *state)
htid = psprintf("(%u,%u)", ItemPointerGetBlockNumber(tid),
ItemPointerGetOffsetNumber(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("could not find tuple using search from root page in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to heap tid=%s page lsn=%X/%X.",
- itid, htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "could not find tuple using search from root page in index \"%s\" "
+ "(Index tid=%s points to heap tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1079,14 +1161,13 @@ bt_target_page_check(BtreeCheckState *state)
{
char *itid = psprintf("(%u,%u)", state->targetblock, offset);
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("posting list contains misplaced TID in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s posting list offset=%d page lsn=%X/%X.",
- itid, i,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "posting list contains misplaced TID in index \"%s\" "
+ "(Index tid=%s posting list offset=%d page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid, i,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
ItemPointerCopy(current, &last);
@@ -1134,16 +1215,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index row size %zu exceeds maximum for index \"%s\"",
- tupsize, RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "index row size %zu exceeds maximum for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ tupsize, RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Fingerprint leaf page tuples (those that point to the heap) */
@@ -1242,16 +1322,15 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("high key invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=%s points to %s tid=%s page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "high key invariant violated for index \"%s\" "
+ "(Index tid=%s points to %s tid=%s page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/* Reset, in case scantid was set to (itup) posting tuple's max TID */
skey->scantid = scantid;
@@ -1289,21 +1368,20 @@ bt_target_page_check(BtreeCheckState *state)
ItemPointerGetBlockNumberNoCheck(tid),
ItemPointerGetOffsetNumberNoCheck(tid));
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Lower index tid=%s (points to %s tid=%s) "
- "higher index tid=%s (points to %s tid=%s) "
- "page lsn=%X/%X.",
- itid,
- P_ISLEAF(topaque) ? "heap" : "index",
- htid,
- nitid,
- P_ISLEAF(topaque) ? "heap" : "index",
- nhtid,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "item order invariant violated for index \"%s\" "
+ "(Lower index tid=%s (points to %s tid=%s) "
+ "higher index tid=%s (points to %s tid=%s) "
+ "page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ itid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ htid,
+ nitid,
+ P_ISLEAF(topaque) ? "heap" : "index",
+ nhtid,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
/*
@@ -1328,7 +1406,7 @@ bt_target_page_check(BtreeCheckState *state)
BTScanInsert rightkey;
/* Get item in next/right page */
- rightkey = bt_right_page_check_scankey(state);
+ rightkey = bt_right_page_check_scankey(state, ctx);
if (rightkey &&
!invariant_g_offset(state, rightkey, max))
@@ -1343,7 +1421,7 @@ bt_target_page_check(BtreeCheckState *state)
if (!state->readonly)
{
/* Get fresh copy of target page */
- state->target = palloc_btree_page(state, state->targetblock);
+ state->target = palloc_btree_page(state, state->targetblock, ctx);
/* Note that we deliberately do not update target LSN */
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
@@ -1354,14 +1432,13 @@ bt_target_page_check(BtreeCheckState *state)
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("cross page item order invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Last item on page tid=(%u,%u) page lsn=%X/%X.",
- state->targetblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "cross page item order invariant violated for index \"%s\" "
+ "(Last item on page tid=(%u,%u) page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1374,7 +1451,7 @@ bt_target_page_check(BtreeCheckState *state)
* because it has no useful value to compare).
*/
if (!P_ISLEAF(topaque) && state->readonly)
- bt_child_check(state, skey, offset);
+ bt_child_check(state, skey, offset, ctx);
}
/*
@@ -1386,10 +1463,11 @@ bt_target_page_check(BtreeCheckState *state)
* right of the child page pointer to by our rightmost downlink. And they
* might have missing downlinks. This final call checks for them.
*/
- if (!P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
+ if (CONTINUE_CHECKING(state->ctx) &&
+ !P_ISLEAF(topaque) && P_RIGHTMOST(topaque) && state->readonly)
{
bt_child_highkey_check(state, InvalidOffsetNumber,
- NULL, topaque->btpo.level);
+ NULL, topaque->btpo.level, ctx);
}
}
@@ -1410,7 +1488,7 @@ bt_target_page_check(BtreeCheckState *state)
* been concurrently deleted.
*/
static BTScanInsert
-bt_right_page_check_scankey(BtreeCheckState *state)
+bt_right_page_check_scankey(BtreeCheckState *state, BtreeCheckContext *ctx)
{
BTPageOpaque opaque;
ItemId rightitem;
@@ -1455,7 +1533,7 @@ bt_right_page_check_scankey(BtreeCheckState *state)
{
CHECK_FOR_INTERRUPTS();
- rightpage = palloc_btree_page(state, targetnext);
+ rightpage = palloc_btree_page(state, targetnext, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(rightpage);
if (!P_IGNORE(opaque) || P_RIGHTMOST(opaque))
@@ -1666,7 +1744,8 @@ static void
bt_child_highkey_check(BtreeCheckState *state,
OffsetNumber target_downlinkoffnum,
Page loaded_child,
- uint32 target_level)
+ uint32 target_level,
+ BtreeCheckContext *ctx)
{
BlockNumber blkno = state->prevrightlink;
Page page;
@@ -1708,7 +1787,7 @@ bt_child_highkey_check(BtreeCheckState *state,
}
/* Move to the right on the child level */
- while (true)
+ while (CONTINUE_CHECKING(state->ctx))
{
/*
* Did we traverse the whole tree level and this is check for pages to
@@ -1723,51 +1802,47 @@ bt_child_highkey_check(BtreeCheckState *state,
/* Did we traverse the whole tree level and don't find next downlink? */
if (blkno == P_NONE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't traverse from downlink %u to downlink %u of index \"%s\"",
- state->prevrightlink, downlink,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "can't traverse from downlink %u to downlink %u of index \"%s\"",
+ state->prevrightlink, downlink,
+ RelationGetRelationName(state->rel));
/* Load page contents */
if (blkno == downlink && loaded_child)
page = loaded_child;
else
- page = palloc_btree_page(state, blkno);
+ page = palloc_btree_page(state, blkno, ctx);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) && !P_LEFTMOST(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "the first child of leftmost target page is not leftmost of its level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
/* Check level for non-ignorable page */
if (!P_IGNORE(opaque) && opaque->btpo.level != target_level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block found while following rightlinks from child of index \"%s\" has invalid level",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, target_level - 1, opaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "block found while following rightlinks from child of index \"%s\" has invalid level "
+ "(Block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, target_level - 1, opaque->btpo.level);
/* Try to detect circular links */
if ((!first && blkno == state->prevrightlink) || blkno == opaque->btpo_prev)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("circular link chain found in block %u of index \"%s\"",
- blkno, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "circular link chain found in block %u of index \"%s\"",
+ blkno, RelationGetRelationName(state->rel));
if (blkno != downlink && !P_IGNORE(opaque))
{
/* blkno probably has missing parent downlink */
- bt_downlink_missing_check(state, rightsplit, blkno, page);
+ bt_downlink_missing_check(state, rightsplit, blkno, page, ctx);
}
rightsplit = P_INCOMPLETE_SPLIT(opaque);
@@ -1825,14 +1900,13 @@ bt_child_highkey_check(BtreeCheckState *state,
if (pivotkey_offset > PageGetMaxOffsetNumber(state->target))
{
if (P_RIGHTMOST(topaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("child high key is greater than rightmost pivot key on target level in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "child high key is greater than rightmost pivot key on target level in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
pivotkey_offset = P_HIKEY;
}
itemid = PageGetItemIdCareful(state, state->targetblock,
@@ -1856,27 +1930,25 @@ bt_child_highkey_check(BtreeCheckState *state,
* page.
*/
if (!state->lowkey)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("can't find left sibling high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "can't find left sibling high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
itup = state->lowkey;
}
if (!bt_pivot_tuple_identical(highkey, itup))
{
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("mismatch between parent key and child high key in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
- state->targetblock, blkno,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "mismatch between parent key and child high key in index \"%s\" "
+ "(Target block=%u child block=%u target page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, blkno,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
}
@@ -1913,7 +1985,7 @@ bt_child_highkey_check(BtreeCheckState *state,
*/
static void
bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
- OffsetNumber downlinkoffnum)
+ OffsetNumber downlinkoffnum, BtreeCheckContext *ctx)
{
ItemId itemid;
IndexTuple itup;
@@ -1978,7 +2050,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* the operator class obeys the transitive law.
*/
topaque = (BTPageOpaque) PageGetSpecialPointer(state->target);
- child = palloc_btree_page(state, childblock);
+ child = palloc_btree_page(state, childblock, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
maxoffset = PageGetMaxOffsetNumber(child);
@@ -1987,7 +2059,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* check for downlink connectivity.
*/
bt_child_highkey_check(state, downlinkoffnum,
- child, topaque->btpo.level);
+ child, topaque->btpo.level, ctx);
/*
* Since there cannot be a concurrent VACUUM operation in readonly mode,
@@ -2014,17 +2086,16 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
* to test.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("downlink to deleted page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child block=%u parent page lsn=%X/%X.",
- state->targetblock, childblock,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted page found in index \"%s\" "
+ "(Parent block=%u child block=%u parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
for (offset = P_FIRSTDATAKEY(copaque);
- offset <= maxoffset;
+ offset <= maxoffset && CONTINUE_CHECKING(state->ctx);
offset = OffsetNumberNext(offset))
{
/*
@@ -2056,14 +2127,13 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
if (!invariant_l_nontarget_offset(state, targetkey, childblock, child,
offset))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("down-link lower bound invariant violated for index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X.",
- state->targetblock, childblock, offset,
- (uint32) (state->targetlsn >> 32),
- (uint32) state->targetlsn)));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "down-link lower bound invariant violated for index \"%s\" "
+ "(Parent block=%u child index tid=(%u,%u) parent page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ state->targetblock, childblock, offset,
+ (uint32) (state->targetlsn >> 32),
+ (uint32) state->targetlsn);
}
pfree(child);
@@ -2084,7 +2154,7 @@ bt_child_check(BtreeCheckState *state, BTScanInsert targetkey,
*/
static void
bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
- BlockNumber blkno, Page page)
+ BlockNumber blkno, Page page, BtreeCheckContext *ctx)
{
BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
ItemId itemid;
@@ -2150,14 +2220,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* inconsistencies anywhere else.
*/
if (P_ISLEAF(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("leaf index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u page lsn=%X/%X.",
- blkno,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "leaf index block lacks downlink in index \"%s\" "
+ "(Block=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/* Descend from the given page, which is an internal page */
elog(DEBUG1, "checking for interrupted multi-level deletion due to missing downlink in index \"%s\"",
@@ -2167,11 +2236,11 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
itemid = PageGetItemIdCareful(state, blkno, page, P_FIRSTDATAKEY(opaque));
itup = (IndexTuple) PageGetItem(page, itemid);
childblk = BTreeTupleGetDownLink(itup);
- for (;;)
+ while (CONTINUE_CHECKING(state->ctx))
{
CHECK_FOR_INTERRUPTS();
- child = palloc_btree_page(state, childblk);
+ child = palloc_btree_page(state, childblk, ctx);
copaque = (BTPageOpaque) PageGetSpecialPointer(child);
if (P_ISLEAF(copaque))
@@ -2179,13 +2248,12 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
/* Do an extra sanity check in passing on internal pages */
if (copaque->btpo.level != level - 1)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink points to block in index \"%s\" whose level is not one level down",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u.",
- blkno, childblk,
- level - 1, copaque->btpo.level)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink points to block in index \"%s\" whose level is not one level down "
+ "(Top parent/under check block=%u block pointed to=%u expected level=%u level in pointed to block=%u)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ level - 1, copaque->btpo.level);
level = copaque->btpo.level;
itemid = PageGetItemIdCareful(state, childblk, child,
@@ -2217,14 +2285,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
* parent/ancestor page) lacked a downlink is incidental.
*/
if (P_ISDELETED(copaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("downlink to deleted leaf page found in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X.",
- blkno, childblk,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "downlink to deleted leaf page found in index \"%s\" "
+ "(Top parent/target block=%u leaf block=%u top parent/under check lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, childblk,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
/*
* Iff leaf page is half-dead, its high key top parent link should point
@@ -2244,14 +2311,13 @@ bt_downlink_missing_check(BtreeCheckState *state, bool rightsplit,
return;
}
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal index block lacks downlink in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Block=%u level=%u page lsn=%X/%X.",
- blkno, opaque->btpo.level,
- (uint32) (pagelsn >> 32),
- (uint32) pagelsn)));
+ econfess(state->ctx, blkno, ERRCODE_INDEX_CORRUPTED,
+ "internal index block lacks downlink in index \"%s\" "
+ "(Block=%u level=%u page lsn=%X/%X)",
+ RelationGetRelationName(state->rel),
+ blkno, opaque->btpo.level,
+ (uint32) (pagelsn >> 32),
+ (uint32) pagelsn);
}
/*
@@ -2327,16 +2393,12 @@ bt_tuple_present_callback(Relation index, ItemPointer tid, Datum *values,
/* Probe Bloom filter -- tuple should be present */
if (bloom_lacks_element(state->filter, (unsigned char *) norm,
IndexTupleSize(norm)))
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->heaprel),
- RelationGetRelationName(state->rel)),
- !state->readonly
- ? errhint("Retrying verification using the function bt_index_parent_check() might provide a more specific error.")
- : 0));
+ econfess(state->ctx, ItemPointerGetBlockNumber(&(itup->t_tid)), ERRCODE_DATA_CORRUPTED,
+ "heap tuple (%u,%u) from table \"%s\" lacks matching index tuple within index \"%s\"%s",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->heaprel),
+ RelationGetRelationName(state->rel),
+ !state->readonly
+ ? " (Retrying verification using the function bt_index_parent_check() might provide a more specific error)"
+ : "");
state->heaptuplespresent++;
pfree(itup);
@@ -2395,7 +2457,7 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
if (!IndexTupleHasVarwidths(itup))
return itup;
- for (i = 0; i < tupleDescriptor->natts; i++)
+ for (i = 0; CONTINUE_CHECKING(state->ctx) && i < tupleDescriptor->natts; i++)
{
Form_pg_attribute att;
@@ -2415,12 +2477,11 @@ bt_normalize_tuple(BtreeCheckState *state, IndexTuple itup)
* should never be encountered here
*/
if (VARATT_IS_EXTERNAL(DatumGetPointer(normalized[i])))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
- ItemPointerGetBlockNumber(&(itup->t_tid)),
- ItemPointerGetOffsetNumber(&(itup->t_tid)),
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, InvalidBlockNumber, ERRCODE_INDEX_CORRUPTED,
+ "external varlena datum in tuple that references heap row (%u,%u) in index \"%s\"",
+ ItemPointerGetBlockNumber(&(itup->t_tid)),
+ ItemPointerGetOffsetNumber(&(itup->t_tid)),
+ RelationGetRelationName(state->rel));
else if (VARATT_IS_COMPRESSED(DatumGetPointer(normalized[i])))
{
formnewtup = true;
@@ -2780,7 +2841,7 @@ invariant_l_nontarget_offset(BtreeCheckState *state, BTScanInsert key,
* misbehaves.
*/
static Page
-palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
+palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum, BtreeCheckContext *ctx)
{
Buffer buffer;
Page page;
@@ -2810,10 +2871,9 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
if (P_ISMETA(opaque) && blocknum != BTREE_METAPAGE)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid meta page found at block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid meta page found at block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/* Check page from block that ought to be meta page */
if (blocknum == BTREE_METAPAGE)
@@ -2822,20 +2882,18 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
if (!P_ISMETA(opaque) ||
metad->btm_magic != BTREE_MAGIC)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("index \"%s\" meta page is corrupt",
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "index \"%s\" meta page is corrupt",
+ RelationGetRelationName(state->rel));
if (metad->btm_version < BTREE_MIN_VERSION ||
metad->btm_version > BTREE_VERSION)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("version mismatch in index \"%s\": file version %d, "
- "current version %d, minimum supported version %d",
- RelationGetRelationName(state->rel),
- metad->btm_version, BTREE_VERSION,
- BTREE_MIN_VERSION)));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "version mismatch in index \"%s\": file version %d, "
+ "current version %d, minimum supported version %d",
+ RelationGetRelationName(state->rel),
+ metad->btm_version, BTREE_VERSION,
+ BTREE_MIN_VERSION);
/* Finished with metapage checks */
return page;
@@ -2846,17 +2904,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* page level
*/
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && opaque->btpo.level != 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid leaf page level %u for block %u in index \"%s\"",
- opaque->btpo.level, blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid leaf page level %u for block %u in index \"%s\"",
+ opaque->btpo.level, blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) &&
opaque->btpo.level == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid internal page level 0 for block %u in index \"%s\"",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "invalid internal page level 0 for block %u in index \"%s\"",
+ blocknum, RelationGetRelationName(state->rel));
/*
* Sanity checks for number of items on page.
@@ -2882,23 +2938,20 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
*/
maxoffset = PageGetMaxOffsetNumber(page);
if (maxoffset > MaxIndexTuplesPerPage)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
- blocknum, RelationGetRelationName(state->rel),
- MaxIndexTuplesPerPage)));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "Number of items on block %u of index \"%s\" exceeds MaxIndexTuplesPerPage (%u)",
+ blocknum, RelationGetRelationName(state->rel),
+ MaxIndexTuplesPerPage);
if (!P_ISLEAF(opaque) && !P_ISDELETED(opaque) && maxoffset < P_FIRSTDATAKEY(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal block %u in index \"%s\" lacks high key and/or at least one downlink",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal block %u in index \"%s\" lacks high key and/or at least one downlink",
+ blocknum, RelationGetRelationName(state->rel));
if (P_ISLEAF(opaque) && !P_ISDELETED(opaque) && !P_RIGHTMOST(opaque) && maxoffset < P_HIKEY)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("non-rightmost leaf block %u in index \"%s\" lacks high key item",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "non-rightmost leaf block %u in index \"%s\" lacks high key item",
+ blocknum, RelationGetRelationName(state->rel));
/*
* In general, internal pages are never marked half-dead, except on
@@ -2910,17 +2963,15 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* Internal pages should never have garbage items, either.
*/
if (!P_ISLEAF(opaque) && P_ISHALFDEAD(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" is half-dead",
- blocknum, RelationGetRelationName(state->rel)),
- errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" is half-dead "
+ "(This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it)",
+ blocknum, RelationGetRelationName(state->rel));
if (!P_ISLEAF(opaque) && P_HAS_GARBAGE(opaque))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("internal page block %u in index \"%s\" has garbage items",
- blocknum, RelationGetRelationName(state->rel))));
+ econfess(state->ctx, blocknum, ERRCODE_INDEX_CORRUPTED,
+ "internal page block %u in index \"%s\" has garbage items",
+ blocknum, RelationGetRelationName(state->rel));
return page;
}
@@ -2971,14 +3022,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
if (ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
BLCKSZ - sizeof(BTPageOpaqueData))
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("line pointer points past end of tuple space in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "line pointer points past end of tuple space in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
/*
* Verify that line pointer isn't LP_REDIRECT or LP_UNUSED, since nbtree
@@ -2987,14 +3037,13 @@ PageGetItemIdCareful(BtreeCheckState *state, BlockNumber block, Page page,
*/
if (ItemIdIsRedirected(itemid) || !ItemIdIsUsed(itemid) ||
ItemIdGetLength(itemid) == 0)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("invalid line pointer storage in index \"%s\"",
- RelationGetRelationName(state->rel)),
- errdetail_internal("Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u.",
- block, offset, ItemIdGetOffset(itemid),
- ItemIdGetLength(itemid),
- ItemIdGetFlags(itemid))));
+ econfess(state->ctx, block, ERRCODE_INDEX_CORRUPTED,
+ "invalid line pointer storage in index \"%s\" "
+ "(Index tid=(%u,%u) lp_off=%u, lp_len=%u lp_flags=%u)",
+ RelationGetRelationName(state->rel),
+ block, offset, ItemIdGetOffset(itemid),
+ ItemIdGetLength(itemid),
+ ItemIdGetFlags(itemid));
return itemid;
}
@@ -3016,26 +3065,23 @@ BTreeTupleGetHeapTIDCareful(BtreeCheckState *state, IndexTuple itup,
*/
Assert(state->heapkeyspace);
if (BTreeTupleIsPivot(itup) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
if (!BTreeTupleIsPivot(itup) && !nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg_internal("block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" has unexpected non-pivot tuple",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
htid = BTreeTupleGetHeapTID(itup);
if (!ItemPointerIsValid(htid) && nonpivot)
- ereport(ERROR,
- (errcode(ERRCODE_INDEX_CORRUPTED),
- errmsg("block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
- state->targetblock,
- RelationGetRelationName(state->rel))));
+ econfess(state->ctx, state->targetblock, ERRCODE_INDEX_CORRUPTED,
+ "block %u or its right sibling block or child block in index \"%s\" contains non-pivot tuple that lacks a heap TID",
+ state->targetblock,
+ RelationGetRelationName(state->rel));
return htid;
}
@@ -3066,3 +3112,53 @@ BTreeTupleGetPointsToTID(IndexTuple itup)
/* Pivot tuple returns TID with downlink block (heapkeyspace variant) */
return &itup->t_tid;
}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_btreeam.
+ */
+static TupleDesc
+verify_btreeam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(BTREECHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == BTREECHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * confess
+ *
+ * Record a message about index corruption in the result tuplestore
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(BtreeCheckContext *ctx, BlockNumber blkno, char *msg)
+{
+ Datum values[BTREECHECK_RELATION_COLS];
+ bool nulls[BTREECHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(blkno);
+ nulls[0] = (blkno == InvalidBlockNumber);
+ values[1] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That is OK, but we should not also leak all the msg
+ * arguments allocated during the scan, so free each one here.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1e14..467120f1d0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -275,6 +275,7 @@ BrinSpecialSpace
BrinStatsData
BrinTuple
BrinValues
+BtreeCheckContext
BtreeCheckState
BtreeLevel
Bucket
--
2.21.1 (Apple Git-122.3)
v13-0002-Adding-function-verify_heapam-to-amcheck-module.patch (application/octet-stream)
From 6846abccc7c87b9377dc2f12bb6d8321f445d118 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 27 Jul 2020 08:03:53 -0700
Subject: [PATCH v13 2/3] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
associated toast relation, if any, to contrib/amcheck.
---
contrib/amcheck/Makefile | 5 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 54 +
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_heap.out | 67 +
.../amcheck/expected/disallowed_reltypes.out | 48 +
contrib/amcheck/sql/check_heap.sql | 19 +
contrib/amcheck/sql/disallowed_reltypes.sql | 48 +
contrib/amcheck/t/001_verify_heapam.pl | 94 ++
contrib/amcheck/verify_heapam.c | 1139 +++++++++++++++++
doc/src/sgml/amcheck.sgml | 106 +-
src/backend/access/heap/hio.c | 11 +
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 1596 insertions(+), 2 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/expected/disallowed_reltypes.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/sql/disallowed_reltypes.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index b288c28fa0..27d38b2e86 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap disallowed_reltypes
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..df418a850b
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,54 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+-- In order to avoid issues with dependencies when updating amcheck to 1.3,
+-- create new, overloaded version of the 1.2 function signature
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(rel regclass,
+ on_error_stop boolean default false,
+ skip cstring default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
+
+--
+-- verify_btreeam()
+--
+CREATE FUNCTION verify_btreeam(rel regclass,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+CREATE FUNCTION verify_btreeam(rel regclass,
+ on_error_stop boolean,
+ blkno OUT bigint,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_btreeam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_btreeam(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION verify_btreeam(regclass, boolean) FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..4175bb2d37
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,67 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+ERROR: unrecognized parameter for 'skip': rope
+HINT: please choose from 'all-visible', 'all-frozen', or 'none'
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ERROR: starting block 0 is out of bounds for relation with no blocks
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+ERROR: block range 100000 .. 200000 is out of bounds for relation with block count 370
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | lp_off | lp_flags | lp_len | attnum | chunk | msg
+-------+--------+--------+----------+--------+--------+-------+-----
+(0 rows)
+
diff --git a/contrib/amcheck/expected/disallowed_reltypes.out b/contrib/amcheck/expected/disallowed_reltypes.out
new file mode 100644
index 0000000000..892ae89652
--- /dev/null
+++ b/contrib/amcheck/expected/disallowed_reltypes.out
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c75f5ff869
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,19 @@
+CREATE TABLE heaptest (a integer, b text);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'rope');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,10000) gs);
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 100000, endblock := 200000);
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
+VACUUM FREEZE heaptest;
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(rel := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(rel := 'heaptest', startblock := 0, endblock := 0);
diff --git a/contrib/amcheck/sql/disallowed_reltypes.sql b/contrib/amcheck/sql/disallowed_reltypes.sql
new file mode 100644
index 0000000000..fc90e6ca33
--- /dev/null
+++ b/contrib/amcheck/sql/disallowed_reltypes.sql
@@ -0,0 +1,48 @@
+--
+-- check that using the module's functions with unsupported relations will fail
+--
+
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
+-- indexes do not, so these all fail
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create view test_view as select 1;
+-- views do not have vms, so these all fail
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create sequence test_sequence;
+-- sequences do not have vms, so these all fail
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+-- foreign tables do not have vms, so these all fail
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..c2d890bcd9
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,94 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+my ($node, $result);
+
+# Check various options are stable (don't abort) when running verify_heapam on
+# the test table. For uncorrupted tables, there isn't anything to check except
+# that it runs without crashing.
+sub check_all_options
+{
+ for my $stop (qw(NULL true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ my $check = "SELECT verify_heapam('test', $stop, $skip)";
+ $result = $node->safe_psql('postgres', "$check; SELECT 1");
+ is ($result, 1, "checked: $check");
+ }
+ }
+}
+
+# Stops the server and writes 16 bytes of zeros at offset 1000 in the
+# table's underlying file, assuming the page size is large enough for
+# offsets 1000..1015 to fall within the first page of data.
+sub corrupt_first_page
+{
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('test')));
+ my $relpath = "$pgdata/$rel";
+ $node->stop;
+
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 1000, 0);
+ syswrite($fh, "\x00" x 16);
+ close($fh);
+
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Check empty table
+$node->safe_psql('postgres', q(
+ CREATE TABLE test (a integer);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+));
+check_all_options();
+
+# Check table with trivial data
+$node->safe_psql('postgres', q(INSERT INTO test VALUES (0)));
+check_all_options();
+
+# Check table with non-trivial data (more than a page worth) but
+# without any all-frozen or all-visible
+$node->safe_psql('postgres', q(
+INSERT INTO test SELECT generate_series(1,10000)));
+check_all_options();
+
+# Check table with all-visible data
+$node->safe_psql('postgres', q(VACUUM test));
+check_all_options();
+
+# Check table with all-frozen data
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+check_all_options();
+
+# Check table with corruption, no skipping
+corrupt_first_page();
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is($result, 't', 'corruption detected on first page');
+
+# Check table with corruption, skipping all-visible blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-visible', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-visible first page');
+
+# Check table with corruption, skipping all-frozen blocks
+$result = $node->safe_psql('postgres', q(
+SELECT COUNT(*) > 0 FROM verify_heapam('test', on_error_stop := false, skip := 'all-frozen', startblock := NULL, endblock := NULL)));
+is($result, 'f', 'skipping all-frozen first page');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..8bf6b891ff
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1139 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check PostgreSQL heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "amcheck.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 8
+
+/*
+ * Struct holding the running context information during
+ * a lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * While verifying a table, we check whether any xid we encounter is
+ * either too old or too new. We could naively check that by taking the
+ * XidGenLock each time and reading ShmemVariableCache. We instead cache
+ * the values and rely on the fact that we have the table locked
+ * sufficiently that the oldest xid in the table cannot change
+ * mid-verification, and although the newest xid in the table may advance,
+ * it cannot retreat. As such, whenever we encounter an xid older than
+ * our cached oldest xid, we know it is invalid, and when we encounter an
+ * xid newer than our cached newest xid, we recheck the
+ * ShmemVariableCache.
+ */
+ TransactionId next_valid_xid;
+ TransactionId oldest_valid_xid;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+
+static void confess(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+
+static bool xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx);
+static bool tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static void check_tuple(HeapCheckContext *ctx);
+
+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * verify_heapam
+ *
+ * Scan and report corruption in heap pages, or in the associated toast
+ * relation, if any.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ FullTransactionId next_full_xid;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool fatal = false;
+ bool on_error_stop;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ int64 start_block;
+ int64 end_block;
+
+ /* check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ if (!PG_ARGISNULL(2))
+ {
+ const char *skip = PG_GETARG_CSTRING(2);
+
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_ALL_VISIBLE_PAGES;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_ALL_FROZEN_PAGES;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));
+ }
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+
+ MemoryContextSwitchTo(old_context);
+
+ /*
+ * Open the relation. We use ShareUpdateExclusive to prevent concurrent
+ * vacuums from changing the relfrozenxid, relminmxid, or advancing the
+ * global oldestXid to be newer than those.
+ */
+ ctx.rel = relation_open(relid, ShareUpdateExclusiveLock);
+ check_relation_relkind_and_relam(ctx.rel);
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ if (!ctx.nblocks)
+ {
+ /*
+ * For consistency, we need to enforce that the start_block and
+ * end_block are within the valid range if the user specified them.
+ * Yet, for an empty table with no blocks, no specified block can be
+ * in range.
+ */
+ if (!PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block " INT64_FORMAT " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(3))));
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block " INT64_FORMAT " is out of bounds for relation with no blocks",
+ PG_GETARG_INT64(4))));
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* If we get this far, we know the relation has at least one block */
+ start_block = PG_ARGISNULL(3) ? 0 : PG_GETARG_INT64(3);
+ end_block = PG_ARGISNULL(4) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(4);
+ if (start_block < 0 || end_block >= ctx.nblocks || start_block > end_block)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block range " INT64_FORMAT " .. " INT64_FORMAT " is out of bounds for relation with block count %u",
+ start_block, end_block, ctx.nblocks)));
+
+ /*
+ * Open the toast relation, if any, also protected from concurrent
+ * vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ ShareUpdateExclusiveLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ ShareUpdateExclusiveLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /* Main relation has no associated toast relation */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ /*
+ * Now that we have our relation(s) locked, oldestXid cannot advance
+ * beyond the oldest valid xid in our table, nor can our relfrozenxid
+ * advance.
+ *
+ * If relfrozenxid is normal, it contains the oldest valid xid we may
+ * encounter in the table. If not, the oldest xid for our database is the
+ * oldest we should encounter.
+ *
+ * Bugs in pg_upgrade have reportedly sometimes rendered the oldest xid
+ * value for a database invalid (see commands/vacuum.c circa line 1572).
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt. We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.
+ *
+ * If neither the value for the database nor the xids for any row are
+ * corrupt, then this gives the right answer. If the rows disagree with
+ * the value for the database, how can we know which one is wrong?
+ */
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ next_full_xid = ShmemVariableCache->nextFullXid;
+ ctx.oldest_valid_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+ ctx.next_valid_xid = XidFromFullTransactionId(next_full_xid);
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid) &&
+ TransactionIdPrecedes(ctx.relfrozenxid, ctx.oldest_valid_xid))
+ {
+ confess(&ctx,
+ psprintf("relfrozenxid %u precedes global oldest valid xid %u",
+ ctx.relfrozenxid, ctx.oldest_valid_xid));
+ fatal = true;
+ }
+ else if (TransactionIdIsNormal(ctx.relminmxid) &&
+ TransactionIdPrecedes(ctx.relminmxid, ctx.oldest_valid_xid))
+ {
+ confess(&ctx,
+ psprintf("relminmxid %u precedes global oldest valid xid %u",
+ ctx.relminmxid, ctx.oldest_valid_xid));
+ fatal = true;
+ }
+
+ if (fatal)
+ {
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, ShareUpdateExclusiveLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+ }
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_valid_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = start_block; ctx.blkno <= end_block; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ bool all_frozen,
+ all_visible;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+ all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_ALL_FROZEN_PAGES) ||
+ (all_visible && skip_option == SKIP_ALL_VISIBLE_PAGES))
+ {
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ confess(&ctx,
+ psprintf("line pointer redirection to item at offset number %u is outside valid bounds %u .. %u",
+ (unsigned) rdoffnum, (unsigned) FirstOffsetNumber,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ confess(&ctx,
+ psprintf("line pointer redirection to unused item at offset %u",
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /*
+ * Reset information about individual attributes and related toast
+ * values, so they show as NULL in the corruption report if we
+ * record a corruption before beginning to iterate over the
+ * attributes.
+ */
+ ctx.attnum = -1;
+ ctx.chunkno = -1;
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ ShareUpdateExclusiveLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, ShareUpdateExclusiveLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, ShareUpdateExclusiveLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * check_relation_relkind_and_relam
+ *
+ * convenience routine to check that a relation is of a supported relkind
+ * and uses a supported table access method.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * confess
+ *
+ * Record a row in the tuplestore describing corruption, including
+ * information about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+ int16 lp_off = ItemIdGetOffset(ctx->itemid);
+ int16 lp_flags = ItemIdGetFlags(ctx->itemid);
+ int16 lp_len = ItemIdGetLength(ctx->itemid);
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ nulls[1] = (ctx->offnum < 0);
+ values[2] = Int16GetDatum(lp_off);
+ nulls[2] = (lp_off < 0);
+ values[3] = Int16GetDatum(lp_flags);
+ nulls[3] = (lp_flags < 0);
+ values[4] = Int16GetDatum(lp_len);
+ nulls[4] = (lp_len < 0);
+ values[5] = Int32GetDatum(ctx->attnum);
+ nulls[5] = (ctx->attnum < 0);
+ values[6] = Int32GetDatum(ctx->chunkno);
+ nulls[6] = (ctx->chunkno < 0);
+ values[7] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Helper function to construct the TupleDesc needed by verify_heapam.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_off", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_flags", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "lp_len", INT2OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "chunk", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext *ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldest_valid_xid, xid) &&
+ TransactionIdPrecedes(xid, ctx->next_valid_xid));
+}
+
+/*
+ * Given a TransactionId, determine whether it falls within the range of
+ * xids that could validly appear in this relation: neither in the future
+ * nor so far in the past that its clog entries may already have been
+ * truncated away.
+ *
+ * Returns true if the xid is within the known valid range.
+ */
+static bool
+xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /*
+ * If this xid is within the last known valid range of xids, then it has
+ * to be ok. The oldest valid xid cannot advance, because we have too
+ * strong a lock on the relation for that, and although the newest valid
+ * xid may advance, that doesn't invalidate anything from the range we've
+ * already identified.
+ */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ ctx->next_valid_xid =
+ XidFromFullTransactionId(ReadNextFullTransactionId());
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * tuple_is_visible
+ *
+ * Determine whether the given tuple is visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
+ *
+ * 1) Does not touch hint bits. It seems imprudent to write hint bits
+ * to a table during a corruption check.
+ * 2) Only makes a boolean determination of whether verification should
+ * see the tuple, rather than doing extra work for vacuum-related
+ * categorization.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ */
+static bool
+tuple_is_visible(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (TransactionIdIsCurrentTransactionId(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (TransactionIdIsInProgress(xvac))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ if (!xid_valid_in_rel(xvac, ctx))
+ {
+ confess(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u is invalid in this relation",
+ xvac));
+ return false;
+ }
+ else if (TransactionIdDidCommit(xvac))
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ confess(ctx,
+ pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has a valid xmax"));
+ return false;
+ }
+ if (TransactionIdIsInProgress(xmax))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+
+ else if (TransactionIdDidCommit(xmax))
+ {
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ {
+ if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ /* Ok, the tuple is live */
+ }
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true;
+}
+
+/*
+ * check_toast_tuple
+ *
+ * Checks the current toast tuple as tracked in ctx for corruption. Records
+ * any corruption found in ctx->tupstore.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ confess(ctx,
+ pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ confess(ctx,
+ psprintf("corrupt extended toast chunk with sequence number %d has invalid varlena header %0x",
+ curchunk, header));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ confess(ctx,
+ psprintf("toast chunk sequence number %u does not match the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ confess(ctx,
+ psprintf("toast chunk sequence number %u exceeds the end chunk sequence number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
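+
+ /*
+ * For example, with a hypothetical attrsize of 5000 bytes and a
+ * TOAST_MAX_CHUNK_SIZE of 2000 bytes: endchunk = 4999 / 2000 = 2 and
+ * totalchunks = 3, so chunks 0 and 1 are expected to be 2000 bytes
+ * each, and the final chunk 5000 - 2 * 2000 = 1000 bytes.
+ */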
+ if (chunksize != expected_size)
+ {
+ confess(ctx,
+ psprintf("toast chunk size %u differs from expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+
+ ctx->chunkno++;
+}
+
+/*
+ * check_tuple_attribute
+ *
+ * Checks the current attribute as tracked in ctx for corruption. Records
+ * any corruption found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in
+ * the case of a toasted value, continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed
+ * here. In cases where those two functions are a bit cavalier in their
+ * assumptions about data being correct, we perform additional checks not
+ * present in either of those two functions. Where some condition is checked
+ * in both of those functions, we perform it here twice, as we parallel the
+ * logical flow of those two functions. The presence of duplicate checks
+ * seems a reasonable price to pay for keeping this code tightly coupled with
+ * the code it protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple attribute should start at offset %u, but tuple length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ confess(ctx,
+ psprintf("unexpected %s toast pointer at offset %u",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unrecognized",
+ ctx->tuphdr->t_hoff + ctx->offset));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple attribute ends at offset %u, but tuple length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ confess(ctx,
+ pstrdup("attribute is external but tuple header flag HEAP_HASEXTERNAL not set"));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ confess(ctx,
+ pstrdup("attribute is external but relation has no toast relation"));
+ return true;
+ }
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ confess(ctx,
+ psprintf("final chunk number %u differs from expected value %u",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ confess(ctx,
+ pstrdup("toasted value missing from toast table"));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * check_tuple
+ *
+ * Checks the current tuple as tracked in ctx for corruption. Records any
+ * corruption found in ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ confess(ctx,
+ psprintf("tuple's %u byte line pointer length is less than the %u byte minimum tuple header size",
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* Check xmax against relminmxid, if xmax is a multixact */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->next_valid_xid));
+ fatal = true;
+ }
+ }
+
+ /* Check xmax against relfrozenxid */
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u precedes relfrozenxid %u",
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmax, ctx))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u follows last assigned xid %u",
+ xmax, ctx->next_valid_xid));
+ fatal = true;
+ }
+ }
+
+ /* Check for tuple header corruption */
+ if (ctx->tuphdr->t_hoff < SizeofHeapTupleHeader)
+ {
+ confess(ctx,
+ psprintf("tuple's header size is %u bytes which is less than the %u byte minimum valid header size",
+ ctx->tuphdr->t_hoff,
+ (unsigned) SizeofHeapTupleHeader));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple's %u byte header size exceeds the %u byte length of the entire tuple",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ fatal = true;
+ }
+ if (ctx->tuphdr->t_hoff != MAXALIGN(ctx->tuphdr->t_hoff))
+ {
+ confess(ctx,
+ psprintf("tuple's user data offset %u not maximally aligned to %u",
+ ctx->tuphdr->t_hoff, (uint32) MAXALIGN(ctx->tuphdr->t_hoff)));
+ fatal = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ confess(ctx,
+ pstrdup("tuple xmax marked incompatibly as keys updated and locked only"));
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ confess(ctx,
+ pstrdup("tuple xmax marked incompatibly as committed and as a multitransaction ID"));
+ }
+
+ /*
+ * If the tuple has nulls, check that the implied length of the variable
+ * length nulls bitmap field t_bits does not overflow the allowed space.
+ * We don't know if the corruption is in the t_hoff field or the infomask
+ * bit HEAP_HASNULL.
+ *
+ * If the tuple does not have nulls, check that no space has been reserved
+ * for the null bitmap.
+ */
+ if ((infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))))
+ {
+ confess(ctx,
+ psprintf("tuple with null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts))));
+ fatal = true;
+ }
+ else if (!(infomask & HEAP_HASNULL) &&
+ (ctx->tuphdr->t_hoff != MAXALIGN(SizeofHeapTupleHeader)))
+ {
+ confess(ctx,
+ psprintf("tuple without null values has user data offset %u rather than the expected offset %u",
+ ctx->tuphdr->t_hoff,
+ (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ fatal = true;
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Skip tuples that are invisible, as we cannot assume the TupleDesc we
+ * are using is appropriate.
+ */
+ if (!tuple_is_visible(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our tuple descriptor. The natts field could be
+ * legitimately shorter than the relation's natts, but it cannot be
+ * longer than the relation's natts.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ confess(ctx,
+ psprintf("tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..b8170bbfdf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -69,7 +69,7 @@ AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND c.relkind = 'i' AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
----------------+---------------------------------+----------
| pg_depend_reference_index | 43
| pg_depend_depender_index | 40
@@ -165,6 +165,110 @@ ORDER BY c.relpages DESC LIMIT 10;
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ skip_all_frozen boolean,
+ skip_all_visible boolean,
+ blkno OUT bigint,
+ offnum OUT integer,
+ lp_off OUT smallint,
+ lp_flags OUT smallint,
+ lp_len OUT smallint,
+ attnum OUT integer,
+ chunk OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks for "logical" corruption, where the page is valid but inconsistent
+ with the rest of the database cluster. This can happen due to faulty or
+ ill-conceived backup and restore tools, or bad storage, or user error, or
+ bugs in the server itself. It checks xmin and xmax values against
+ relfrozenxid and relminmxid, and also validates TOAST pointers.
+ </para>
+
+ <para>
+ Returns one row for each corruption detected. If on_error_stop is true,
+ checking stops after the first block on which corruption is found. Each
+ row contains the following fields:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_off</term>
+ <listitem>
+ <para>
+ The offset into the page of the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_flags</term>
+ <listitem>
+ <para>
+ The flags in the line pointer for the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>lp_len</term>
+ <listitem>
+ <para>
+ The length of the corrupt tuple as recorded in the line pointer.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>chunk</term>
+ <listitem>
+ <para>
+ The chunk number of the corrupt toasted attribute, if the corruption
+ is specific to a toasted value.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
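+
+ <para>
+ For example, assuming the amcheck extension is installed and the
+ database contains a table named mytable (a hypothetical name), the
+ table could be checked while skipping all-frozen blocks as follows:
+ </para>
+<programlisting>
+SELECT * FROM verify_heapam('mytable', false, true, false);
+</programlisting>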
+ </listitem>
+ </varlistentry>
+
</variablelist>
<tip>
<para>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..00de10b7c9 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you decide to disable one or more of these
+ * assertions, make corresponding changes to contrib/amcheck.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 467120f1d0..c58d025902 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1018,6 +1018,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -3321,6 +3322,7 @@ sigjmp_buf
signedbitmapword
sigset_t
size_t
+SkipPages
slist_head
slist_iter
slist_mutable_iter
--
2.21.1 (Apple Git-122.3)
v13-0003-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v13-0003-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 2cc3a67a3faa87625268796a4e903312556f0409 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 27 Jul 2020 08:04:49 -0700
Subject: [PATCH v13 3/3] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 900 ++++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 55 ++
contrib/pg_amcheck/t/003_check.pl | 85 ++
contrib/pg_amcheck/t/004_verify_heapam.pl | 434 +++++++++++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pg_amcheck.sgml | 136 ++++
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 1655 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 doc/src/sgml/pg_amcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..c21c27cbeb 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..2bd98076eb
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,900 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -n, --schema=PATTERN check all relations in the specified schema(s)",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified "
+ "schema(s)",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -i, --check-indexes check associated btree indexes, if any",
+ " -I, --exclude-indexes do NOT check associated btree indexes",
+ " -s, --strict-names require table and/or schema include patterns "
+ "to match at least one entity each",
+ " -o, --on-error-stop stop checking a table after the first corrupt page",
+ " -b, --startblock check relations beginning at the given "
+ "starting block number",
+ " -e, --endblock check relations only up to the given ending "
+ "block number",
+ " -f, --skip-all-frozen do not check blocks marked as all frozen",
+ " -v, --skip-all-visible do not check blocks marked as all visible",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen "
+ "automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns must match
+ * at least one entity */
+ bool on_error_stop; /* Stop checking a table after the first
+ * corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes for tables */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of OIDs of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+
+static void check_tables(SimpleOidList *checklist);
+static void check_table(Oid tbloid);
+static void check_indexes(Oid tbloid);
+static void check_index(Oid tbloid, Oid idxoid);
+
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+
+static void NoticeProcessor(void *arg, const char *message);
+
+static void expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_table_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(SimpleOidList *include_nsp,
+ SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl,
+ SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+
+static void die_on_query_failure(const char *query);
+static void ExecuteSqlStatement(const char *query);
+static PGresult *ExecuteSqlQuery(const char *query, ExecStatusType status);
+static PGresult *ExecuteSqlQueryForSingleRow(const char *query);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /* Default behaviors */
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+ settings.check_indexes = true;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /* Expand schema selection patterns into OID lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+ expand_schema_name_patterns(&schema_exclude_patterns,
+ &schema_exclude_oids,
+ false);
+ /* non-matching exclusion patterns aren't an error */
+
+ /* Expand table selection patterns into OID lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+ expand_table_name_patterns(&table_exclude_patterns,
+ &table_exclude_oids,
+ false);
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ check_table(cell->val);
+ if (settings.check_indexes)
+ check_indexes(cell->val);
+ }
+}
+
+static void
+check_table(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+ char *skip;
+ const char *stop;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (settings.startblock == NULL)
+ settings.startblock = pg_strdup("NULL");
+ if (settings.endblock == NULL)
+ settings.endblock = pg_strdup("NULL");
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (settings.on_error_stop) ? "true" : "false";
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(relname=%s,blkno=%s,offnum=%s,lp_off=%s,lp_flags=%s,"
+ "lp_len=%s,attnum=%s,chunk=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* lp_off */
+ PQgetvalue(res, i, 4), /* lp_flags */
+ PQgetvalue(res, i, 5), /* lp_len */
+ PQgetvalue(res, i, 6), /* attnum */
+ PQgetvalue(res, i, 7), /* chunk */
+ PQgetvalue(res, i, 8)); /* msg */
+ }
+ ClosePager(output);
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_indexes(Oid tbloid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ check_index(tbloid, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+check_index(Oid tbloid, Oid idxoid)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT ct.relname, ci.relname, blkno, msg"
+ "\nFROM verify_btreeam(%u,%s),"
+ "\n pg_catalog.pg_class ci,"
+ "\n pg_catalog.pg_class ct"
+ "\nWHERE ci.oid = %u"
+ "\n AND ct.oid = %u",
+ idxoid,
+ settings.on_error_stop ? "true" : "false",
+ idxoid, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ if (PQntuples(res) > 0)
+ {
+ int lines = PQntuples(res) * 2;
+ FILE *output = PageOutput(lines, NULL);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ fprintf(output,
+ "(table=%s,index=%s,blkno=%s)"
+ "\n%s\n",
+ PQgetvalue(res, i, 0), /* table relname */
+ PQgetvalue(res, i, 1), /* index relname */
+ PQgetvalue(res, i, 2), /* index blkno */
+ PQgetvalue(res, i, 3)); /* msg */
+ }
+ ClosePager(output);
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"startblock", required_argument, NULL, 'b'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"host", required_argument, NULL, 'h'},
+ {"check-indexes", no_argument, NULL, 'i'},
+ {"exclude-indexes", no_argument, NULL, 'I'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"schema", required_argument, NULL, 'n'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"port", required_argument, NULL, 'p'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"help", optional_argument, NULL, 1},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "b:d:e:fh:iIn:N:op:st:T:U:vVwW?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ settings.check_indexes = true;
+ break;
+ case 'I':
+ settings.check_indexes = false;
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print the help message describing the command line options
+ */
+static void
+usage(void)
+{
+ FILE *output;
+ int lines;
+ int lineno;
+
+ for (lines = 0; usage_text[lines]; lines++)
+ ;
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+
+ ClosePager(output);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+static void
+expand_schema_name_patterns(SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list. See also expand_dbname_patterns()
+ * in pg_dumpall.c
+ */
+static void
+expand_table_name_patterns(SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_table_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(settings.db, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+static void
+append_csv_oids(PQExpBuffer query, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(query, "%s%u", comma, cell->val);
+}
+
+static bool
+append_filter(PQExpBuffer query, const char *lval, const char *operator,
+ const SimpleOidList *oids)
+{
+ /*
+ * For inclusion filters, ANY is what we want, but for exclusion filters
+ * we must use ALL: "oid != ANY(array)" is true whenever the array holds
+ * two or more distinct elements, and so would exclude nothing.
+ */
+ const char *quantifier = (strstr(operator, "!=") != NULL) ? "ALL" : "ANY";
+
+ if (!oids->head)
+ return false;
+ appendPQExpBuffer(query, "\nAND %s %s %s(array[\n", lval, operator,
+ quantifier);
+ append_csv_oids(query, oids);
+ appendPQExpBufferStr(query, "\n])");
+ return true;
+}
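A side note on the filter built here, illustrated outside the patch: SQL's quantifiers behave very differently for inequality. `oid != ANY(array)` is satisfied whenever the oid differs from at least one element, so once an exclusion list holds two distinct OIDs it excludes nothing; `oid != ALL(array)` is the predicate that rejects every listed OID. The two quantifiers, modeled in Python:

```python
# Model SQL's quantified comparisons over an exclusion list of OIDs:
# "oid != ANY(array)" corresponds to any(), "oid != ALL(array)" to all().
def ne_any(oid, exclude):
    return any(oid != v for v in exclude)

def ne_all(oid, exclude):
    return all(oid != v for v in exclude)

# OID 10 is on the exclusion list [10, 20]:
print(ne_any(10, [10, 20]))  # True  -- "!= ANY" fails to exclude it
print(ne_all(10, [10, 20]))  # False -- "!= ALL" correctly rejects it
```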
+
+static void
+get_table_check_list(SimpleOidList *include_nsp, SimpleOidList *exclude_nsp,
+ SimpleOidList *include_tbl, SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ query = createPQExpBuffer();
+
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.=)", include_nsp);
+ append_filter(query, "n.oid", "OPERATOR(pg_catalog.!=)", exclude_nsp);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.=)", include_tbl);
+ append_filter(query, "c.oid", "OPERATOR(pg_catalog.!=)", exclude_tbl);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
+}
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+static void
+ExecuteSqlStatement(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ die_on_query_failure(query);
+ PQclear(res);
+}
+
+static PGresult *
+ExecuteSqlQuery(const char *query, ExecStatusType status)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != status)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute an SQL query and verify that we got exactly one row back.
+ */
+static PGresult *
+ExecuteSqlQueryForSingleRow(const char *query)
+{
+ PGresult *res;
+ int ntups;
+
+ res = ExecuteSqlQuery(query, PGRES_TUPLES_OK);
+
+ /* Expecting a single result only */
+ ntups = PQntuples(res);
+ if (ntups != 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+
+ return res;
+}
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c63ba4452e
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,55 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 12;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..01531e5c77
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,85 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 7;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+CREATE SCHEMA s1;
+CREATE SCHEMA s2;
+CREATE SCHEMA s3;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE TABLE s1.t3 (a TEXT);
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE TABLE s2.t3 (a TEXT);
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE TABLE s3.t3 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+CREATE INDEX i3 ON s1.t3(a);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+CREATE INDEX i3 ON s2.t3(a);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+CREATE INDEX i3 ON s3.t3(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-i', '-p', $port, 'postgres'
+ ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1'
+ ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1'
+ ],
+ 'pg_amcheck all objects not in schema s1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-p', $port, 'postgres', '-i', '-n', 's*', '-t', 't1'
+ ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [
+ 'pg_amcheck', '-I', '-p', $port, 'postgres', '-N', 's1', '-T', 't1'
+ ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..58d5ab88cb
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,434 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 48;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
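As a quick cross-check of the 58-byte figure above (an illustration only, using Python's struct module rather than Perl's pack), the same field sequence with alignment padding disabled sums to the stated length:

```python
import struct

# Same field sequence as HEAPTUPLE_PACK_CODE, translated to struct codes:
# Perl L -> I (uint32), S -> H (uint16), C -> B (uint8), c -> b (int8),
# q -> q (int64).  The leading '<' disables struct's own alignment padding,
# since the on-disk tuple bytes are packed exactly as laid out above.
HEAP_TUPLE_FMT = '<IIIHHHHHBBqBbbbbbbbHHHHHHHHH'

print(struct.calcsize(HEAP_TUPLE_FMT))  # 58, matching HEAPTUPLE_PACK_LENGTH
```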
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '--check-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+open(my $file, '+<', $relpath)
+ or die "could not open $relpath: $!";
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1|8128|1|58|||tuple xmin 3 precedes relfrozenxid $relfrozenxid
+0|2|8064|1|58|||tuple xmin 4026531839 precedes relfrozenxid $relfrozenxid
+0|3|8000|1|58|||tuple xmax 4026531839 precedes relfrozenxid $relfrozenxid
+0|4|7936|1|58|||tuple's 152 byte header size exceeds the 58 byte length of the entire tuple
+0|4|7936|1|58|||tuple without null values has user data offset 152 rather than the expected offset 24
+0|5|7872|1|58|||tuple's user data offset 27 not maximally aligned to 32
+0|5|7872|1|58|||tuple without null values has user data offset 27 rather than the expected offset 24
+0|6|7808|1|58|||tuple's header size is 16 bytes which is less than the 23 byte minimum valid header size
+0|6|7808|1|58|||tuple without null values has user data offset 16 rather than the expected offset 24
+0|7|7744|1|58|||tuple's header size is 21 bytes which is less than the 23 byte minimum valid header size
+0|7|7744|1|58|||tuple's user data offset 21 not maximally aligned to 24
+0|7|7744|1|58|||tuple without null values has user data offset 21 rather than the expected offset 24
+0|8|7680|1|58|||tuple has 2047 attributes in relation with only 3 attributes
+0|9|7616|1|58|||tuple with null values has user data offset 24 rather than the expected offset 280
+0|10|7552|1|58|||tuple has 67 attributes in relation with only 3 attributes
+0|11|7488|1|58|1||tuple attribute of length 4294967295 ends at offset 416848000, but tuple length is only 58
+0|12|7424|1|58|2|0|final chunk number 0 differs from expected value 6
+0|12|7424|1|58|2|0|toasted value missing from toast table
+0|13|7360|1|58|||tuple xmax marked incompatibly as keys updated and locked only
+0|14|7296|1|58|||tuple xmax 0 precedes relminmxid 1
+0|14|7296|1|58|||tuple xmax marked incompatibly as committed and as a multitransaction ID",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,lp_off=\d*,lp_flags=\d*,lp_len=\d*,attnum=\d*,chunk=\d*/,
+
+ # individual detected corruptions
+ qr/final chunk number \d+ differs from expected value \d+/,
+ qr/toasted value missing from toast table/,
+ qr/tuple attribute of length \d+ ends at offset \d+, but tuple length is only \d+/,
+ qr/tuple has \d+ attributes in relation with only \d+ attributes/,
+ qr/tuple with null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple without null values has user data offset \d+ rather than the expected offset \d+/,
+ qr/tuple xmax \d+ precedes relfrozenxid \d+/,
+ qr/tuple xmax \d+ precedes relminmxid \d+/,
+ qr/tuple xmax marked incompatibly as committed and as a multitransaction ID/,
+ qr/tuple xmax marked incompatibly as keys updated and locked only/,
+ qr/tuple xmin \d+ precedes relfrozenxid \d+/,
+ qr/tuple's \d+ byte header size exceeds the \d+ byte length of the entire tuple/,
+ qr/tuple's header size is \d+ bytes which is less than the \d+ byte minimum valid header size/,
+ qr/tuple's user data offset \d+ not maximally aligned to \d+/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..f606e42fb9 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&ltree;
&pageinspect;
&passwordcheck;
+ &pg_amcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..10e1ca9663 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pg_amcheck SYSTEM "pg_amcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pg_amcheck.sgml b/doc/src/sgml/pg_amcheck.sgml
new file mode 100644
index 0000000000..a0b9c9d19b
--- /dev/null
+++ b/doc/src/sgml/pg_amcheck.sgml
@@ -0,0 +1,136 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-i</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ Check an entire database, "test", which contains one corrupt table,
+ "corrupted", and show the output:
+ </para>
+
+<screen>
+% pg_amcheck -i test
+(relname=corrupted,blkno=0,offnum=16,lp_off=7680,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 3289393 is in the future
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmax = 0 precedes relation relminmxid = 1
+(relname=corrupted,blkno=0,offnum=17,lp_off=7648,lp_flags=1,lp_len=31,attnum=,chunk=)
+tuple xmin = 12593 is in the future
+</screen>
+
+ <para>
+ .... many pages of output removed for brevity ....
+ </para>
+
+<screen>
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+tuple xmin = 305 precedes relation relfrozenxid = 487
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff > lp_len (54 > 34)
+(relname=corrupted,blkno=107,offnum=22,lp_off=7312,lp_flags=1,lp_len=34,attnum=,chunk=)
+t_hoff not max-aligned (54)
+</screen>
+
+ <para>
+ Each detected corruption is reported on two lines: the first shows the
+ location, and the second shows a message describing the problem.
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c58d025902..6fb42e0897 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -404,6 +405,7 @@ ConnCacheEntry
ConnCacheKey
ConnStatusType
ConnType
+ConnectOptions
ConnectionStateEnum
ConsiderSplitContext
Const
--
2.21.1 (Apple Git-122.3)
On Mon, Jul 20, 2020 at 5:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'.
That looks nice.
I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.
I welcome strategies that would allow for taking a lesser lock.
I guess I'm not seeing why you need any particular strategy here. Say
that at the beginning you note the starting relfrozenxid of the table
-- I think I would lean toward just ignoring datfrozenxid and the
cluster-wide value completely. You also note the current value of the
transaction ID counter. Those are the two ends of the acceptable
range.
Let's first consider the oldest acceptable XID, bounded by
relfrozenxid. If you see a value that is older than the relfrozenxid
value that you noted at the start, it is definitely invalid. If you
see a newer value, it could still be older than the table's current
relfrozenxid, but that doesn't seem very worrisome. If the user
vacuumed the table while they were running this tool, they can always
run the tool again afterward if they wish. Forcing the vacuum to wait
by taking ShareUpdateExclusiveLock doesn't actually solve anything
anyway: you STILL won't notice any problems the vacuum introduces, and
in fact you are now GUARANTEED not to notice them, plus now the vacuum
happens later.
Now let's consider the newest acceptable XID, bounded by the value of
the transaction ID counter. Any time you see a newer XID than the last
value of the transaction ID counter that you observed, you go observe
it again. If the value from the table still looks invalid, then you
complain about it. Either way, you remember the new observation and
check future tuples against that value. I think the patch is already
doing this anyway; if it weren't, you'd need an even stronger lock,
one sufficient to prevent any insert/update/delete activity on the
table altogether.
Maybe I'm just being dense here -- exactly what problem are you worried about?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Not at all! I appreciate all the reviews.
Reviewing 0002, reading through verify_heapam.c:
+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;
This looks inconsistent. Maybe just start them all with SKIP_PAGES_.
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));
This doesn't look much like other error messages in the code. Do
something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study
the comparables.
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));
Same problem. Check pg_prewarm's handling of the prewarm type, or
EXPLAIN's handling of the FORMAT option, or similar examples. Read the
message style guidelines concerning punctuation of hint and detail
messages.
+ * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572)
+ * to have sometimes rendered the oldest xid value for a database invalid.
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt. We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.
This kind of reference to another comment will not age well; line
numbers and files change a lot. But I think the right thing to do here
is just rely on relfrozenxid and relminmxid. If the table is
inconsistent with those, then something needs fixing. datfrozenxid and
the cluster-wide value can look out for themselves. The corruption
detector shouldn't be trying to work around any bugs in setting
relfrozenxid itself; such problems are arguably precisely what we're
here to find.
+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)
Contrary to what the comments say, the function doesn't return a
message about corruption or anything else. It returns void.
I don't really like the name, either. I get that it's probably
inspired by Perl, but I think it should be given a less-clever name
like report_corruption() or something.
+ * corrupted table from using workmem worth of memory building up the
This kind of thing destroys grep-ability. If you're going to refer to
work_mem, you gotta spell it the same way we do everywhere else.
+ * Helper function to construct the TupleDesc needed by verify_heapam.
Instead of saying it's the TupleDesc somebody needs, how about saying
that it's the TupleDesc that we'll use to report problems that we find
while scanning the heap, or something like that?
+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past. Stores the inferred FullTransactionId in *fxid.
It really doesn't, because there's no such thing as 'fxid' referenced
anywhere here. You should really make the effort to proofread your
patches before posting, and adjust comments and so on as you go.
Otherwise reviewing takes longer, and if you keep introducing new
stuff like this as you fix other stuff, you can fail to ever produce a
committable patch.
+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.
Not accurate, because it also reports problems, which is not mentioned
anywhere in the function header comment that purports to be a detailed
description of what the function does.
+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }
One of these cases is not punctuated like the others.
+ pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor
has a valid xmax"));
1. I don't think that's very grammatical.
2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and
HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should
be referencing C constant names here at all, and if you are I don't
think you should abbreviate, and if you do abbreviate I don't think
you should omit different numbers of words depending on which constant
it is.
I wonder what the intended division of responsibility is here,
exactly. It seems like you've ended up with some sanity checks in
check_tuple() before tuple_is_visible() is called, and others in
tuple_is_visible() proper. As far as I can see the comments don't
really discuss the logic behind the split, but there's clearly a close
relationship between the two sets of checks, even to the point where
you have "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has
a valid xmax" in tuple_is_visible() and "tuple xmax marked
incompatibly as keys updated and locked only" in check_tuple(). Now,
those are not the same check, but they seem like closely related
things, so it's not ideal that they happen in different functions with
differently-formatted messages to report problems and no explanation
of why it's different.
I think it might make sense here to see whether you could either move
more stuff out of tuple_is_visible(), so that it really just checks
whether the tuple is visible, or move more stuff into it, so that it
has the job not only of checking whether we should continue with
checks on the tuple contents but also complaining about any other
visibility problems. Or if neither of those make sense then there
should be a stronger attempt to rationalize in the comments what
checks are going where and for what reason, and also a stronger
attempt to rationalize the message wording.
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
Should we be worrying about the possibility of fastgetattr crapping
out if the TOAST tuple is corrupted?
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple attribute should start at offset %u, but tuple
length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }
This looks like it's not going to complain about a fixed-length
attribute that overruns the tuple length. There's code further down
that handles that case for a varlena attribute, but there's nothing
comparable for the fixed-length case.
+ confess(ctx,
+ psprintf("%s toast at offset %u is unexpected",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unexpected",
+ ctx->tuphdr->t_hoff + ctx->offset));
I suggest "unexpected TOAST tag %d", without trying to convert to a
string. Such a conversion will likely fail in the case of genuine
corruption, and isn't meaningful even if it works.
Again, let's try to standardize terminology here: most of the messages
in this function are now of the form "tuple attribute %d has some
problem" or "attribute %d has some problem", but some have neither.
Since we're separately returning attnum I don't see why it should be
in the message, and if we weren't separately returning attnum then it
ought to be in the message the same way all the time, rather than
sometimes writing "attribute" and other times "tuple attribute".
+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
There are checks that an XID is neither too old nor too new, and
presumably something similar could be done for MultiXactIds, but here
you only check one end of the range. Seems like you should check both.
+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->next_valid_xid));
+ fatal = true;
+ }
+ }
Here you do check both ends of the range, but the comment claims
otherwise. Again, please proof-read for this kind of stuff.
+ /* Check xmax against relfrozenxid */
Ditto here.
+ psprintf("tuple's header size is %u bytes which is less than the %u
byte minimum valid header size",
I suggest: tuple data begins at byte %u, but the tuple header must be
at least %u bytes
+ psprintf("tuple's %u byte header size exceeds the %u byte length of
the entire tuple",
I suggest: tuple data begins at byte %u, but the entire tuple length
is only %u bytes
+ psprintf("tuple's user data offset %u not maximally aligned to %u",
I suggest: tuple data begins at byte %u, but that is not maximally aligned
Or: tuple data begins at byte %u, which is not a multiple of %u
That makes the messages look much more similar to each other
grammatically and is more consistent about calling things by the same
names.
+ psprintf("tuple with null values has user data offset %u rather than
the expected offset %u",
+ psprintf("tuple without null values has user data offset %u rather
than the expected offset %u",
I suggest merging these: tuple data offset %u, but expected offset %u
(%u attributes, %s)
where %s is either "has nulls" or "no nulls"
In fact, aren't several of the above checks redundant with this one?
Like, why check for a value less than SizeofHeapTupleHeader or that's
not properly aligned first? Just check this straightaway and call it
good.
+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.
This is yet another case where you didn't update the comments.
tuple_is_visible() now checks whether the tuple is visible to anyone,
not whether it's visible to us, but the comment doesn't agree. In some
sense I think this comment is redundant with the previous one anyway,
because that one already talks about the tuple being visible. Maybe
just write: The tuple is visible, so it must be compatible with the
current version of the relation descriptor. It might have fewer
columns than are present in the relation descriptor, but it cannot
have more.
+ psprintf("tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
I suggest: tuple has %u attributes, but relation has only %u attributes
+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }
I think this comment is too wordy. This text belongs in the header
comment of check_tuple_attribute(), not at the place where it gets
called. Otherwise, as you update what check_tuple_attribute() does,
you have to remember to come find this comment and fix it to match,
and you might forget to do that. In fact... looks like that already
happened, because check_tuple_attribute() now checks more than broken
TOAST attributes. Seems like you could just simplify this down to
something like "Now check each attribute." Also, you could lose the
extra braces.
- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages
Don't include unrelated changes in the patch.
I'm not really sure that the list of fields you're displaying for each
reported problem really makes sense. I think the theory here should be
that we want to report the information that the user needs to localize
the problem but not everything that they could find out from
inspecting the page, and not things that are too specific to
particular classes of errors. So I would vote for keeping blkno,
offnum, and attnum, but I would lose lp_flags, lp_len, and chunk.
lp_off feels like it's a more arguable case: technically, it's a
locator for the problem, because it gives you the byte offset within
the page, but normally we reference tuples by TID, i.e. (blkno,
offset), not byte offset. On balance I'd be inclined to omit it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 29, 2020, at 12:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 20, 2020 at 5:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I've made the options 'all-visible', 'all-frozen', and 'none'. It defaults to 'none'.
That looks nice.
I guess that
could still be expensive if there's a lot of them, but needing
ShareUpdateExclusiveLock rather than only AccessShareLock is a little
unfortunate.
I welcome strategies that would allow for taking a lesser lock.
I guess I'm not seeing why you need any particular strategy here. Say
that at the beginning you note the starting relfrozenxid of the table
-- I think I would lean toward just ignoring datfrozenxid and the
cluster-wide value completely. You also note the current value of the
transaction ID counter. Those are the two ends of the acceptable
range.
Let's first consider the oldest acceptable XID, bounded by
relfrozenxid. If you see a value that is older than the relfrozenxid
value that you noted at the start, it is definitely invalid. If you
see a newer value, it could still be older than the table's current
relfrozenxid, but that doesn't seem very worrisome. If the user
vacuumed the table while they were running this tool, they can always
run the tool again afterward if they wish. Forcing the vacuum to wait
by taking ShareUpdateExclusiveLock doesn't actually solve anything
anyway: you STILL won't notice any problems the vacuum introduces, and
in fact you are now GUARANTEED not to notice them, plus now the vacuum
happens later.
Now let's consider the newest acceptable XID, bounded by the value of
the transaction ID counter. Any time you see a newer XID than the last
value of the transaction ID counter that you observed, you go observe
it again. If the value from the table still looks invalid, then you
complain about it. Either way, you remember the new observation and
check future tuples against that value. I think the patch is already
doing this anyway; if it weren't, you'd need an even stronger lock,
one sufficient to prevent any insert/update/delete activity on the
table altogether.
Maybe I'm just being dense here -- exactly what problem are you worried about?
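The caching strategy for the newest acceptable XID described above can be sketched as follows. This is a simplified model, not the patch's code: it uses plain 32-bit XIDs with no wraparound handling, and observe_next_xid is a test stub standing in for reading the real transaction ID counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;

typedef struct xid_bounds
{
    xid_t relfrozenxid;  /* noted once at the start of the scan */
    xid_t next_xid_seen; /* cached observation of the XID counter */
} xid_bounds;

/* Stand-in for reading the real transaction ID counter (test stub). */
static xid_t fake_xid_counter = 1000;
static xid_t
observe_next_xid(void)
{
    return fake_xid_counter;
}

static bool
xid_in_valid_range(xid_bounds *b, xid_t xid)
{
    if (xid < b->relfrozenxid)
        return false;   /* definitely older than allowed */

    if (xid >= b->next_xid_seen)
    {
        /* Possibly in the future: re-observe the counter before deciding. */
        b->next_xid_seen = observe_next_xid();
        if (xid >= b->next_xid_seen)
            return false;       /* still apparently in the future: complain */
    }
    return true;
}
```

Note that the cached next_xid_seen is refreshed only when a tuple XID exceeds it, so the common case needs no re-observation at all.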
Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-07-30 13:18:01 -0700, Mark Dilger wrote:
Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.
I think it's not at all ok to look in the procarray or clog for xids
that are older than what you're announcing you may read. IOW I don't
think it's OK to just ignore the problem, or try to work around it by
holding XactTruncationLock.
Greetings,
Andres Freund
On Thu, Jul 30, 2020 at 4:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Maybe I'm just being dense here -- exactly what problem are you worried about?
Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.
I don't get it. If you've already checked that the XIDs are >=
relfrozenxid and <= ReadNewFullTransactionId(), then this shouldn't be
a problem. It could be, if CLOG is hosed, which is possible, because
if the table is corrupted, why shouldn't CLOG also be corrupted? But
I'm not sure that's what your concern is here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 30, 2020, at 2:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 30, 2020 at 4:18 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Maybe I'm just being dense here -- exactly what problem are you worried about?
Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.
I don't get it. If you've already checked that the XIDs are >=
relfrozenxid and <= ReadNewFullTransactionId(), then this shouldn't be
a problem. It could be, if CLOG is hosed, which is possible, because
if the table is corrupted, why shouldn't CLOG also be corrupted? But
I'm not sure that's what your concern is here.
No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
The kind of scenario I'm worried about may not be possible in practice. I think it would depend on how vacuum behaves when scanning a corrupt table that is corrupt in some way that vacuum doesn't notice, and whether vacuum could finish scanning the table with the false belief that it has frozen all tuples with xids less than some cutoff.
I thought it would be safer if that kind of thing were not happening during verify_heapam's scan of the table. Even if a careful analysis proved it was not an issue with the current coding of vacuum, I don't think there is any coding convention requiring future versions of vacuum to be hardened against corruption, so I don't see how I can rely on vacuum not causing such problems.
I don't think this is necessarily a too-rare-to-care-about type concern, either. If corruption across multiple tables prevents autovacuum from succeeding, and the DBA doesn't get involved in scanning tables for corruption until the lack of successful vacuums impacts the production system, I imagine you could end up with vacuums repeatedly happening (or trying to happen) around the time the DBA is trying to fix tables, or perhaps drop them, or whatever, using verify_heapam for guidance on which tables are corrupted.
Anyway, that's what I was thinking. I was imagining that calling TransactionIdDidCommit might keep crashing the backend while the DBA is trying to find and fix corruption, and that could get really annoying.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 30, 2020, at 1:47 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-07-30 13:18:01 -0700, Mark Dilger wrote:
Per tuple, tuple_is_visible() potentially checks whether the xmin or xmax committed via TransactionIdDidCommit. I am worried about concurrent truncation of clog entries causing I/O errors on SLRU lookup when performing that check. The three strategies I had for dealing with that were taking the XactTruncationLock (formerly known as CLogTruncationLock, for those reading this thread from the beginning), locking out vacuum, and the idea upthread from Andres about setting PROC_IN_VACUUM and such. Maybe I'm being dense and don't need to worry about this. But I haven't convinced myself of that, yet.
I think it's not at all ok to look in the procarray or clog for xids
that are older than what you're announcing you may read. IOW I don't
think it's OK to just ignore the problem, or try to work around it by
holding XactTruncationLock.
The current state of the patch is that concurrent vacuums are kept out of the table being checked by means of taking a ShareUpdateExclusive lock on the table being checked. In response to Robert's review, I was contemplating whether that was necessary, but you raise the interesting question of whether it is even sufficient. The logic in verify_heapam is currently relying on the ShareUpdateExclusive lock to prevent any of the xids in the range relfrozenxid..nextFullXid from being invalid arguments to TransactionIdDidCommit. Ignoring whether that is a good choice vis-a-vis performance, is that even a valid strategy? It sounds like you are saying it is not.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
Oh, hmm. I wasn't thinking about that problem. I think the only way
this can happen is if we read a page and then, before we try to look
up the CID, vacuum zooms past, finishes the whole table, and truncates
clog. But if that's possible, then it seems like it would be an issue
for SELECT as well, and it apparently isn't, or we would've done
something about it by now. I think the reason it's not possible is
because of the locking rules described in
src/backend/storage/buffer/README, which require that you hold a
buffer lock until you've determined that the tuple is visible. Since
you hold a share lock on the buffer, a VACUUM that hasn't already
processed that buffer can't freeze the tuples in it; it would need an
exclusive lock on the buffer to do that. Therefore it can't finish and
truncate clog either.
Now, you raise the question of whether this is still true if the table
is corrupt, but I don't really see why that makes any difference.
VACUUM is supposed to freeze each page it encounters, to the extent
that such freezing is necessary, and with Andres's changes, it's
supposed to ERROR out if things are messed up. We can postulate a bug
in that logic, but inserting a VACUUM-blocking lock into this tool to
guard against a hypothetical vacuum bug seems strange to me. Why would
the right solution not be to fix such a bug if and when we find that
there is one?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
No, that wasn't my concern. I was thinking about CLOG entries disappearing during the scan as a consequence of concurrent vacuums, and the effect that would have on the validity of the cached [relfrozenxid..next_valid_xid] range. In the absence of corruption, I don't immediately see how this would cause any problems. But for a corrupt table, I'm less certain how it would play out.
Oh, hmm. I wasn't thinking about that problem. I think the only way
this can happen is if we read a page and then, before we try to look
up the CID, vacuum zooms past, finishes the whole table, and truncates
clog. But if that's possible, then it seems like it would be an issue
for SELECT as well, and it apparently isn't, or we would've done
something about it by now. I think the reason it's not possible is
because of the locking rules described in
src/backend/storage/buffer/README, which require that you hold a
buffer lock until you've determined that the tuple is visible. Since
you hold a share lock on the buffer, a VACUUM that hasn't already
processed that buffer can't freeze the tuples in it; it would need an
exclusive lock on the buffer to do that. Therefore it can't finish and
truncate clog either.
Now, you raise the question of whether this is still true if the table
is corrupt, but I don't really see why that makes any difference.
VACUUM is supposed to freeze each page it encounters, to the extent
that such freezing is necessary, and with Andres's changes, it's
supposed to ERROR out if things are messed up. We can postulate a bug
in that logic, but inserting a VACUUM-blocking lock into this tool to
guard against a hypothetical vacuum bug seems strange to me. Why would
the right solution not be to fix such a bug if and when we find that
there is one?
Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:
On Apr 20, 2020, at 12:42 PM, Andres Freund <andres@anarazel.de> wrote:
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jul 30, 2020 at 9:38 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:
On Apr 20, 2020, at 12:42 PM, Andres Freund <andres@anarazel.de> wrote:
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
The version of the patch I'm looking at doesn't seem to mention
CLogTruncationLock at all, so I'm confused about the comment. But what
is it that you are wondering about exactly?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 31, 2020, at 5:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 30, 2020 at 9:38 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
Since I can't think of a plausible concrete example of corruption which would elicit the problem I was worrying about, I'll withdraw the argument. But that leaves me wondering about a comment that Andres made upthread:
On Apr 20, 2020, at 12:42 PM, Andres Freund <andres@anarazel.de> wrote:
I don't think random interspersed uses of CLogTruncationLock are a good
idea. If you move to only checking visibility after tuple fits into
[relfrozenxid, nextXid), then you don't need to take any locks here, as
long as a lock against vacuum is taken (which I think this should do
anyway).
The version of the patch I'm looking at doesn't seem to mention
CLogTruncationLock at all, so I'm confused about the comment. But what
is it that you are wondering about exactly?
In earlier versions of the patch, I was guarding (perhaps unnecessarily) against clog truncation, (perhaps incorrectly) by taking the CLogTruncationLock (aka XactTruncationLock). I thought Andres was arguing that such locks were not necessary "as long as a lock against vacuum is taken". That's what motivated me to remove the clog locking business and put in the ShareUpdateExclusive lock. I don't want to remove the ShareUpdateExclusive lock from the patch without perhaps a clarification from Andres on the subject. His recent reply upthread seems to still support the idea that some kind of protection is required:
I think it's not at all ok to look in the procarray or clog for xids
that are older than what you're announcing you may read. IOW I don't
think it's OK to just ignore the problem, or try to work around it by
holding XactTruncationLock.
I don't understand that paragraph fully, in particular the part about "than what you're announcing you may read", since the cached value of relfrozenxid is not announced; we're just assuming that as long as vacuum cannot advance it during our scan, that we should be safe checking whether xids newer than that value (and not in the future) were committed.
Andres?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-07-31 08:51:50 -0700, Mark Dilger wrote:
In earlier versions of the patch, I was guarding (perhaps
unnecessarily) against clog truncation, (perhaps incorrectly) by
taking the CLogTruncationLock (aka XactTruncationLock). I thought
Andres was arguing that such locks were not necessary "as long as a
lock against vacuum is taken". That's what motivated me to remove the
clog locking business and put in the ShareUpdateExclusive lock. I
don't want to remove the ShareUpdateExclusive lock from the patch
without perhaps a clarification from Andres on the subject. His
recent reply upthread seems to still support the idea that some kind
of protection is required:
I'm not sure what I was thinking "back then", but right now I'd argue
that the best lock against vacuum isn't a SUE, but announcing the
correct ->xmin, so you can be sure that clog entries won't be yanked out
from under you. Potentially with the right flags set to avoid old enough
tuples being pruned.
I think it's not at all ok to look in the procarray or clog for xids
that are older than what you're announcing you may read. IOW I don't
think it's OK to just ignore the problem, or try to work around it by
holding XactTruncationLock.
I don't understand that paragraph fully, in particular the part about
"than what you're announcing you may read", since the cached value of
relfrozenxid is not announced; we're just assuming that as long as
vacuum cannot advance it during our scan, that we should be safe
checking whether xids newer than that value (and not in the future)
were committed.
With 'announcing' I mean using the normal mechanism for avoiding the
clog being truncated for values one might look up. Which is announcing
the oldest xid one may look up in PGXACT->xmin.
Greetings,
Andres Freund
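The announcement mechanism Andres describes can be modeled with a toy sketch: each backend advertises the oldest XID it may still look up (analogous to PGXACT->xmin), and clog truncation is clamped so it never passes the minimum advertised value. Names and layout here are illustrative only, not the server's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BACKENDS 8

typedef uint32_t xid_t;

/* 0 means "this slot is not advertising any xmin". */
static xid_t advertised_xmin[MAX_BACKENDS];

/* Analogous to setting PGXACT->xmin before looking up old XIDs. */
static void
advertise_xmin(int backend, xid_t xmin)
{
    advertised_xmin[backend] = xmin;
}

/*
 * Clog truncation must not pass the oldest advertised xmin: clamp the
 * desired truncation point to the minimum of all advertised values.
 */
static xid_t
clamp_truncation_point(xid_t wanted)
{
    xid_t limit = wanted;

    for (int i = 0; i < MAX_BACKENDS; i++)
        if (advertised_xmin[i] != 0 && advertised_xmin[i] < limit)
            limit = advertised_xmin[i];
    return limit;
}
```

In this model, a checker that advertises relfrozenxid before its scan is guaranteed that no clog entry at or after that XID disappears while the advertisement is in place.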
On Fri, Jul 31, 2020 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
I'm not sure what I was thinking "back then", but right now I'd argue
that the best lock against vacuum isn't a SUE, but announcing the
correct ->xmin, so you can be sure that clog entries won't be yanked out
from under you. Potentially with the right flags set to avoid old enough
tuples being pruned.
Suppose we don't even do anything special in terms of advertising
xmin. What can go wrong? To have a problem, we've got to be running
concurrently with a vacuum that truncates clog. The clog truncation
must happen before our XID lookups, but vacuum has to remove the XIDs
from the heap before it can truncate. So we have to observe the XIDs
before vacuum removes them, but then vacuum has to truncate before we
look them up. But since we observe them and look them up while holding
a ShareLock on the buffer, this seems impossible. What's the flaw in
this argument?
If we do need to do something special in terms of advertising xmin,
how would you do it? Normally it happens by registering a snapshot,
but here all we would have is an XID; specifically, the value of
relfrozenxid that we observed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-07-31 12:42:51 -0400, Robert Haas wrote:
On Fri, Jul 31, 2020 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
I'm not sure what I was thinking "back then", but right now I'd argue
that the best lock against vacuum isn't a SUE, but announcing the
correct ->xmin, so you can be sure that clog entries won't be yanked out
from under you. Potentially with the right flags set to avoid old enough
tuples being pruned.
Suppose we don't even do anything special in terms of advertising
xmin. What can go wrong? To have a problem, we've got to be running
concurrently with a vacuum that truncates clog. The clog truncation
must happen before our XID lookups, but vacuum has to remove the XIDs
from the heap before it can truncate. So we have to observe the XIDs
before vacuum removes them, but then vacuum has to truncate before we
look them up. But since we observe them and look them up while holding
a ShareLock on the buffer, this seems impossible. What's the flaw in
this argument?
The page could have been wrongly marked all-frozen. There could be
interactions between heap and toast table that are checked. Other bugs
could apply, like a broken hot chain or such.
If we do need to do something special in terms of advertising xmin,
how would you do it? Normally it happens by registering a snapshot,
but here all we would have is an XID; specifically, the value of
relfrozenxid that we observed.
An appropriate procarray or snapmgr function would probably suffice?
Greetings,
Andres Freund
On Fri, Jul 31, 2020 at 3:05 PM Andres Freund <andres@anarazel.de> wrote:
The page could have been wrongly marked all-frozen. There could be
interactions between heap and toast table that are checked. Other bugs
could apply, like a broken hot chain or such.
OK, at least the first two of these do sound like problems. Not sure
about the third one.
If we do need to do something special in terms of advertising xmin,
how would you do it? Normally it happens by registering a snapshot,
but here all we would have is an XID; specifically, the value of
relfrozenxid that we observed.
An appropriate procarray or snapmgr function would probably suffice?
Not sure; I guess that'll need some investigation.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jul 30, 2020, at 10:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
Should we be worrying about the possibility of fastgetattr crapping
out if the TOAST tuple is corrupted?
I think we should, but I'm not sure we should be worrying about it at this location. If the toast index is corrupt, systable_getnext_ordered could trip over the index corruption in the process of retrieving the toast tuple, so checking the toast tuple only helps if the toast index does not cause a crash first. I think the toast index should be checked before this point, ala verify_nbtree, so that we don't need to worry about that here. It might also make more sense to verify the toast table ala verify_heapam prior to here, so we don't have to worry about that here either.
But that raises questions about whose responsibility this all is. If verify_heapam checks the toast table and toast index before the main table, that takes care of it, but makes a mess of the idea of verify_heapam taking a start and end block, since verifying the toast index is an all or nothing proposition, not something to be done in incremental pieces. If we leave verify_heapam as it is, then it is up to the caller to check the toast before the main relation, which is more flexible, but is more complicated and requires the user to remember to do it.
We could split the difference by having verify_heapam do nothing about toast, leaving it up to the caller, but make pg_amcheck handle it by default, making it easier for users to not think about the issue. Users who want to do incremental checking could still keep track of the chunks that have already been checked, not just for the main relation, but for the toast relation, too, and give start and end block arguments to verify_heapam for the toast table check and then again for the main table check. That doesn't fix the question of incrementally checking the index, though.
Looking at it a slightly different way, I think what is being checked at the point in the code you mention is the logical structure of the toasted value related to the current main table tuple, not the lower level tuple structure of the toast table. We already have a function for checking a heap, namely verify_heapam, and we (or the caller, really) should be using that. The clean way to do things is
verify_heapam(toast_rel)
verify_btreeam(toast_idx)
verify_heapam(main_rel)
and then depending on how fast and loose you want to be, you can use the start and end block arguments, which are inherently a bit half-baked, given the lack of any way to be sure you check precisely the right range of blocks, and also you can be fast and loose about skipping the index check or not, as you see fit.
Thoughts?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jul 27, 2020 at 10:02 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I'm indifferent about that change. Done for v13.
Moving on with verification of the same index in the event of B-Tree
index corruption is a categorical mistake. verify_nbtree.c was simply
not designed to work that way.
You were determined to avoid allowing any behavior that can result in
a backend crash in the event of corruption, but this design will
defeat various measures I took to avoid crashing with corrupt data
(e.g. in commit a9ce839a313).
What's the point in not just giving up on the index (though not
necessarily the table or other indexes) at the first sign of trouble,
anyway? It makes sense for the heap structure, but not for indexes.
--
Peter Geoghegan
On Thu, Jul 30, 2020 at 10:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
I don't really like the name, either. I get that it's probably
inspired by Perl, but I think it should be given a less-clever name
like report_corruption() or something.
+1 -- confess() is an awful name for this.
--
Peter Geoghegan
On Aug 2, 2020, at 8:59 PM, Peter Geoghegan <pg@bowt.ie> wrote:
What's the point in not just giving up on the index (though not
necessarily the table or other indexes) at the first sign of trouble,
anyway? It makes sense for the heap structure, but not for indexes.
The case that came to mind was an index broken by a glibc update with breaking changes to the collation sort order underlying the index. If the breaking change has already been live in production for quite some time before a DBA notices, they might want to quantify how broken the index has been for the last however many days, not just drop and recreate the index. I'm happy to drop that from the patch, though.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Aug 2, 2020, at 9:13 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Jul 30, 2020 at 10:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
I don't really like the name, either. I get that it's probably
inspired by Perl, but I think it should be given a less-clever name
like report_corruption() or something.
+1 -- confess() is an awful name for this.
I was trying to limit unnecessary whitespace changes. s/ereport/econfess/ leaves the function name nearly the same length such that the following lines of indented error text don't usually get moved by pgindent. Given the unpopularity of the name, it's not worth it, so I'll go with Robert's report_corruption, instead.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Aug 3, 2020 at 12:00 AM Peter Geoghegan <pg@bowt.ie> wrote:
Moving on with verification of the same index in the event of B-Tree
index corruption is a categorical mistake. verify_nbtree.c was simply
not designed to work that way.
You were determined to avoid allowing any behavior that can result in
a backend crash in the event of corruption, but this design will
defeat various measures I took to avoid crashing with corrupt data
(e.g. in commit a9ce839a313).
What's the point in not just giving up on the index (though not
necessarily the table or other indexes) at the first sign of trouble,
anyway? It makes sense for the heap structure, but not for indexes.
I agree that there's a serious design problem with Mark's patch in
this regard, but I disagree that the effort is pointless on its own
terms. You're basically postulating that users don't care how corrupt
their index is: whether there's one problem or one million problems,
it's all the same. If the user presents an index with one million
problems and we tell them about one of them, we've done our job and
can go home.
This doesn't match my experience. When an EDB customer reports
corruption, typically one of the first things I want to understand is
how widespread the problem is. This same issue came up on the thread
about relfrozenxid/relminmxid corruption. If you've got a table with
one or two rows where tuple.xmin < relfrozenxid, that's a different
kind of problem than if 50% of the tuples in the table have tuple.xmin
< relfrozenxid; the latter might well indicate that relfrozenxid value
itself is garbage, while the former indicates that a few tuples
slipped through the cracks somehow. If you're contemplating a recovery
strategy like "nuke the affected tuples from orbit," you really need
to understand which of those cases you've got.
Granted, this is a bit less important with indexes, because in most
cases you're just going to REINDEX. But, even there, the question is
not entirely academic. For instance, consider the case of a user whose
database crashes and then fails to restart because WAL replay fails.
Typically, there is little option here but to run pg_resetwal. At this
point, you know that there is some damage, but you don't know how bad
it is. If there was little system activity at the time of the crash,
there may be only a handful of problems with the database. If there
was a heavy OLTP workload running at the time of the crash, with a
long checkpoint interval, the problems may be widespread. If the user
has done this repeatedly before bothering to contact support, which is
more common than you might suppose, the damage may be extremely
widespread.
Now, you could argue (and not unreasonably) that in any case after
something like this happens even once, the user ought to dump and
restore to get back to a known good state. However, when the cluster
is 10TB in size and there's a $100,000 financial loss for every hour
of downtime, the question naturally arises of how urgent that dump and
restore is. Can we wait until our next maintenance window? Can we at
least wait until off hours? Being able to tell the user whether
they've got a tiny bit of corruption or a whole truckload of
corruption can enable them to make better decisions in such cases, or
at least more educated ones.
Now, again, just replacing ereport(ERROR, ...) with something else
that does not abort the rest of the checks is clearly not OK. I don't
endorse that approach, or anything like it. But neither do I accept
the argument that it would be useless to report all the errors even if
we could do so safely.
--
Robert Haas
On Mon, Aug 3, 2020 at 11:02 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I was trying to limit unnecessary whitespace changes. s/ereport/econfess/ leaves the function name nearly the same length such that the following lines of indented error text don't usually get moved by pgindent. Given the unpopularity of the name, it's not worth it, so I'll go with Robert's report_corruption, instead.
Yeah, that's not really a good reason for something like that. I think
what you should do is drop the nbtree portion of this for now; the
length of the name then doesn't even matter at all, because all the
code in which this is used will be new code. Even if we were churning
existing code, mechanical stuff like this isn't really a huge problem
most of the time, but there's no need for that here.
--
Robert Haas
On Mon, Aug 3, 2020 at 8:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
I agree that there's a serious design problem with Mark's patch in
this regard, but I disagree that the effort is pointless on its own
terms. You're basically postulating that users don't care how corrupt
their index is: whether there's one problem or one million problems,
it's all the same. If the user presents an index with one million
problems and we tell them about one of them, we've done our job and
can go home.
It's not so much that I think that users won't care about whether any
given index is a bit corrupt or very corrupt. It's more like I don't
think that it's worth the eye-watering complexity, especially without
a real concrete goal in mind. "Counting all the errors, not just the
first" sounds like a tractable goal for the heap/table structure, but
it's just not like that with indexes. If you really wanted to do this,
you'd have to describe a practical scenario under which it made sense
to soldier on, where we'd definitely be able to count the number of
problems in a meaningful way, without much risk of either massively
overcounting or undercounting inconsistencies.
Consider how the search in verify_nbtree.c actually works at a high
level. If you thoroughly corrupted one B-Tree leaf page (let's say you
replaced it with an all-zero page image), all pages to the right of
the page would be fundamentally inaccessible to the left-to-right
level search that is coordinated within
bt_check_level_from_leftmost(). And yet, most real index scans can
still be expected to work. How do you know to skip past that one
corrupt leaf page (by going back to the parent to get the next sibling
leaf page) during index verification? That's what it would take to do
this in the general case, I guess. More fundamentally, I wonder how
many inconsistencies one should imagine that this index has, before we
even get into talking about the implementation.
--
Peter Geoghegan
On Mon, Aug 3, 2020 at 1:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
If you really wanted to do this,
you'd have to describe a practical scenario under which it made sense
to soldier on, where we'd definitely be able to count the number of
problems in a meaningful way, without much risk of either massively
overcounting or undercounting inconsistencies.
I completely agree. You have to have a careful plan to make this sort
of thing work - you want to skip checking the things that are
dependent on the part already determined to be bad, without skipping
everything. You need a strategy for where and how to restart checking,
first bypassing whatever needs to be skipped.
Consider how the search in verify_nbtree.c actually works at a high
level. If you thoroughly corrupted one B-Tree leaf page (let's say you
replaced it with an all-zero page image), all pages to the right of
the page would be fundamentally inaccessible to the left-to-right
level search that is coordinated within
bt_check_level_from_leftmost(). And yet, most real index scans can
still be expected to work. How do you know to skip past that one
corrupt leaf page (by going back to the parent to get the next sibling
leaf page) during index verification? That's what it would take to do
this in the general case, I guess.
In that particular example, you would want the function that verifies
that page to return some indicator. If it finds that two keys in the
page are out-of-order, it tells the caller that it can still follow
the right-link. But if it finds that the whole page is garbage, then
it tells the caller that it doesn't have a valid right-link and the
caller's got to do something else, like give up on the rest of the
checks or (better) try to recover a pointer to the next page from the
parent.
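In toy C form, that indicator idea might look like the following. The names and verdict categories here are hypothetical, invented for illustration; nothing below comes from verify_nbtree.c. The point is only that the per-page check returns a verdict, and the caller uses it to decide whether the right-link is still trustworthy or the parent must supply the next page.

```c
#include <stdbool.h>

/* Hypothetical verdict returned by a per-page check. */
typedef enum
{
    PAGE_OK,             /* page is fine; right-link is trustworthy */
    PAGE_ITEMS_CORRUPT,  /* e.g. keys out of order, but structure intact */
    PAGE_GARBAGE         /* whole page unusable; right-link is meaningless */
} PageVerdict;

/*
 * Decide how a level scan should obtain the next page: follow the
 * right-link of a page whose structure survived verification, or fall
 * back to asking the parent for the next downlink.
 */
static bool
can_follow_rightlink(PageVerdict v)
{
    return v == PAGE_OK || v == PAGE_ITEMS_CORRUPT;
}
```

Under this (sketched) contract, out-of-order keys on an otherwise intact page do not stop the scan, while an all-zeroes page forces the caller onto the parent-recovery path.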
More fundamentally, I wonder how
many inconsistencies one should imagine that this index has, before we
even get into talking about the implementation.
I think we should try not to imagine anything in particular. Just to
be clear, I am not trying to knock what you have; I know it was a lot
of work to create and it's a huge improvement over having nothing. But
in my mind, a perfect tool would do just what a human being would do
if investigating manually: assume initially that you know nothing -
the index might be totally fine, mildly corrupted in a very localized
way, completely hosed, or anything in between. And it would
systematically try to track that down by traversing the usable
pointers that it has until it runs out of things to do. It does not
seem impossible to build a tool that would allow us to take a big
index and overwrite a random subset of pages with garbage data and
have the tool tell us about all the bad pages that are still reachable
from the root by any path. If you really wanted to go crazy with it,
you could even try to find the bad pages that are not reachable from
the root, by doing a pass after the fact over all the pages that you
didn't otherwise reach. It would be a lot of work to build something
like that and maybe not the best use of time, but if I got to wave
tools into existence using my magic wand, I think that would be the
gold standard.
--
Robert Haas
On Tue, Aug 4, 2020 at 7:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
I think we should try not to imagine anything in particular. Just to
be clear, I am not trying to knock what you have; I know it was a lot
of work to create and it's a huge improvement over having nothing. But
in my mind, a perfect tool would do just what a human being would do
if investigating manually: assume initially that you know nothing -
the index might be totally fine, mildly corrupted in a very localized
way, completely hosed, or anything in between. And it would
systematically try to track that down by traversing the usable
pointers that it has until it runs out of things to do. It does not
seem impossible to build a tool that would allow us to take a big
index and overwrite a random subset of pages with garbage data and
have the tool tell us about all the bad pages that are still reachable
from the root by any path. If you really wanted to go crazy with it,
you could even try to find the bad pages that are not reachable from
the root, by doing a pass after the fact over all the pages that you
didn't otherwise reach. It would be a lot of work to build something
like that and maybe not the best use of time, but if I got to wave
tools into existence using my magic wand, I think that would be the
gold standard.
I guess that might be true.
With indexes you tend to have redundancy in how relationships among
pages are described. So you have siblings whose pointers must be in
agreement (left points to right, right points to left), and it's not
clear which one you should trust when they don't agree. It's not like
simple heuristics get you all that far. I really can't think of a good
one, and detecting corruption should mean detecting truly exceptional
cases. I guess you could build a model based on Bayesian methods, or
something like that. But that is very complicated, and only used when
you actually have corruption -- which is presumably extremely rare in
reality. That's very unappealing as a project.
I have always believed that the big problem is not "known unknowns".
Rather, I think that the problem is "unknown unknowns". I accept that
you have a point, especially when it comes to heap checking, but even
there the most important consideration should be to make corruption
detection thorough and cheap. The vast vast majority of databases do
not have any corruption at any given time. You're not searching for a
needle in a haystack; you're searching for a needle in many many
haystacks within a field filled with haystacks, which taken together
probably contain no needles at all. (OTOH, once you find one needle
all bets are off, and you could very well go on to find a huge number
of them.)
--
Peter Geoghegan
On Fri, Jul 31, 2020 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
I'm not sure what I was thinking "back then", but right now I'd argue
that the best lock against vacuum isn't a SUE, but announcing the
correct ->xmin, so you can be sure that clog entries won't be yanked out
from under you. Potentially with the right flags set to avoid
old-enough tuples being pruned.
I was just thinking about this some more (and talking it over with
Mark) and I think this might actually be a really bad idea. One
problem with it is that it means that the oldest-xmin value can go
backward, which is something that I think has caused us some problems
before. There are some other cases where it can happen, and I'm not
sure that there's any necessarily fatal problem with doing it in this
case, but it would definitely be a shame if this contrib module broke
something for core in a way that was hard to fix. But let's leave that
aside and suppose that there is no fatal problem there. Essentially
what we're talking about here is advertising the table's relfrozenxid
as our xmin. How old is that likely to be? Maybe pretty old. The
default value of vacuum_freeze_table_age is 150 million transactions,
and that's just the trigger to start vacuuming; the actual value of
age(relfrozenxid) could easily be higher than that. But even if it's
only a fraction of that, it's still pretty bad. Advertising an xmin
half that old (75 million transactions) is equivalent to keeping a
snapshot open for an amount of time equal to however long it takes you
to burn through 75 million XIDs. For instance, if you burn 10 million
XIDs/hour, that's the equivalent of keeping a snapshot open for 7.5
hours. In other words, it's quite likely that doing this is going to
make VACUUM (and HOT pruning) drastically less effective throughout
the entire database cluster. To me, this seems a lot worse than just
taking ShareUpdateExclusiveLock on the table. After all,
ShareUpdateExclusiveLock will prevent VACUUM from running on that
table, but it only affects that one table rather than the whole
cluster, and it "only" stops VACUUM from running, which is still
better than having it do lots of I/O but not clean anything up.
I think I see another problem with this approach, too: it's racey. If
some other process has entered vac_update_datfrozenxid() and has
gotten past the calls to GetOldestXmin() and GetOldestMultiXactId(),
and we then advertise an older xmin (and I guess also oldestMXact) it
can still go on to update datfrozenxid/datminmxid and then truncate
the SLRUs. Even holding XactTruncationLock is insufficient to protect
against this race condition, and there doesn't seem to be any other
obvious approach, either.
So I would like to back up a minute and lay out the possible solutions
as I understand them. The specific problem I'm talking about here
is: how do we keep from looking up an XID or MXID whose information
might have been truncated away from the relevant SLRU?
1. Take a ShareUpdateExclusiveLock on the table. This prevents VACUUM
from running concurrently on this table (which sucks), but that for
sure guarantees that the table's relfrozenxid and relminmxid can't
advance, which precludes a concurrent CLOG truncation.
2. Advertise an older xmin and minimum MXID. See above.
3. Acquire XactTruncationLock for each lookup, like pg_xact_status().
One downside here is a lot of extra lock acquisitions, but we can
mitigate that to some degree by caching the results of lookups, and by
not doing it for XIDs that are newer than our advertised xmin (which
must be OK) or at least as old as the newest XID we previously
discovered to be unsafe to look up (because those must not be OK
either). The problem case is a table with lots of different XIDs that
are all new enough to look up but older than our xmin, e.g. a table
populated using many single-row inserts. But even if we hit this case,
how bad is it really? I don't think XactTruncationLock is particularly
hot, so maybe it just doesn't matter very much. We could contend
against other sessions checking other tables, or against widespread
use of pg_xact_status(), but I think that's about it. Another downside
of this approach is that I'm not sure it does anything to help us with
the MXID case; fixing that might require building some new
infrastructure similar to XactTruncationLock but for MXIDs.
4. Provide entrypoints for looking up XIDs that fail gently instead of
throwing errors. I've got my doubts about how practical this is; if
it's easy, why didn't we do that instead of inventing
XactTruncationLock?
Maybe there are other options here, too? At the moment, I'm thinking
that (2) and (4) are just bad and so we ought to either do (3) if it
doesn't suck too much for performance (which I don't quite see why it
should, but it might) or else fall back on (1). (1) doesn't feel
clever enough but it might be better to be not clever enough than to
be too clever.
--
Robert Haas
On Tue, Aug 4, 2020 at 12:00 PM Peter Geoghegan <pg@bowt.ie> wrote:
With indexes you tend to have redundancy in how relationships among
pages are described. So you have siblings whose pointers must be in
agreement (left points to right, right points to left), and it's not
clear which one you should trust when they don't agree. It's not like
simple heuristics get you all that far. I really can't think of a good
one, and detecting corruption should mean detecting truly exceptional
cases. I guess you could build a model based on Bayesian methods, or
something like that. But that is very complicated, and only used when
you actually have corruption -- which is presumably extremely rare in
reality. That's very unappealing as a project.
I think it might be possible to distinguish between different types of
corruption and to separate, at least to some degree, the checking
associated with each type. I think one can imagine something that
checks the structure of a btree without regard to the contents. That
is, it cares that left and right links are consistent with each other
and with downlinks from the parent level. So it checks things like the
left link of the page to which my right link points is pointing back
to me, and that's also the page to which my parent's next downlink
points. It could also verify that there's a proper tree structure,
where every page has a well-defined tree level. So you assign the root
page level 1, and each time you traverse a downlink you assign that
page a level one larger. If you ever try to assign to a page a level
unequal to the level previously assigned to it, you report that as a
problem. You can check, too, that if a page does not have a left or
right link, it's actually the last page at that level according to what
you saw at the parent, grandparent, etc. levels. Finally, you can
check that all of the max-level pages you can find are leaf pages, and
the others are all internal pages. All of this structural stuff can be
verified without caring a whit about what keys you've got or what they
mean or whether there's even a heap associated with this index.
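The level-assignment rule described above can be sketched as a toy breadth-first pass. This is purely illustrative C (real pages and downlinks look nothing like an adjacency array): every page reached from the root gets its parent's level plus one, and a page reached at two different depths is counted as a structural problem.

```c
#define MAX_PAGES 16
#define UNVISITED (-1)

/* Toy page graph: downlinks[i] lists the children of page i,
 * terminated by -1. */
static int downlinks[MAX_PAGES][MAX_PAGES];
static int assigned_level[MAX_PAGES];

/*
 * Assign level 1 to the root and level+1 to each page reached by a
 * downlink.  Returns the number of level conflicts found (0 for a
 * structurally sane tree).
 */
static int
check_levels(int root, int npages)
{
    int conflicts = 0;
    int queue[MAX_PAGES], head = 0, tail = 0;

    for (int i = 0; i < npages; i++)
        assigned_level[i] = UNVISITED;

    assigned_level[root] = 1;
    queue[tail++] = root;
    while (head < tail)
    {
        int p = queue[head++];

        for (int j = 0; downlinks[p][j] != -1; j++)
        {
            int child = downlinks[p][j];
            int want = assigned_level[p] + 1;

            if (assigned_level[child] == UNVISITED)
            {
                assigned_level[child] = want;
                queue[tail++] = child;
            }
            else if (assigned_level[child] != want)
                conflicts++;    /* page reachable at two different depths */
        }
    }
    return conflicts;
}
```

Note that this sketch only captures the "well-defined tree level" invariant; the sibling-link cross-checks described above would be separate tests layered on the same traversal.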
Now a second type of checking, which can also be done without regard
to keys, is checking that the TIDs in the index point to TIDs that are
on heap pages that actually exist, and that the corresponding items
are not unused, nor are they tuples which are not the root of a HOT
chain. Passing a check of this type doesn't prove that the index and
heap are consistent, but failing it proves that they are inconsistent.
This kind of check can be done on every leaf index page you can find
by any means even if it fails the structural checks described above.
Failure of these checks on one page does not preclude checking the
same invariants for other pages. Let's call this kind of thing "basic
index-heap sanity checking."
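A toy model of such a basic check might look like the following (hypothetical item states and layout, nothing from the actual heap page format): a TID is rejected if it points past the end of the relation, at an unused item, or at a tuple that is not the root of its HOT chain.

```c
#include <stdbool.h>

/* Toy item states, loosely modeled on heap line-pointer states. */
typedef enum { ITEM_UNUSED, ITEM_NORMAL, ITEM_HOT_CHAIN_MEMBER } ItemState;

#define HEAP_NBLOCKS 2
#define ITEMS_PER_BLOCK 4

static ItemState heap[HEAP_NBLOCKS][ITEMS_PER_BLOCK];

/*
 * Basic index-heap sanity check for one TID: the referenced block must
 * exist, and the referenced item must be in use and must not be a
 * non-root member of a HOT chain (index tuples point only at roots).
 * Failing proves inconsistency; passing proves nothing further.
 */
static bool
tid_is_sane(int block, int offset)
{
    if (block < 0 || block >= HEAP_NBLOCKS)
        return false;
    if (offset < 0 || offset >= ITEMS_PER_BLOCK)
        return false;
    if (heap[block][offset] == ITEM_UNUSED)
        return false;
    if (heap[block][offset] == ITEM_HOT_CHAIN_MEMBER)
        return false;
    return true;
}
```

As the text says, this check is per-TID and per-page: one insane TID says nothing about whether its neighbors can still be checked.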
A third type of checking is to verify the relationship between the
index keys within and across the index pages: are the keys actually in
order within a page, and are they in order across pages? The first
part of this can be checked individually for each page pretty much no
matter what other problems we may have; we only have to abandon this
checking for a particular page if it's total garbage and we cannot
identify any index items on the page at all. The second part, though,
has the problem you mention. I think the solution is to skip the
second part of the check for any pages that failed related structural
checks. For example, if my right sibling thinks that I am not its left
sibling, or my right sibling and I agree that we are siblings but do
not agree on who our parent is, or if that parent does not agree that
we have the same sibling relationship that we think we have, then we
should report that problem and forget about issuing any complaints
about the relationship between my key space and that sibling's key
space. The internal consistency of each page with respect to key
ordering can still be verified, though, and it's possible that my key
space can be validly compared to the key space of my other sibling, if
the structural checks pass on that side.
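The skip rule for the second part might be sketched like this (toy structures, purely hypothetical): within-page ordering is always checkable, but the cross-page comparison is attempted only when the earlier structural checks confirmed the sibling relationship, so a bogus key-space complaint is never issued against a page that isn't really a sibling.

```c
#include <stdbool.h>

#define MAX_KEYS 8

/* Toy leaf page (illustrative only): keys, plus whether the earlier
 * structural checks confirmed the right-sibling relationship. */
typedef struct
{
    int  keys[MAX_KEYS];
    int  nkeys;
    bool right_sibling_confirmed;
} ToyPage;

/* Within-page ordering can be checked for almost any page. */
static bool
page_keys_ordered(const ToyPage *p)
{
    for (int i = 1; i < p->nkeys; i++)
        if (p->keys[i - 1] > p->keys[i])
            return false;
    return true;
}

/*
 * Cross-page ordering is compared only when the structural checks
 * agreed these pages really are siblings; otherwise we skip rather
 * than report, since the complaint would likely be bogus.
 */
static bool
keyspaces_ordered(const ToyPage *left, const ToyPage *right)
{
    if (!left->right_sibling_confirmed)
        return true;            /* skipped, not failed */
    if (left->nkeys == 0 || right->nkeys == 0)
        return true;
    return left->keys[left->nkeys - 1] <= right->keys[0];
}
```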
A fourth type of checking is to verify the index key against the keys
in the heap tuples to which they point, but only for index tuples that
passed the basic index-heap sanity checking and where the tuples have
not been pruned. This can be sensibly done even if the structural
checks or index-ordering checks have failed.
I don't mean to suggest that one would implement all of these things
as separate phases; that would be crazy expensive, and what if things
changed by the time you visit the page? Rather, the checks likely
ought to be interleaved, just keeping track internally of which things
need to be skipped because prerequisite checks have already failed.
Aside from providing a way to usefully continue after errors, this
would also be useful in certain scenarios where you want to know what
kind of corruption you have. For example, suppose that I start getting
wrong answers from index lookups on a particular index. Upon
investigation, it turns out that my last glibc update changed my OS
collation definitions for the collation I'm using, and therefore it is
to be expected that some of my keys may appear to be out of order with
respect to the new definitions. Now what I really want to know before
running REINDEX is that this is the only problem I have. It would be
amazing if I could run the tool and have it give me a list of problems
so that I could confirm that I have only index-ordering problems, not
any other kind, and even more amazing if it could tell me the specific
keys that were affected so that I could understand exactly how the
sorting behavior changed. If I were to discover that my index also has
structural problems or inconsistencies with the heap, then I'd know
that it couldn't be right to blame it only on the collation update;
something else has gone wrong.
I'm speaking here with fairly limited knowledge of the details of how
all this actually works and, again, I'm not trying to suggest that you
or anyone is obligated to do any work on this, or that it would be
easy to accomplish or worth the time it took. I'm just trying to
sketch out what I see as maybe being theoretically possible, and why I
think it would be useful if it did.
--
Robert Haas
On Tue, Aug 4, 2020 at 9:44 AM Robert Haas <robertmhaas@gmail.com> wrote:
I think it might be possible to distinguish between different types of
corruption and to separate, at least to some degree, the checking
associated with each type. I think one can imagine something that
checks the structure of a btree without regard to the contents. That
is, it cares that left and right links are consistent with each other
and with downlinks from the parent level. So it checks things like the
left link of the page to which my right link points is pointing back
to me, and that's also the page to which my parent's next downlink
points.
I think that this kind of phased approach to B-Tree verification is
possible, more or less, but hard to justify. And it seems impossible
to do with only an AccessShareLock.
It's not clear that what you describe is much better than just
checking a bunch of indexes and seeing what patterns emerge. For
example, the involvement of collated text might be a common factor
across indexes. That kind of pattern is the first thing that I look
for, and often the only thing. It also serves to give me an idea of
how messed up things are. There are not that many meaningful degrees
of messed-up with indexes in my experience. The first error really
does tell you most of what you need to know about any given corrupt
index. Kind of like how you can bucket the number of cockroaches in
your home into perhaps three meaningful buckets: 0 cockroaches, at
least 1 cockroach, and lots of cockroaches. (Even there, if you really
care about the distinction between the second and third bucket,
something has gone terribly wrong -- so even three buckets seems like
a lot to me.)
FWIW, current DEBUG1 + DEBUG2 output for amcheck shows you quite a lot
of details about the tree structure. It's a handy way of getting a
sense of what's going on at a high level. For example, if index
corruption is found very early on, that strongly suggests that it's
pretty pervasive.
Now a second type of checking, which can also be done without regard
to keys, is checking that the TIDs in the index point to TIDs that are
on heap pages that actually exist, and that the corresponding items
are not unused, nor are they tuples which are not the root of a HOT
chain. Passing a check of this type doesn't prove that the index and
heap are consistent, but failing it proves that they are inconsistent.
This kind of check can be done on every leaf index page you can find
by any means even if it fails the structural checks described above.
Failure of these checks on one page does not preclude checking the
same invariants for other pages. Let's call this kind of thing "basic
index-heap sanity checking."
One real weakness in the current code is our inability to detect index
tuples that are in the correct order and so on, but point to the wrong
thing -- we can detect that if it manifests itself as the absence of
an index tuple that should be in the index (when you use
heapallindexed verification), but we cannot *reliably* detect the
presence of an index tuple that shouldn't be in the index at all
(though in practice it probably mostly gets caught).
The checks on the tree structure itself are excellent with
bt_index_parent_check() following Alexander's commit d114cc53 (which I
thought was really excellent work). But we still have that one
remaining blind spot in verify_nbtree.c, even when you opt in to every
possible type of verification (i.e. bt_index_parent_check() with all
options). I'd much rather fix that, or help with the new heap checker
stuff.
A fourth type of checking is to verify the index key against the keys
in the heap tuples to which they point, but only for index tuples that
passed the basic index-heap sanity checking and where the tuples have
not been pruned. This can be sensibly done even if the structural
checks or index-ordering checks have failed.
That's going to require the equivalent of a merge join, which is
terribly expensive relative to such a small benefit.
Aside from providing a way to usefully continue after errors, this
would also be useful in certain scenarios where you want to know what
kind of corruption you have. For example, suppose that I start getting
wrong answers from index lookups on a particular index. Upon
investigation, it turns out that my last glibc update changed my OS
collation definitions for the collation I'm using, and therefore it is
to be expected that some of my keys may appear to be out of order with
respect to the new definitions. Now what I really want to know before
running REINDEX is that this is the only problem I have. It would be
amazing if I could run the tool and have it give me a list of problems
so that I could confirm that I have only index-ordering problems, not
any other kind, and even more amazing if it could tell me the specific
keys that were affected so that I could understand exactly how the
sorting behavior changed.
This detail seems really hard. There are probably lots of cases where
the sorting behavior changed but it just didn't affect you, given the
data you had -- it just so happened that you didn't have exactly the
wrong kind of diacritic mark or whatever. After all, revisions to how
strings in a given natural language are supposed to sort are likely to
be relatively rare and relatively obscure (even among people that
speak the language in question). Also, the task of figuring out if the
tuple to the left or the right is in the wrong order seems kind of
daunting.
Meanwhile, a simple smoke test covering many indexes probably gives
you a fairly meaningful idea of the extent of the damage, without
requiring that we do any hard engineering work.
I'm speaking here with fairly limited knowledge of the details of how
all this actually works and, again, I'm not trying to suggest that you
or anyone is obligated to do any work on this, or that it would be
easy to accomplish or worth the time it took. I'm just trying to
sketch out what I see as maybe being theoretically possible, and why I
think it would be useful if it did.
I don't think that your relatively limited knowledge of the B-Tree
code is an issue here -- your intuitions seem pretty reasonable. I
appreciate your perspective here. Corruption detection presents us
with some odd qualitative questions of the kind that are just awkward
to discuss. Discouraging perspectives that don't quite match my own
would be quite counterproductive.
That having been said, I suspect that this is a huge task for a small
benefit. It's exceptionally hard to test because you have lots of
non-trivial code that only gets used in circumstances that by
definition should never happen. If users really needed to recover the
data in the index then maybe it would happen -- but they don't.
The biggest problem that amcheck currently has is that it isn't used
enough, because it isn't positioned as a general purpose tool at all.
I'm hoping that the work from Mark helps with that.
--
Peter Geoghegan
On Tue, Aug 4, 2020 at 9:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
of messed-up with indexes in my experience. The first error really
does tell you most of what you need to know about any given corrupt
index. Kind of like how you can bucket the number of cockroaches in
your home into perhaps three meaningful buckets: 0 cockroaches, at
least 1 cockroach, and lots of cockroaches. (Even there, if you really
care about the distinction between the second and third bucket,
something has gone terribly wrong -- so even three buckets seems like
a lot to me.)
Not sure I agree with this. As a homeowner, the distinction between 0
and 1 is less significant to me than the distinction between a few
(preferably in places where I'll never see them) and a whole lot. I
agree with you to an extent though: all I really care about is whether
I have too few to worry about, enough that I'd better try to take care
of it somehow, or so many that I need a professional exterminator. If,
however, I were a professional exterminator, I would be unhappy with
just knowing that there are few problems or many. I suspect I would
want to know something about where the problems were, and get a more
nuanced indication of just how bad things are in each location.
FWIW, pg_catcheck is an example of an existing tool (designed by me
and written partially by me) that uses the kind of model I'm talking
about. It does a single SELECT * FROM pg_<whatever> on each catalog
table - so that it doesn't get confused if your system catalog indexes
are messed up - and then performs a bunch of cross-checks on the
tuples it gets back and tells you about all the messed up stuff. If it
can't get data from all your catalog tables it performs whichever
checks are valid given what data it was able to get. As a professional
exterminator of catalog corruption, I find it quite helpful. If
someone sends me the output from a database cluster, I can tell right
away whether they are just fine, in a little bit of trouble, or in a
whole lot of trouble; I can speculate pretty well about what kind of
thing might've happened to cause the problem; and I can recommend
steps to straighten things out.
FWIW, current DEBUG1 + DEBUG2 output for amcheck shows you quite a lot
of details about the tree structure. It's a handy way of getting a
sense of what's going on at a high level. For example, if index
corruption is found very early on, that strongly suggests that it's
pretty pervasive.
Interesting.
A fourth type of checking is to verify the index key against the keys
in the heap tuples to which they point, but only for index tuples that
passed the basic index-heap sanity checking and where the tuples have
not been pruned. This can be sensibly done even if the structural
checks or index-ordering checks have failed.

That's going to require the equivalent of a merge join, which is
terribly expensive relative to such a small benefit.
I think it depends on how big your data is. If you've got a 2TB table
and 512GB of RAM, it's pretty impractical no matter the algorithm. But
for small tables even a naive nested loop will suffice.
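The naive nested loop mentioned here can be sketched in a few lines. This is a standalone illustration of the O(n*m) idea, not code from the patch; the function and its int-key representation are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Check that every index key has a matching heap key, by brute force.
 * Quadratic in table size, so only sensible for small tables -- which
 * is exactly the tradeoff being discussed.
 */
static bool
all_index_keys_in_heap(const int *index_keys, size_t nindex,
					   const int *heap_keys, size_t nheap)
{
	for (size_t i = 0; i < nindex; i++)
	{
		bool		found = false;

		for (size_t j = 0; j < nheap; j++)
		{
			if (index_keys[i] == heap_keys[j])
			{
				found = true;
				break;
			}
		}
		if (!found)
			return false;		/* index entry with no matching heap key */
	}
	return true;
}
```

For a 2TB table one would instead need something like a sort/merge pass, which is the "equivalent of a merge join" objection above.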
Meanwhile, a simple smoke test covering many indexes probably gives
you a fairly meaningful idea of the extent of the damage, without
requiring that we do any hard engineering work.
In my experience, when EDB customers complain about corruption-related
problems, the two most common patterns are: (1) my whole system is
messed up and (2) I have one or a few specific objects which are
messed up and everything else is fine. The first category is often
something like inability to start the database, or scary messages in
the log file complaining about, say, checkpoints failing. The second
category is the one I'm worried about here. The people who are in this
category generally already know which things are broken; they've
figured that out through trial and error. Sometimes they miss some
problems, but more frequently, in my experience, their understanding
of what problems they have is accurate. Now that category of users can
be further decomposed into two groups: the people who don't care what
happened and just want to barrel through it, and the people who do
care what happened and want to know what happened, why it happened,
whether it's a bug, etc. The first group are unproblematic: tell them
to REINDEX (or restore from backup, or whatever) and you're done.
The second group is a lot harder. It is in general difficult to
speculate about how something that is now wrong got that way given
knowledge only of the present state of affairs. But good tooling makes
it easier to speculate intelligently. To take a classic example,
there's a great difference between a checksum failure caused by the
checksum being incorrect on an otherwise-valid page; a checksum
failure on a page the first half of which appears valid and the second
half of which looks like it might be some other database page; and a
checksum failure on a page whose contents appear to be taken from a
Microsoft Word document. I'm not saying we ever want a tool which
tries to figure that sort of thing out in an automated way; there's no
substitute for human intelligence (yet, anyway). But, the more the
tools we do have localize the problems to particular pages or tuples
and describe them accurately, the easier it is to do manual
investigation as follow-up, when it's necessary.
That having been said, I suspect that this is a huge task for a small
benefit. It's exceptionally hard to test because you have lots of
non-trivial code that only gets used in circumstances that by
definition should never happen. If users really needed to recover the
data in the index then maybe it would happen -- but they don't.
Yep, that's a very key difference as compared to the heap.
The biggest problem that amcheck currently has is that it isn't used
enough, because it isn't positioned as a general purpose tool at all.
I'm hoping that the work from Mark helps with that.
Agreed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 5, 2020 at 7:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
Not sure I agree with this. As a homeowner, the distinction between 0
and 1 is less significant to me than the distinction between a few
(preferably in places where I'll never see them) and a whole lot. I
agree with you to an extent though: all I really care about is whether
I have too few to worry about, enough that I'd better try to take care
of it somehow, or so many that I need a professional exterminator. If,
however, I were a professional exterminator, I would be unhappy with
just knowing that there are few problems or many. I suspect I would
want to know something about where the problems were, and get a more
nuanced indication of just how bad things are in each location.
Right, but the professional exterminator can be expected to use expert
level tools, where a great deal of technical sophistication is
required to interpret what's going on sensibly. An amateur can only
use them to determine if something is wrong at all, which is usually
not how they add value.
(I think that my analogy is slightly flawed in that it hinged upon
everybody hating cockroaches as much as I do, which is more than the
ordinary amount.)
FWIW, pg_catcheck is an example of an existing tool (designed by me
and written partially by me) that uses the kind of model I'm talking
about. It does a single SELECT * FROM pg_<whatever> on each catalog
table - so that it doesn't get confused if your system catalog indexes
are messed up - and then performs a bunch of cross-checks on the
tuples it gets back and tells you about all the messed up stuff. If it
can't get data from all your catalog tables it performs whichever
checks are valid given what data it was able to get. As a professional
exterminator of catalog corruption, I find it quite helpful.
I myself seem to have had quite different experiences with corruption,
presumably because it happened at product companies like Heroku. I
tended to find software bugs (e.g. the one fixed by commit 008c4135)
that were rare and novel by casting a wide net over a large number of
relatively homogenous databases. Whereas your experiences tend to
involve large support customers with more opportunity for operator
error. Both perspectives are important.
The second group is a lot harder. It is in general difficult to
speculate about how something that is now wrong got that way given
knowledge only of the present state of affairs. But good tooling makes
it easier to speculate intelligently. To take a classic example,
there's a great difference between a checksum failure caused by the
checksum being incorrect on an otherwise-valid page; a checksum
failure on a page the first half of which appears valid and the second
half of which looks like it might be some other database page; and a
checksum failure on a page whose contents appear to be taken from a
Microsoft Word document. I'm not saying we ever want a tool which
tries to figure that sort of thing out in an automated way; there's no
substitute for human intelligence (yet, anyway).
I wrote my own expert level tool, pg_hexedit. I have to admit that the
level of interest in that tool doesn't seem to be all that great,
though I myself have used it to investigate corruption to great
effect. But I suppose there is no way to know how it's being used.
--
Peter Geoghegan
On Wed, Aug 5, 2020 at 4:36 PM Peter Geoghegan <pg@bowt.ie> wrote:
Right, but the professional exterminator can be expected to use expert
level tools, where a great deal of technical sophistication is
required to interpret what's going on sensibly. An amateur can only
use them to determine if something is wrong at all, which is usually
not how they add value.
Quite true.
I myself seem to have had quite different experiences with corruption,
presumably because it happened at product companies like Heroku. I
tended to find software bugs (e.g. the one fixed by commit 008c4135)
that were rare and novel by casting a wide net over a large number of
relatively homogenous databases. Whereas your experiences tend to
involve large support customers with more opportunity for operator
error. Both perspectives are important.
I concur.
I wrote my own expert level tool, pg_hexedit. I have to admit that the
level of interest in that tool doesn't seem to be all that great,
though I myself have used it to investigate corruption to great
effect. But I suppose there is no way to know how it's being used.
I admit not to having tried pg_hexedit, but I doubt that it would help
me very much outside of my own development work. The problem is that
in a typical case I am trying to help someone in a professional
capacity without access to their machines, and without knowledge of
their environment or data. Moreover, sometimes the person I'm trying
to help is an unreliable narrator. I can ask people to run tools they
have and send the output, and then I can look at that output and tell
them what to do next. But it has to be a tool they have (or they can
easily get) and it can't involve any complicated if-then stuff.
Something like "if the page is totally garbled then do X but if it
looks mostly OK then do Y" is radically out of reach. They have no
clue about that. Hence my interest in tools that automate as much of
the investigation as may be practical.
We're probably beating this topic to death at this point; I don't
think we are really in any sort of meaningful disagreement, and the
next steps in this particular case seem clear enough.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jul 30, 2020 at 11:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jul 27, 2020 at 1:02 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Not at all! I appreciate all the reviews.
Reviewing 0002, reading through verify_heapam.c:
+typedef enum SkipPages
+{
+ SKIP_ALL_FROZEN_PAGES,
+ SKIP_ALL_VISIBLE_PAGES,
+ SKIP_PAGES_NONE
+} SkipPages;

This looks inconsistent. Maybe just start them all with SKIP_PAGES_.
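For illustration, the rename being suggested might look like this (the exact value spellings are my guess at the consistent form, not taken from the patch):

```c
#include <assert.h>

/* Hypothetical rename: every value carries the same SKIP_PAGES_ prefix,
 * so grepping for the prefix finds all of them. */
typedef enum SkipPages
{
	SKIP_PAGES_ALL_FROZEN,
	SKIP_PAGES_ALL_VISIBLE,
	SKIP_PAGES_NONE
} SkipPages;
```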
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("missing required parameter for 'rel'")));

This doesn't look much like other error messages in the code. Do
something like git grep -A4 PG_ARGISNULL | grep -A3 ereport and study
the comparables.

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("unrecognized parameter for 'skip': %s", skip),
+ errhint("please choose from 'all-visible', 'all-frozen', or 'none'")));

Same problem. Check pg_prewarm's handling of the prewarm type, or
EXPLAIN's handling of the FORMAT option, or similar examples. Read the
message style guidelines concerning punctuation of hint and detail
messages.

+ * Bugs in pg_upgrade are reported (see commands/vacuum.c circa line 1572)
+ * to have sometimes rendered the oldest xid value for a database invalid.
+ * It seems unwise to report rows as corrupt for failing to be newer than
+ * a value which itself may be corrupt. We instead use the oldest xid for
+ * the entire cluster, which must be at least as old as the oldest xid for
+ * our database.

This kind of reference to another comment will not age well; line
numbers and files change a lot. But I think the right thing to do here
is just rely on relfrozenxid and relminmxid. If the table is
inconsistent with those, then something needs fixing. datfrozenxid and
the cluster-wide value can look out for themselves. The corruption
detector shouldn't be trying to work around any bugs in setting
relfrozenxid itself; such problems are arguably precisely what we're
here to find.

+/*
+ * confess
+ *
+ * Return a message about corruption, including information
+ * about where in the relation the corruption was found.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+confess(HeapCheckContext *ctx, char *msg)

Contrary to what the comments say, the function doesn't return a
message about corruption or anything else. It returns void.

I don't really like the name, either. I get that it's probably
inspired by Perl, but I think it should be given a less-clever name
like report_corruption() or something.

+ * corrupted table from using workmem worth of memory building up the

This kind of thing destroys grep-ability. If you're going to refer to
work_mem, you gotta spell it the same way we do everywhere else.

+ * Helper function to construct the TupleDesc needed by verify_heapam.
Instead of saying it's the TupleDesc somebody needs, how about saying
that it's the TupleDesc that we'll use to report problems that we find
while scanning the heap, or something like that?

+ * Given a TransactionId, attempt to interpret it as a valid
+ * FullTransactionId, neither in the future nor overlong in
+ * the past. Stores the inferred FullTransactionId in *fxid.

It really doesn't, because there's no such thing as 'fxid' referenced
anywhere here. You should really make the effort to proofread your
patches before posting, and adjust comments and so on as you go.
Otherwise reviewing takes longer, and if you keep introducing new
stuff like this as you fix other stuff, you can fail to ever produce a
committable patch.

+ * Determine whether tuples are visible for verification. Similar to
+ * HeapTupleSatisfiesVacuum, but with critical differences.

Not accurate, because it also reports problems, which is not mentioned
anywhere in the function header comment that purports to be a detailed
description of what the function does.

+ else if (TransactionIdIsCurrentTransactionId(raw_xmin))
+ return true; /* insert or delete in progress */
+ else if (TransactionIdIsInProgress(raw_xmin))
+ return true; /* HEAPTUPLE_INSERT_IN_PROGRESS */
+ else if (!TransactionIdDidCommit(raw_xmin))
+ {
+ return false; /* HEAPTUPLE_DEAD */
+ }

One of these cases is not punctuated like the others.
+ pstrdup("heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has a valid xmax"));

1. I don't think that's very grammatical.

2. Why abbreviate HEAP_XMAX_IS_MULTI to XMAX_IS_MULTI and
HEAP_XMAX_IS_LOCKED_ONLY to LOCKED_ONLY? I don't even think you should
be referencing C constant names here at all, and if you are I don't
think you should abbreviate, and if you do abbreviate I don't think
you should omit different numbers of words depending on which constant
it is.

I wonder what the intended division of responsibility is here,
exactly. It seems like you've ended up with some sanity checks in
check_tuple() before tuple_is_visible() is called, and others in
tuple_is_visible() proper. As far as I can see the comments don't
really discuss the logic behind the split, but there's clearly a close
relationship between the two sets of checks, even to the point where
you have "heap tuple with XMAX_IS_MULTI is neither LOCKED_ONLY nor has
a valid xmax" in tuple_is_visible() and "tuple xmax marked
incompatibly as keys updated and locked only" in check_tuple(). Now,
those are not the same check, but they seem like closely related
things, so it's not ideal that they happen in different functions with
differently-formatted messages to report problems and no explanation
of why it's different.

I think it might make sense here to see whether you could either move
more stuff out of tuple_is_visible(), so that it really just checks
whether the tuple is visible, or move more stuff into it, so that it
has the job not only of checking whether we should continue with
checks on the tuple contents but also complaining about any other
visibility problems. Or if neither of those make sense then there
should be a stronger attempt to rationalize in the comments what
checks are going where and for what reason, and also a stronger
attempt to rationalize the message wording.

+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));

Should we be worrying about the possibility of fastgetattr crapping
out if the TOAST tuple is corrupted?

+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ confess(ctx,
+ psprintf("tuple attribute should start at offset %u, but tuple length is only %u",
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ return true;
+ }

This looks like it's not going to complain about a fixed-length
attribute that overruns the tuple length. There's code further down
that handles that case for a varlena attribute, but there's nothing
comparable for the fixed-length case.

+ confess(ctx,
+ psprintf("%s toast at offset %u is unexpected",
+ va_tag == VARTAG_INDIRECT ? "indirect" :
+ va_tag == VARTAG_EXPANDED_RO ? "expanded" :
+ va_tag == VARTAG_EXPANDED_RW ? "expanded" :
+ "unexpected",
+ ctx->tuphdr->t_hoff + ctx->offset));

I suggest "unexpected TOAST tag %d", without trying to convert to a
string. Such a conversion will likely fail in the case of genuine
corruption, and isn't meaningful even if it works.

Again, let's try to standardize terminology here: most of the messages
in this function are now of the form "tuple attribute %d has some
problem" or "attribute %d has some problem", but some have neither.
Since we're separately returning attnum I don't see why it should be
in the message, and if we weren't separately returning attnum then it
ought to be in the message the same way all the time, rather than
sometimes writing "attribute" and other times "tuple attribute".

+ /* Check relminmxid against mxid, if any */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if (infomask & HEAP_XMAX_IS_MULTI &&
+ MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmax %u precedes relminmxid %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }

There are checks that an XID is neither too old nor too new, and
presumably something similar could be done for MultiXactIds, but here
you only check one end of the range. Seems like you should check both.

+ /* Check xmin against relfrozenxid */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u precedes relfrozenxid %u",
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ confess(ctx,
+ psprintf("tuple xmin %u follows last assigned xid %u",
+ xmin, ctx->next_valid_xid));
+ fatal = true;
+ }
+ }

Here you do check both ends of the range, but the comment claims
otherwise. Again, please proof-read for this kind of stuff.

+ /* Check xmax against relfrozenxid */

Ditto here.
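For readers following along, the "neither too old nor too new" test hinges on PostgreSQL's circular 32-bit XID comparison. Here is a standalone sketch of a both-ends range check; the comparison mirrors the semantics of TransactionIdPrecedes, but the range-check helper and its name are made up for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Circular comparison: "id1 precedes id2" is decided by signed 32-bit
 * wraparound arithmetic, so it is meaningful modulo 2^32. */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
	int32_t		diff = (int32_t) (id1 - id2);

	return diff < 0;
}

/* Hypothetical both-ends check: an on-disk XID must be neither older
 * than relfrozenxid nor at/after the next XID to be assigned. */
static bool
xid_in_valid_range(TransactionId xid, TransactionId relfrozenxid,
				   TransactionId next_xid)
{
	if (xid_precedes(xid, relfrozenxid))
		return false;			/* too old: precedes relfrozenxid */
	if (!xid_precedes(xid, next_xid))
		return false;			/* too new: not yet assigned */
	return true;
}
```

Checking only the "too old" side, as in the quoted MultiXactId hunk, misses corruption that manifests as an impossibly new value.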
+ psprintf("tuple's header size is %u bytes which is less than the %u byte minimum valid header size",

I suggest: tuple data begins at byte %u, but the tuple header must be
at least %u bytes

+ psprintf("tuple's %u byte header size exceeds the %u byte length of the entire tuple",

I suggest: tuple data begins at byte %u, but the entire tuple length
is only %u bytes

+ psprintf("tuple's user data offset %u not maximally aligned to %u",

I suggest: tuple data begins at byte %u, but that is not maximally aligned
Or: tuple data begins at byte %u, which is not a multiple of %u

That makes the messages look much more similar to each other
grammatically and is more consistent about calling things by the same
names.

+ psprintf("tuple with null values has user data offset %u rather than the expected offset %u",
+ psprintf("tuple without null values has user data offset %u rather than the expected offset %u",

I suggest merging these: tuple data offset %u, but expected offset %u
(%u attributes, %s)
where %s is either "has nulls" or "no nulls"

In fact, aren't several of the above checks redundant with this one?
Like, why check for a value less than SizeofHeapTupleHeader or that's
not properly aligned first? Just check this straightaway and call it
good.

+ * If we get this far, the tuple is visible to us, so it must not be
+ * incompatible with our relDesc. The natts field could be legitimately
+ * shorter than rel's natts, but it cannot be longer than rel's natts.

This is yet another case where you didn't update the comments.
tuple_is_visible() now checks whether the tuple is visible to anyone,
not whether it's visible to us, but the comment doesn't agree. In some
sense I think this comment is redundant with the previous one anyway,
because that one already talks about the tuple being visible. Maybe
just write: The tuple is visible, so it must be compatible with the
current version of the relation descriptor. It might have fewer
columns than are present in the relation descriptor, but it cannot
have more.

+ psprintf("tuple has %u attributes in relation with only %u attributes",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));

I suggest: tuple has %u attributes, but relation has only %u attributes

+ /*
+ * Iterate over the attributes looking for broken toast values. This
+ * roughly follows the logic of heap_deform_tuple, except that it doesn't
+ * bother building up isnull[] and values[] arrays, since nobody wants
+ * them, and it unrolls anything that might trip over an Assert when
+ * processing corrupt data.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ {
+ if (!check_tuple_attribute(ctx))
+ break;
+ }

I think this comment is too wordy. This text belongs in the header
comment of check_tuple_attribute(), not at the place where it gets
called. Otherwise, as you update what check_tuple_attribute() does,
you have to remember to come find this comment and fix it to match,
and you might forget to do that. In fact... looks like that already
happened, because check_tuple_attribute() now checks more than broken
TOAST attributes. Seems like you could just simplify this down to
something like "Now check each attribute." Also, you could lose the
extra braces.

- bt_index_check | relname | relpages
+ bt_index_check | relname | relpages

Don't include unrelated changes in the patch.
I'm not really sure that the list of fields you're displaying for each
reported problem really makes sense. I think the theory here should be
that we want to report the information that the user needs to localize
the problem but not everything that they could find out from
inspecting the page, and not things that are too specific to
particular classes of errors. So I would vote for keeping blkno,
offnum, and attnum, but I would lose lp_flags, lp_len, and chunk.
lp_off feels like it's a more arguable case: technically, it's a
locator for the problem, because it gives you the byte offset within
the page, but normally we reference tuples by TID, i.e. (blkno,
offset), not byte offset. On balance I'd be inclined to omit it.
--
In addition to this, I found a few more things while reading the v13
patch, as below:
Patch v13-0001:
-
+#include "amcheck.h"
Not in correct order.
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;
Unnecessary spaces/tabs between } and BtreeCheckContext.
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);
Unnecessary space between * and ctx. The same changes needed for other places as
well.
---
Patch v13-0002:
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);
Can't we make it work? If the input is partitioned, I think we could
collect all its leaf partitions and process them one by one. Thoughts?
+ ctx->chunkno++;
Instead of incrementing in check_toast_tuple(), I think incrementing should
happen at the caller -- just after check_toast_tuple() call.
---
Patch v13-0003:
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);
resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer().
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid
pg_class should be schema-qualified like elsewhere. IIUC, pg_class is meant to
get relname only; instead, we could use '%u'::pg_catalog.regclass in the target
list for the relname. Thoughts?
Also I think we should skip '\n' from the query string (see appendPQExpBuffer()
in pg_dump.c)
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));
I don't think we need the search_path query. The main query doesn't have any
dependencies on it. Same is in check_indexes(), check_index (),
expand_table_name_patterns() & get_table_check_list().
Correct me if I am missing something.
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
I am not sure why we want PageOutput() if the second argument is always going to
be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ?
e.g. usage() function in pg_basebackup.c.
Regards,
Amul
On Aug 16, 2020, at 9:37 PM, Amul Sul <sulamul@gmail.com> wrote:
In addition to this, I found a few more things while reading the v13
patch, as below:

Patch v13-0001:
-
+#include "amcheck.h"

Not in correct order.
Fixed.
+typedef struct BtreeCheckContext
+{
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+ bool is_corrupt;
+ bool on_error_stop;
+} BtreeCheckContext;

Unnecessary spaces/tabs between } and BtreeCheckContext.
This refers to a change in verify_nbtree.c that has been removed. Per discussions with Peter and Robert, I have simply withdrawn that portion of the patch.
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
- bool heapallindexed, bool rootdescend);
+ bool heapallindexed, bool rootdescend,
+ BtreeCheckContext * ctx);

Unnecessary space between * and ctx. The same changes needed for other places as
well.
Same as above. The changes to verify_nbtree.c have been withdrawn.
---
Patch v13-0002:
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);

Can't we make it work? If the input is partitioned, I think we could
collect all its leaf partitions and process them one by one. Thoughts?
I was following the example from pg_visibility. I haven't thought about your proposal enough to have much opinion as yet, except that if we do this for pg_amcheck we should do likewise to pg_visibility, for consistency of the user interface.
+ ctx->chunkno++;
Instead of incrementing in check_toast_tuple(), I think incrementing should
happen at the caller -- just after check_toast_tuple() call.
I agree.
---
Patch v13-0003:
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);

resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer().
Thanks. I removed it in cases where destroyPQExpBuffer is obviously the very next call.
+ appendPQExpBuffer(query,
+ "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+ "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+ "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+ "skip := %s, startblock := %s, endblock := %s) v, "
+ "pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, settings.startblock,
+ settings.endblock, tbloid

pg_class should be schema-qualified like elsewhere.
Agreed, and changed.
IIUC, pg_class is meant to
get relname only, instead, we could use '%u'::pg_catalog.regclass in the target
list for the relname. Thoughts?
get_table_check_list() creates the list of all tables to be checked, which check_tables() then iterates over, calling check_table() for each one. I think some verification that the table still exists is in order. Using '%u'::pg_catalog.regclass for a table that has since been dropped would pass in the old table Oid and draw an error of the 'ERROR: could not open relation with OID 36311' variety, whereas the current coding will just skip the dropped table.
Also I think we should skip '\n' from the query string (see appendPQExpBuffer()
in pg_dump.c)
I'm not sure I understand. pg_dump.c uses "\n" in query strings it passes to appendPQExpBuffer(), in a manner very similar to what this patch does.
+ appendPQExpBuffer(query,
+ "SELECT i.indexrelid"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+ "\nWHERE i.indexrelid = c.oid"
+ "\n AND c.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+
+ ExecuteSqlStatement("RESET search_path");
+ res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));

I don't think we need the search_path query. The main query doesn't have any
dependencies on it. Same is in check_indexes(), check_index (),
expand_table_name_patterns() & get_table_check_list().
Correct me if I am missing something.
Right.
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);

I am not sure why we want PageOutput() if the second argument is always going to
be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ?
e.g. usage() function in pg_basebackup.c.
Done.
Please find attached the next version of the patch. In addition to your review comments (above), I have made changes in response to Peter and Robert's review comments upthread.
Attachments:
v14-0001-Adding-function-verify_heapam-to-amcheck-module.patchapplication/octet-stream; name=v14-0001-Adding-function-verify_heapam-to-amcheck-module.patch; x-unix-mode=0644Download
From 322ebe44a5e54c6ec3ede10aef7073aa91a1d1fb Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 19 Aug 2020 18:55:11 -0700
Subject: [PATCH v14 1/2] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 27 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/amcheck.h | 5 +
contrib/amcheck/expected/check_heap.out | 146 +++
contrib/amcheck/sql/check_heap.sql | 100 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1432 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 215 ++++
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 2 +
13 files changed, 2206 insertions(+), 3 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/amcheck.h
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..aa7c381ccd
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,27 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip cstring default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/amcheck.h b/contrib/amcheck/amcheck.h
new file mode 100644
index 0000000000..74edfc2f65
--- /dev/null
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..c494b28198
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,146 @@
+CREATE TABLE heaptest (a integer, b text);
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check that block range is rejected for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ERROR: starting block is out of bounds for relation with no blocks: 0
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that an invalid block range is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 100000, endblock := 200000);
+ERROR: block range is out of bounds for relation with block count 1: 100000 .. 200000
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+create table test_partition partition of test_partitioned for values in (1);
+select * from verify_heapam('test_partition',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+insert into test_partitioned (a) (select 1 from generate_series(1,1000) gs);
+select * from verify_heapam('test_partition',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+create index test_index on test_partition (a);
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+create view test_view as select 1;
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+create sequence test_sequence;
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..30d00571f2
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,100 @@
+CREATE TABLE heaptest (a integer, b text);
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check that block range is rejected for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that an invalid block range is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 100000, endblock := 200000);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+select * from verify_heapam('test_partitioned',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+create table test_partition partition of test_partitioned for values in (1);
+select * from verify_heapam('test_partition',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+insert into test_partitioned (a) (select 1 from generate_series(1,1000) gs);
+select * from verify_heapam('test_partition',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+create index test_index on test_partition (a);
+select * from verify_heapam('test_index',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+create view test_view as select 1;
+select * from verify_heapam('test_view',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+create sequence test_sequence;
+select * from verify_heapam('test_sequence',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+select * from verify_heapam('test_foreign_table',
+ on_error_stop := false,
+ skip := NULL,
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..7da8526198
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 125;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+	die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+
+	# If we corrupt the header, postgres won't allow the page into shared
+	# buffers.
+	syswrite($fh, "\xFF" x 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+ seek($fh, 32, 0);
+	syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(NULL true false))
+ {
+ for my $check_toast (qw(NULL true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..dbdf3d23f4
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1432 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/htup_details.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "access/xact.h"
+#include "catalog/pg_am.h"
+#include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "storage/smgr.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+#include "utils/lsyscache.h"
+#include "utils/rel.h"
+#include "utils/snapmgr.h"
+#include "utils/syscache.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+typedef enum XidCommitStatus
+{
+ XID_TOO_NEW,
+ XID_TOO_OLD,
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information for the lifetime of a
+ * verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values from
+ * them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static bool xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidCommitStatus get_xid_status(TransactionId xid, HeapCheckContext *ctx);
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * Note that if check_toast is true, it is the caller's responsibility to
+ * ensure that the toast table and index are not corrupt, and that they
+ * do not become corrupt while this function is running.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ int64 start_block;
+ int64 end_block;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+ on_error_stop = PG_ARGISNULL(1) ? false : PG_GETARG_BOOL(1);
+ check_toast = PG_ARGISNULL(2) ? false : PG_GETARG_BOOL(2);
+ if (!PG_ARGISNULL(3))
+ {
+ const char *skip = PG_GETARG_CSTRING(3);
+
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option"),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+ }
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /*
+ * Open the relation.
+ */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ check_relation_relkind_and_relam(ctx.rel);
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ if (!ctx.nblocks)
+ {
+ /*
+ * For consistency, we need to enforce that the start_block and
+ * end_block are within the valid range if the user specified them.
+ * Yet, for an empty table with no blocks, no specified block can be
+ * in range.
+ */
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+					 translator: The integer value is a block number. */
+ errmsg("starting block is out of bounds for relation with no blocks: " INT64_FORMAT,
+ PG_GETARG_INT64(4))));
+ if (!PG_ARGISNULL(5))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+					 translator: The integer value is a block number. */
+ errmsg("ending block is out of bounds for relation with no blocks: " INT64_FORMAT,
+ PG_GETARG_INT64(5))));
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* If we get this far, we know the relation has at least one block */
+ start_block = PG_ARGISNULL(4) ? 0 : PG_GETARG_INT64(4);
+ end_block = PG_ARGISNULL(5) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(5);
+ if (start_block < 0 || end_block >= ctx.nblocks || start_block > end_block)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+ translator: The first integer value is the total number of
+ blocks in the relation. The second and third integer values
+ represent starting and ending block offsets. */
+ errmsg("block range is out of bounds for relation with block count %u: " INT64_FORMAT " .. " INT64_FORMAT,
+ ctx.nblocks, start_block, end_block)));
+
+ /*
+ * Optionally open the toast relation, if any, also protected from
+ * concurrent vacuums.
+ */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = start_block; ctx.blkno <= end_block; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+ PageHeader ph;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ bool all_frozen,
+ all_visible;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+			all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+			all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) ||
+ (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE))
+ {
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+ ph = (PageHeader) ctx.page;
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /* translator: Both %u are offsets. */
+ psprintf(_("line pointer redirection to item at offset exceeding maximum: %u vs. %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /* translator: The %u is an offset. */
+ psprintf(_("line pointer redirection to unused item at offset: %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Check that a relation's relkind and access method are both supported.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /* translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Return whether a transaction ID is in the cached valid range.
+ */
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext *ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldest_xid, xid) &&
+ TransactionIdPrecedesOrEquals(ctx->relfrozenxid, xid) &&
+ TransactionIdPrecedes(xid, ctx->next_xid));
+}
+
+/*
+ * Return whether a multitransaction ID is in the cached valid range.
+ */
+static inline bool
+MxidInValidRange(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ return (MultiXactIdPrecedesOrEquals(ctx->relminmxid, mxid) &&
+ MultiXactIdPrecedesOrEquals(ctx->oldest_mxact, mxid) &&
+ MultiXactIdPrecedes(mxid, ctx->next_mxact));
+}
+
+/*
+ * Return whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the
+ * table contains corrupt all-frozen bits, a concurrent vacuum might skip the
+ * page(s) containing the xid and then truncate clog and advance the
+ * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
+ * seems acceptable, since if we had checked it earlier in our scan it would
+ * have truly been valid at that time, and we break no MVCC guarantees by
+ * failing to notice the concurrent change in its status.
+ */
+static bool
+xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /* Quick return for xids within cached range */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ update_cached_xid_range(ctx);
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * Returns whether the given mxid is valid to appear in the heap being
+ * checked.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static bool
+mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (MxidInValidRange(mxid, ctx))
+ return true;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ if (MxidInValidRange(mxid, ctx))
+ return true;
+
+ return false;
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * check the tuple attributes.
+ *
+ * Some kinds of tuple header corruption make it unsafe to check the tuple
+ * attributes, for example when the tuple is foreshortened and such checks
+ * would read beyond the end of the line pointer (and perhaps the page). In
+ * such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u is the offset, second %u is the
+ total length. */
+ psprintf(_("data begins at offset beyond the tuple length: %u vs. %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /* translator: Both %u represent an offset. */
+ psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: First and second %u represent offsets,
+ third %u represents the number of attributes. */
+ psprintf(_("data offset differs from expected: %u vs. %u (%u attributes, has nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /* translator: Both %u represent offsets. */
+ psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, no nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: First and second %u represent an
+ offset, third %u represents the number of
+ attributes. */
+ psprintf(_("data offset differs from expected: %u vs. %u (%u attributes, no nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus xvac_status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (!TransactionIdIsValid(xvac))
+ {
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ }
+
+ xvac_status = get_xid_status(xvac, ctx);
+ if (xvac_status == XID_IN_PROGRESS)
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (xvac_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (xvac_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (!xid_valid_in_rel(xvac, ctx))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (xvac_status == XID_COMMITTED)
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else
+ {
+ XidCommitStatus raw_xmin_status;
+
+ if (!TransactionIdIsValid(raw_xmin))
+ {
+ report_corruption(ctx,
+ pstrdup(_("inserting transaction ID is invalid")));
+ return false;
+ }
+
+ raw_xmin_status = get_xid_status(raw_xmin, ctx);
+ if (raw_xmin_status == XID_IN_PROGRESS)
+ return true; /* insert or delete in progress */
+ if (raw_xmin_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("inserting transaction ID is in the future: %u"),
+ raw_xmin));
+ return false; /* corrupt */
+ }
+ if (raw_xmin_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("inserting transaction ID precedes freeze threshold: %u"),
+ raw_xmin));
+ return false; /* corrupt */
+ }
+ if (raw_xmin_status != XID_COMMITTED)
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus xmax_status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID is invalid")));
+ return false; /* corrupt */
+ }
+
+ xmax_status = get_xid_status(xmax, ctx);
+ if (xmax_status == XID_IN_PROGRESS)
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (xmax_status == XID_COMMITTED)
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ if (xmax_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("updating transaction ID is in the future: %u"),
+ xmax));
+ return false; /* corrupt */
+ }
+ if (xmax_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("updating transaction ID precedes freeze threshold: %u"),
+ xmax));
+ return false; /* corrupt */
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value are retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ char *chunkdata;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ {
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ chunkdata = VARDATA(chunk);
+ }
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ chunkdata = VARDATA_SHORT(chunk);
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in
+ hexadecimal, %d represents the sequence number. */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent sequence numbers. */
+ psprintf(_("toast chunk sequence number differs from the expected sequence number: %u vs. %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent sequence numbers. */
+ psprintf(_("toast chunk sequence number exceeds the end chunk sequence number: %u vs. %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent a chunk size. */
+ psprintf(_("toast chunk size differs from expected size: %u vs. %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second and
+ third %u represent lengths. */
+ psprintf(_("attribute starts at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len,
+ thisatt->attlen));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second
+ and third %u represent lengths. */
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len, thisatt->attlen));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents an enumeration value. */
+ psprintf(_("toasted attribute has unexpected TOAST tag: %u"),
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second and
+ third %u represent lengths. */
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len, thisatt->attlen));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ pstrdup(_("attribute is external but tuple header flag HEAP_HASEXTERNAL not set")));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ pstrdup(_("attribute is external but relation has no toast relation")));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /* translator: Both %u represent a chunk number. */
+ psprintf(_("final toast chunk number differs from expected value: %u vs. %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ pstrdup(_("toasted value missing from toast table")));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent a size. */
+ psprintf(_("line pointer length is less than the minimum tuple header size: %u vs. %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are transaction IDs. */
+ psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a transaction ID. */
+ psprintf(_("inserting transaction ID is in the future: %u"),
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* If xmax is a multixact, it should be within valid range */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
+ {
+ if (MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are multitransaction IDs. */
+ psprintf(_("multitransaction ID is from before relation cutoff: %u vs. %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+ else if (MultiXactIdPrecedes(xmax, ctx->oldest_mxact))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are multitransaction IDs. */
+ psprintf(_("multitransaction ID is from before cutoff: %u vs. %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ }
+ else if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, xmax))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a multitransaction ID. */
+ psprintf(_("multitransaction ID is in the future: %u"),
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* If xmax is normal, it should be within valid range */
+ if (TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are transaction IDs. */
+ psprintf(_("updating transaction ID is from before relation cutoff: %u vs. %u"),
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmax, ctx))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a transaction ID. */
+ psprintf(_("updating transaction ID is in the future: %u"),
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are attribute counts. */
+ psprintf(_("number of attributes exceeds maximum expected for table: %u vs. %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return the commit status for a TransactionId. The cached range of
+ * valid transaction IDs may be updated as a side effect.
+ */
+static XidCommitStatus
+get_xid_status(TransactionId xid, HeapCheckContext *ctx)
+{
+ XidCommitStatus result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ Assert(TransactionIdIsValid(xid));
+
+ /* If we just checked this xid, return the cached status */
+ if (xid == ctx->cached_xid)
+ return ctx->cached_status;
+
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the cached
+ * range to be sure, and since we relied on the cached range when we
+ * performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ {
+ ctx->cached_xid = xid;
+ ctx->cached_status = XID_TOO_NEW;
+ return XID_TOO_NEW;
+ }
+ if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid) ||
+ FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ {
+ ctx->cached_xid = xid;
+ ctx->cached_status = XID_TOO_OLD;
+ return XID_TOO_OLD;
+ }
+ }
+
+ result = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ result = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ result = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ result = XID_ABORTED;
+ else
+ result = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = result;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..159b1e63bf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -187,6 +187,221 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+---------------------------------------------------------------------------------------------------
+ 0 | 1 | | inserting transaction ID is from before freeze cutoff: 3 vs. 524
+ 0 | 2 | | inserting transaction ID is from before freeze cutoff: 4026531839 vs. 524
+ 0 | 3 | | updating transaction ID is from before relation cutoff: 4026531839 vs. 524
+ 0 | 4 | | data begins at offset beyond the tuple length: 152 vs. 58
+ 0 | 4 | | data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+ 0 | 5 | | data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+ 0 | 6 | | data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+ 0 | 7 | | data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+ 0 | 8 | | number of attributes exceeds maximum expected for table: 2047 vs. 3
+ 0 | 9 | | data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+ 0 | 10 | | number of attributes exceeds maximum expected for table: 67 vs. 3
+ 0 | 11 | 1 | attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+ 0 | 12 | 2 | final toast chunk number differs from expected value: 0 vs. 6
+ 0 | 12 | 2 | toasted value missing from toast table
+ 0 | 13 | | updating transaction ID marked incompatibly as keys updated and locked only
+ 0 | 14 | | multitransaction ID is from before relation cutoff: 0 vs. 1
+(16 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages that are correctly formatted, internally consistent, and
+ valid relative to their own checksums may still contain logical
+ corruption, which consequently cannot be detected with
+ <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Accesses to <link
+ linkend="catalogs-overview">catalog tables</link> are performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ An experimental option, <literal>check_toast</literal>, exists to
+ reconcile the target table against entries in its corresponding toast
+ table. This option may change in future, is disabled by default, and is
+ known to be slow. It is also unsafe under some conditions. If the
+ target relation's corresponding toast table or toast index are corrupt,
+ reconciling the target table against toast values may be unsafe. If the
+ catalogs, toast table and toast index are uncorrupted, and remain so
+ during the check of the target table, reconciling the target table
+ against its toast table should be safe.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block in
+ which any corruption is found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If this experimental option is true, toasted values are checked
+ against the corresponding TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corruption.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b8bedca04a..5f0c622ad8 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 6d729008c6..f67f52057c 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -108,6 +108,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d990463ce..c48d453793 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1019,6 +1019,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2282,6 +2283,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
--
2.21.1 (Apple Git-122.3)
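For the archives, here is one way the optional arguments documented in the patch combine; the table name and block numbers are illustrative only:

```sql
-- Illustrative: check blocks 0 through 999 of mytable, skip pages the
-- visibility map says are all-frozen, and stop at the end of the first
-- corrupt block.
SELECT * FROM verify_heapam('mytable',
                            on_error_stop := true,
                            skip := 'all-frozen',
                            startblock := 0,
                            endblock := 999);
```

Since <literal>skip</literal> is declared as cstring in this version, an explicit cast such as <literal>skip := 'all-frozen'::cstring</literal> may be needed depending on how the call is written.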
v14-0002-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v14-0002-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 6383bd9ec2bf0760b823cca5911e39be4aa043ea Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 19 Aug 2020 18:56:14 -0700
Subject: [PATCH v14 2/2] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1280 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 426 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 229 ++++
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 2328 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..ed5589d97b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..1e725112e3
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1280 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single character values for each relkind, such as 'm' for
+ * materialized views, which we'd rather not embed here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define NOPAGER 0
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user settable options. Note that these default to
+ * doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without risking
+ * any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt. We
+ * can optionally check the toast table and then the toast index prior to
+ * checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ the main table can crash the backend. The onus is on any caller who
+ enables this option to make certain the environment is sufficiently
+ stable that concurrent corruption of the toast table is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples. We want all the details to report if corruption exists.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents of
+ * an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("NULL");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM public.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\nskip := %s,"
+ "\ncheck_toast := %s,"
+ "\nstartblock := %s,"
+ "\nendblock := %s) v, "
+ "\npg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * Print the help text describing the command line options.
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching the
+ * given list of patterns but not included in the given list of excluded Oids
+ * nor in one of the given excluded namespaces. The relations are further
+ * filtered by the given relkind_quals, allowing the caller to restrict the
+ * relations to just indexes or tables. The missing_errtext is the error text
+ * to report if no matching relations are found and strict_names was
+ * specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * This might sometimes result in duplicate entries in the Oid list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query.
+ *
+ * On failure, a copy of the connection's error message is stored in *error;
+ * the caller is responsible for freeing it. Returning the error rather than
+ * printing it immediately allows the caller to report the failure in a
+ * manner consistent with its corruption reports.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+   or die "could not open $relpath: $!";
+ binmode $fh;
+ sysseek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..da1a48f747
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,426 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 39;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and to
+# insert rows, in a way that gives predictable sizes and locations within the
+# table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8-byte inline value.  We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ sysseek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ sysseek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-x', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', on_error_stop := false, skip := NULL, check_toast := true, startblock := NULL, endblock := NULL)));
+is ($result,
+"0|1||inserting transaction ID is from before freeze cutoff: 3 vs. $relfrozenxid
+0|2||inserting transaction ID is from before freeze cutoff: 4026531839 vs. $relfrozenxid
+0|3||updating transaction ID is from before relation cutoff: 4026531839 vs. $relfrozenxid
+0|4||data begins at offset beyond the tuple length: 152 vs. 58
+0|4||data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+0|5||data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+0|6||data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+0|7||data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+0|8||number of attributes exceeds maximum expected for table: 2047 vs. 3
+0|9||data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+0|10||number of attributes exceeds maximum expected for table: 67 vs. 3
+0|11|1|attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+0|12|2|final toast chunk number differs from expected value: 0 vs. 6
+0|12|2|toasted value missing from toast table
+0|13||updating transaction ID marked incompatibly as keys updated and locked only
+0|14||multitransaction ID is from before relation cutoff: 0 vs. 1",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,attnum=\d*/,
+
+ # individual detected corruptions
+ qr/attribute ends at offset beyond total tuple length: \d+ vs. \d+ \(attribute length \d+\)/,
+ qr/data begins at offset beyond the tuple length: \d+ vs. \d+/,
+ qr/data offset differs from expected: \d+ vs. \d+ \(\d+ attributes, has nulls\)/,
+ qr/data offset differs from expected: \d+ vs. \d+ \(\d+ attributes, no nulls\)/,
+ qr/final toast chunk number differs from expected value: \d+ vs. \d+/,
+ qr/inserting transaction ID is from before freeze cutoff: \d+ vs. \d+/,
+ qr/multitransaction ID is from before relation cutoff: \d+ vs. \d+/,
+ qr/number of attributes exceeds maximum expected for table: \d+ vs. \d+/,
+ qr/toasted value missing from toast table/,
+ qr/updating transaction ID is from before relation cutoff: \d+ vs. \d+/,
+ qr/updating transaction ID marked incompatibly as keys updated and locked only/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..4babcbb39c 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &oid2name;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..2e8588b879 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..affe2abf8e
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,229 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all frozen
+ -v, --skip-all-visible do NOT check blocks marked as all visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines, the
+ first line shows the location and the second line shows a message describing
+ the problem.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=0,offnum=1,attnum=)
+inserting transaction ID is from before freeze cutoff: 3 vs. 524
+(relname=mytable,blkno=0,offnum=2,attnum=)
+inserting transaction ID is from before freeze cutoff: 4026531839 vs. 524
+(relname=mytable,blkno=0,offnum=3,attnum=)
+updating transaction ID is from before relation cutoff: 4026531839 vs. 524
+(relname=mytable,blkno=0,offnum=4,attnum=)
+data begins at offset beyond the tuple length: 152 vs. 58
+(relname=mytable,blkno=0,offnum=4,attnum=)
+data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=5,attnum=)
+data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=6,attnum=)
+data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=7,attnum=)
+data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=8,attnum=)
+number of attributes exceeds maximum expected for table: 2047 vs. 3
+(relname=mytable,blkno=0,offnum=9,attnum=)
+data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+(relname=mytable,blkno=0,offnum=10,attnum=)
+number of attributes exceeds maximum expected for table: 67 vs. 3
+(relname=mytable,blkno=0,offnum=11,attnum=1)
+attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+(relname=mytable,blkno=0,offnum=12,attnum=2)
+final toast chunk number differs from expected value: 0 vs. 6
+(relname=mytable,blkno=0,offnum=12,attnum=2)
+toasted value missing from toast table
+(relname=mytable,blkno=0,offnum=13,attnum=)
+updating transaction ID marked incompatibly as keys updated and locked only
+(relname=mytable,blkno=0,offnum=14,attnum=)
+multitransaction ID is from before relation cutoff: 0 vs. 1
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form, and may span more than
+ one line per corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt index,
+ "corrupt_index", with corruption in the page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c48d453793..426da02784 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -402,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
On Thu, Aug 20, 2020 at 8:00 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Aug 16, 2020, at 9:37 PM, Amul Sul <sulamul@gmail.com> wrote:
In addition to this, I found a few more things while reading the v13 patch, as
below:

Patch v13-0001:
+#include "amcheck.h"

Not in correct order.

Fixed.
+typedef struct BtreeCheckContext
+{
+	TupleDesc	tupdesc;
+	Tuplestorestate *tupstore;
+	bool		is_corrupt;
+	bool		on_error_stop;
+} BtreeCheckContext;

Unnecessary spaces/tabs between } and BtreeCheckContext.
This refers to a change in verify_nbtree.c that has been removed. Per discussions with Peter and Robert, I have simply withdrawn that portion of the patch.
 static void bt_index_check_internal(Oid indrelid, bool parentcheck,
-						bool heapallindexed, bool rootdescend);
+						bool heapallindexed, bool rootdescend,
+						BtreeCheckContext * ctx);

Unnecessary space between * and ctx. The same changes needed in other places as
well.

Same as above. The changes to verify_nbtree.c have been withdrawn.
---
Patch v13-0002:
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+	partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+							on_error_stop := false,
+							skip := NULL,
+							startblock := NULL,
+							endblock := NULL);
+ERROR:  "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);

Can't we make it work? If the input is partitioned, I think we could collect all its leaf partitions and process them one by one. Thoughts?

I was following the example from pg_visibility. I haven't thought about your proposal enough to have much of an opinion yet, except that if we do this for pg_amcheck we should do likewise for pg_visibility, for consistency of the user interface.
pg_visibility predates declarative partitioning; I think it's time to improve
that as well.
+ ctx->chunkno++;
Instead of incrementing in check_toast_tuple(), I think the increment should
happen at the caller -- just after the check_toast_tuple() call.

I agree.
---
Patch v13-0003:
+ resetPQExpBuffer(query);
+ destroyPQExpBuffer(query);

resetPQExpBuffer() will be unnecessary if the next call is destroyPQExpBuffer().
Thanks. I removed it in cases where destroyPQExpBuffer is obviously the very next call.
+	appendPQExpBuffer(query,
+					  "SELECT c.relname, v.blkno, v.offnum, v.lp_off, "
+					  "v.lp_flags, v.lp_len, v.attnum, v.chunk, v.msg"
+					  "\nFROM verify_heapam(rel := %u, on_error_stop := %s, "
+					  "skip := %s, startblock := %s, endblock := %s) v, "
+					  "pg_class c"
+					  "\nWHERE c.oid = %u",
+					  tbloid, stop, skip, settings.startblock,
+					  settings.endblock, tbloid

pg_class should be schema-qualified like elsewhere.
Agreed, and changed.
IIUC, pg_class is meant to
get relname only; instead, we could use '%u'::pg_catalog.regclass in the target
list for the relname. Thoughts?

get_table_check_list() creates the list of all tables to be checked, which check_tables() then iterates over, calling check_table() for each one. I think some verification that the table still exists is in order. Using '%u'::pg_catalog.regclass for a table that has since been dropped would pass in the old table Oid and draw an error of the 'ERROR: could not open relation with OID 36311' variety, whereas the current coding simply skips the dropped table.
Also I think we should skip '\n' from the query string (see appendPQExpBuffer()
in pg_dump.c)

I'm not sure I understand. pg_dump.c uses "\n" in query strings it passes to appendPQExpBuffer(), in a manner very similar to what this patch does.
I see there is a mix of styles; I was referring to dumpDatabase() in pg_dump.c,
which doesn't include '\n'.
+	appendPQExpBuffer(query,
+					  "SELECT i.indexrelid"
+					  "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class c"
+					  "\nWHERE i.indexrelid = c.oid"
+					  "\n    AND c.relam = %u"
+					  "\n    AND i.indrelid = %u",
+					  BTREE_AM_OID, tbloid);
+
+	ExecuteSqlStatement("RESET search_path");
+	res = ExecuteSqlQuery(query->data, PGRES_TUPLES_OK);
+	PQclear(ExecuteSqlQueryForSingleRow(ALWAYS_SECURE_SEARCH_PATH_SQL));

I don't think we need the search_path query. The main query doesn't have any
dependencies on it. Same is in check_indexes(), check_index (),
expand_table_name_patterns() & get_table_check_list().
Correct me if I am missing something.

Right.
+ output = PageOutput(lines + 2, NULL);
+ for (lineno = 0; usage_text[lineno]; lineno++)
+     fprintf(output, "%s\n", usage_text[lineno]);
+ fprintf(output, "Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ fprintf(output, "%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);

I am not sure why we want PageOutput() if the second argument is always going to
be NULL? Can't we directly use printf() instead of PageOutput() + fprintf() ?
e.g. usage() function in pg_basebackup.c.

Done.
Please find attached the next version of the patch. In addition to your review comments (above), I have made changes in response to Peter and Robert's review comments upthread.
Thanks for the updated version, I'll have a look.
Regards,
Amul
Few comments for v14 version:
v14-0001:
verify_heapam.c: In function ‘verify_heapam’:
verify_heapam.c:339:14: warning: variable ‘ph’ set but not used
[-Wunused-but-set-variable]
PageHeader ph;
^
verify_heapam.c: In function ‘check_toast_tuple’:
verify_heapam.c:877:8: warning: variable ‘chunkdata’ set but not used
[-Wunused-but-set-variable]
char *chunkdata;
Got these compilation warnings
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);
bt_index_* are needed?
#include "access/htup_details.h"
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "catalog/storage_xlog.h"
#include "storage/smgr.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"
These header file inclusions in verify_heapam.c can be omitted. Some of them
may be pulled in implicitly by other header files, or may no longer be needed
due to recent changes.
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
I think it would be good if the description also included what the default
value is for each option.
+ /*
+ * Optionally open the toast relation, if any, also protected from
+ * concurrent vacuums.
+ */
Now that the lock has been changed to AccessShareLock, I think we need to
rephrase this comment as well, since we are no longer doing anything extra
to protect against concurrent vacuums.
+/*
+ * Return wehter a multitransaction ID is in the cached valid range.
+ */
Typo: s/wehter/whether
v14-0002:
+#define NOPAGER 0
Unused macro.
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM public.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\nskip := %s,"
+ "\ncheck_toast := %s,"
+ "\nstartblock := %s,"
+ "\nendblock := %s) v, "
+ "\npg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
[....]
+ appendPQExpBuffer(querybuf,
+ "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
The assumption that the amcheck extension will always be installed in the public
schema doesn't seem to be correct. This will not work if amcheck is installed
somewhere else.
Regards,
Amul
On Thu, Aug 20, 2020 at 5:17 PM Amul Sul <sulamul@gmail.com> wrote:
On Thu, Aug 20, 2020 at 8:00 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

On Aug 16, 2020, at 9:37 PM, Amul Sul <sulamul@gmail.com> wrote:
In addition to this, I found a few more things while reading v13 patch are as
below:

Patch v13-0001:

- +#include "amcheck.h"

Not in correct order.
Fixed.
+typedef struct BtreeCheckContext
+{
+    TupleDesc        tupdesc;
+    Tuplestorestate *tupstore;
+    bool             is_corrupt;
+    bool             on_error_stop;
+} BtreeCheckContext;

Unnecessary spaces/tabs between } and BtreeCheckContext.
This refers to a change in verify_nbtree.c that has been removed. Per discussions with Peter and Robert, I have simply withdrawn that portion of the patch.
static void bt_index_check_internal(Oid indrelid, bool parentcheck,
-                   bool heapallindexed, bool rootdescend);
+                   bool heapallindexed, bool rootdescend,
+                   BtreeCheckContext * ctx);

Unnecessary space between * and ctx. The same changes needed for other places as
well.

Same as above. The changes to verify_nbtree.c have been withdrawn.
---
Patch v13-0002:
+-- partitioned tables (the parent ones) don't have visibility maps
+create table test_partitioned (a int, b text default repeat('x', 5000))
+    partition by list (a);
+-- these should all fail
+select * from verify_heapam('test_partitioned',
+    on_error_stop := false,
+    skip := NULL,
+    startblock := NULL,
+    endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+create table test_partition partition of test_partitioned for values in (1);
+create index test_index on test_partition (a);

Can't we make it work? If the input is partitioned, I think we could
collect all its leaf partitions and process them one by one. Thoughts?

I was following the example from pg_visibility. I haven't thought about your proposal enough to have much opinion as yet, except that if we do this for pg_amcheck we should do likewise to pg_visibility, for consistency of the user interface.
pg_visibility predates declarative partitioning; I think it's time to improve
that as well.

+ ctx->chunkno++;
Instead of incrementing in check_toast_tuple(), I think incrementing should
happen at the caller -- just after check_toast_tuple() call.

I agree.
On Aug 24, 2020, at 2:48 AM, Amul Sul <sulamul@gmail.com> wrote:
Few comments for v14 version:
v14-0001:
verify_heapam.c: In function ‘verify_heapam’:
verify_heapam.c:339:14: warning: variable ‘ph’ set but not used
[-Wunused-but-set-variable]
PageHeader ph;
^
verify_heapam.c: In function ‘check_toast_tuple’:
verify_heapam.c:877:8: warning: variable ‘chunkdata’ set but not used
[-Wunused-but-set-variable]
char *chunkdata;

Got these compilation warnings.
Removed.
+++ b/contrib/amcheck/amcheck.h
@@ -0,0 +1,5 @@
+#include "postgres.h"
+
+Datum verify_heapam(PG_FUNCTION_ARGS);
+Datum bt_index_check(PG_FUNCTION_ARGS);
+Datum bt_index_parent_check(PG_FUNCTION_ARGS);

bt_index_* are needed?
This entire header file is not needed. Removed.
#include "access/htup_details.h"
#include "access/xact.h"
#include "catalog/pg_type.h"
#include "catalog/storage_xlog.h"
#include "storage/smgr.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"
#include "utils/syscache.h"

These header file inclusions in verify_heapam.c can be omitted. Some of them
may be pulled in implicitly by other header files, or may no longer be needed
due to recent changes.
Removed.
+ * on_error_stop:
+ *     Whether to stop at the end of the first page for which errors are
+ *     detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ *     Whether to check each toasted attribute against the toast table to
+ *     verify that it can be found there.
+ *
+ * skip:
+ *     What kinds of pages in the heap relation should be skipped. Valid
+ *     options are "all-visible", "all-frozen", and "none".

I think it would be good if the description also included what the default
value is for each option.
The defaults are defined in amcheck--1.2--1.3.sql, and I was concerned that documenting them in verify_heapam.c would create a hazard of the defaults and their documented values getting out of sync. The handling of null arguments in verify_heapam.c was, however, duplicating the defaults from the .sql file, so I've changed that to just ereport error on null. (I can't make the whole function strict, as some other arguments are allowed to be null.) I have not documented the defaults in either file, as they are quite self-evident in the .sql file. I've updated some tests that were passing null to get the default behavior to now either pass nothing or explicitly pass the argument they want.
+ /*
+  * Optionally open the toast relation, if any, also protected from
+  * concurrent vacuums.
+  */

Now that the lock has been changed to AccessShareLock, I think we need to
rephrase this comment as well, since we are no longer doing anything extra
to protect against concurrent vacuums.
Right. Comment changed.
+/*
+ * Return wehter a multitransaction ID is in the cached valid range.
+ */

Typo: s/wehter/whether
Changed.
v14-0002:
+#define NOPAGER 0
Unused macro.
Removed.
+ appendPQExpBuffer(querybuf,
+                   "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+                   "\nFROM public.verify_heapam("
+                   "\nrelation := %u,"
+                   "\non_error_stop := %s,"
+                   "\nskip := %s,"
+                   "\ncheck_toast := %s,"
+                   "\nstartblock := %s,"
+                   "\nendblock := %s) v, "
+                   "\npg_catalog.pg_class c"
+                   "\nWHERE c.oid = %u",
+                   tbloid, stop, skip, toast, startblock, endblock, tbloid);
[....]
+ appendPQExpBuffer(querybuf,
+                   "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+                   idxoid,
+                   settings.heapallindexed ? "true" : "false",
+                   settings.rootdescend ? "true" : "false");

The assumption that the amcheck extension will always be installed in the public
schema doesn't seem to be correct. This will not work if amcheck is installed
somewhere else.
Right. I removed the schema qualification, leaving it up to the search path.
Thanks for the review!
Attachments:
v15-0001-Adding-function-verify_heapam-to-amcheck-module.patchapplication/octet-stream; name=v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch; x-unix-mode=0644Download
From cd2eafad48d4bedcec07aea63215d955a7e821dd Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 24 Aug 2020 14:01:49 -0700
Subject: [PATCH v15 1/2] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 27 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 132 +++
contrib/amcheck/sql/check_heap.sql | 86 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1423 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 215 ++++
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 2164 insertions(+), 3 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..aa7c381ccd
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,27 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip cstring default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text
+ )
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..45f5ed4c83
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,132 @@
+CREATE TABLE heaptest (a integer, b text);
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check that a block range is rejected for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ERROR: starting block is out of bounds for relation with no blocks: 0
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that an invalid block range is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 100000, endblock := 200000);
+ERROR: block range is out of bounds for relation with block count 1: 100000 .. 200000
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+select * from verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+create table test_partition partition of test_partitioned for values in (1);
+select * from verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+insert into test_partitioned (a) (select 1 from generate_series(1,1000) gs);
+select * from verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+create index test_index on test_partition (a);
+select * from verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+create view test_view as select 1;
+select * from verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+create sequence test_sequence;
+select * from verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+select * from verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..4e04f1791f
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,86 @@
+CREATE TABLE heaptest (a integer, b text);
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check that a block range is rejected for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that an invalid block range is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 100000, endblock := 200000);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+create table test_partitioned (a int, b text default repeat('x', 5000))
+ partition by list (a);
+select * from verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+create table test_partition partition of test_partitioned for values in (1);
+select * from verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+insert into test_partitioned (a) (select 1 from generate_series(1,1000) gs);
+select * from verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+create index test_index on test_partition (a);
+select * from verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+create view test_view as select 1;
+select * from verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+create sequence test_sequence;
+select * from verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+create foreign data wrapper dummy;
+create server dummy_server foreign data wrapper dummy;
+create foreign table test_foreign_table () server dummy_server;
+select * from verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+
+ # If we corrupt the header, postgres won't allow the page into the buffer.
+	syswrite($fh, "\xFF" x 8, 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+ seek($fh, 32, 0);
+	syswrite($fh, "\x77" x 500, 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..7d7d06546a
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1423 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+typedef enum XidCommitStatus
+{
+ XID_TOO_NEW,
+ XID_TOO_OLD,
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during
+ * a lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values from
+ * them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copies of values from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber nblocks;
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void check_relation_relkind_and_relam(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static bool xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidCommitStatus get_xid_status(TransactionId xid, HeapCheckContext *ctx);
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
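As a reviewer aid: the range check above ultimately rests on wraparound-aware 32-bit xid comparison. Here is a minimal standalone sketch of that comparison, modeled after TransactionIdPrecedes (the names and harness are illustrative, not part of the patch; special xids below FirstNormalTransactionId are ignored here, whereas the patch handles them separately):

```c
#include <stdint.h>
#include <stdbool.h>

/* Wraparound-aware "a logically precedes b" for 32-bit xids: the signed
 * difference tells us which id is older on the modular number line, as
 * long as the two ids are less than 2^31 apart. */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Is xid within the half-open range [oldest, next)? */
static bool
xid_in_range(uint32_t xid, uint32_t oldest, uint32_t next)
{
    return !xid_precedes(xid, oldest) && xid_precedes(xid, next);
}
```

The signed difference treats the 2^32 id space as a circle split into halves, which is why the patch falls back to 64-bit FullTransactionIds where ids may be more than 2^31 apart.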
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * Note that if check_toast is true, it is the caller's responsibility to
+ * ensure that the toast table and index are not corrupt, and that they
+ * do not become corrupt while this function is running.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ int64 start_block;
+ int64 end_block;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = PG_GETARG_CSTRING(3);
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option"),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /*
+ * Open the relation.
+ */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ check_relation_relkind_and_relam(ctx.rel);
+ ctx.nblocks = RelationGetNumberOfBlocks(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ if (!ctx.nblocks)
+ {
+ /*
+ * For consistency, we need to enforce that the start_block and
+ * end_block are within the valid range if the user specified them.
+ * Yet, for an empty table with no blocks, no specified block can be
+ * in range.
+ */
+ if (!PG_ARGISNULL(4))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+ translator: The integer value is the block number. */
+ errmsg("starting block is out of bounds for relation with no blocks: " INT64_FORMAT,
+ PG_GETARG_INT64(4))));
+ if (!PG_ARGISNULL(5))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+ translator: The integer value is the block number. */
+ errmsg("ending block is out of bounds for relation with no blocks: " INT64_FORMAT,
+ PG_GETARG_INT64(5))));
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* If we get this far, we know the relation has at least one block */
+ start_block = PG_ARGISNULL(4) ? 0 : PG_GETARG_INT64(4);
+ end_block = PG_ARGISNULL(5) ? ((int64) ctx.nblocks) - 1 : PG_GETARG_INT64(5);
+ if (start_block < 0 || end_block >= ctx.nblocks || start_block > end_block)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ /*------
+ translator: The first integer value is the total number of
+ blocks in the relation. The second and third integer values
+ represent starting and ending block offsets. */
+ errmsg("block range is out of bounds for relation with block count %u: " INT64_FORMAT " .. " INT64_FORMAT,
+ ctx.nblocks, start_block, end_block)));
+
+ /* Optionally open the toast relation, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = start_block; ctx.blkno <= end_block; ctx.blkno++)
+ {
+ int32 mapbits;
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ bool all_frozen,
+ all_visible;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+ all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) ||
+ (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE))
+ {
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /* translator: Both %u are offsets. */
+ psprintf(_("line pointer redirection to item at offset exceeding maximum: %u vs. %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /* translator: The %u is an offset. */
+ psprintf(_("line pointer redirection to unused item at offset: %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Check that a relation's relkind and access method are both supported.
+ */
+static void
+check_relation_relkind_and_relam(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /* translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Return whether a transaction ID is in the cached valid range.
+ */
+static inline bool
+XidInValidRange(TransactionId xid, HeapCheckContext *ctx)
+{
+ return (TransactionIdPrecedesOrEquals(ctx->oldest_xid, xid) &&
+ TransactionIdPrecedesOrEquals(ctx->relfrozenxid, xid) &&
+ TransactionIdPrecedes(xid, ctx->next_xid));
+}
+
+/*
+ * Return whether a multixact ID is in the cached valid range.
+ */
+static inline bool
+MxidInValidRange(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ return (MultiXactIdPrecedesOrEquals(ctx->relminmxid, mxid) &&
+ MultiXactIdPrecedesOrEquals(ctx->oldest_mxact, mxid) &&
+ MultiXactIdPrecedes(mxid, ctx->next_mxact));
+}
+
+/*
+ * Return whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the
+ * table contains corrupt all-frozen bits, a concurrent vacuum might skip the
+ * page(s) containing the xid and then truncate clog and advance the
+ * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
+ * seems acceptable, since if we had checked it earlier in our scan it would
+ * have truly been valid at that time, and we break no MVCC guarantees by
+ * failing to notice the concurrent change in its status.
+ */
+static bool
+xid_valid_in_rel(TransactionId xid, HeapCheckContext *ctx)
+{
+ /* Quick return for special xids */
+ switch (xid)
+ {
+ case InvalidTransactionId:
+ return false;
+ case BootstrapTransactionId:
+ case FrozenTransactionId:
+ return true;
+ }
+
+ /* Quick return for xids within cached range */
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* The latest valid xid may have advanced. Recheck. */
+ update_cached_xid_range(ctx);
+ if (XidInValidRange(xid, ctx))
+ return true;
+
+ /* No good. This xid is invalid. */
+ return false;
+}
+
+/*
+ * Returns whether the given mxid is valid to appear in the heap being
+ * checked.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static bool
+mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (MxidInValidRange(mxid, ctx))
+ return true;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ if (MxidInValidRange(mxid, ctx))
+ return true;
+
+ return false;
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * check the tuple attributes.
+ *
+ * Some kinds of tuple header corruption make it unsafe to check the tuple
+ * attributes, for example when the tuple is foreshortened and such checks
+ * would read beyond the end of the line pointer (and perhaps the page). In
+ * such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u is the offset, second %u is the
+ total length. */
+ psprintf(_("data begins at offset beyond the tuple length: %u vs. %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /* translator: Both %u represent an offset. */
+ psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: First and second %u represent offsets,
+ third %u represents the number of attributes. */
+ psprintf(_("data offset differs from expected: %u vs. %u (%u attributes, has nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /* translator: Both %u represent offsets. */
+ psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, no nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: First and second %u represent an
+ offset, third %u represents the number of
+ attributes. */
+ psprintf(_("data offset differs from expected: %u vs. %u (%u attributes, no nulls)"),
+ ctx->tuphdr->t_hoff, expected_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus xvac_status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ if (!TransactionIdIsValid(xvac))
+ {
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ }
+
+ xvac_status = get_xid_status(xvac, ctx);
+ if (xvac_status == XID_IN_PROGRESS)
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (xvac_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (xvac_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (!xid_valid_in_rel(xvac, ctx))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),
+ xvac));
+ return false; /* corrupt */
+ }
+ if (xvac_status == XID_COMMITTED)
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ else
+ {
+ XidCommitStatus raw_xmin_status;
+
+ if (!TransactionIdIsValid(raw_xmin))
+ {
+ report_corruption(ctx,
+ pstrdup(_("inserting transaction ID is invalid")));
+ return false;
+ }
+
+ raw_xmin_status = get_xid_status(raw_xmin, ctx);
+ if (raw_xmin_status == XID_IN_PROGRESS)
+ return true; /* insert or delete in progress */
+ if (raw_xmin_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("inserting transaction ID is in the future: %u"),
+ raw_xmin));
+ return false; /* corrupt */
+ }
+ if (raw_xmin_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("inserting transaction ID precedes freeze threshold: %u"),
+ raw_xmin));
+ return false; /* corrupt */
+ }
+ if (raw_xmin_status != XID_COMMITTED)
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus xmax_status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ if (!TransactionIdIsValid(xmax))
+ {
+ report_corruption(ctx,
+ pstrdup(_("updating transaction ID is invalid")));
+ return false; /* corrupt */
+ }
+
+ xmax_status = get_xid_status(xmax, ctx);
+ if (xmax_status == XID_IN_PROGRESS)
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ if (xmax_status == XID_COMMITTED)
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ if (xmax_status == XID_TOO_NEW)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("updating transaction ID is in the future: %u"),
+ xmax));
+ return false; /* corrupt */
+ }
+ if (xmax_status == XID_TOO_OLD)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier. */
+ psprintf(_("updating transaction ID precedes freeze threshold: %u"),
+ xmax));
+ return false; /* corrupt */
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
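For reviewers, the expected_hoff computation in the function above can be sanity checked standalone with the macro arithmetic written out. The constants below assume a 64-bit build with 8-byte maximum alignment and a 23-byte fixed tuple header; they are illustrative, not taken from the patch:

```c
#include <stdint.h>

#define MAXIMUM_ALIGNOF 8
#define SIZEOF_HEAP_TUPLE_HEADER 23 /* offsetof(HeapTupleHeaderData, t_bits) */

/* Round len up to the next multiple of MAXIMUM_ALIGNOF, as MAXALIGN does. */
static uint32_t
max_align(uint32_t len)
{
    return (len + MAXIMUM_ALIGNOF - 1) & ~(uint32_t) (MAXIMUM_ALIGNOF - 1);
}

/* One bit per attribute in the nulls bitmap, rounded up to whole bytes,
 * as BITMAPLEN does. */
static uint32_t
bitmap_len(uint32_t natts)
{
    return (natts + 7) / 8;
}

/* The t_hoff value the header check expects for a given attribute count. */
static uint32_t
expected_hoff(uint32_t natts, int has_nulls)
{
    if (has_nulls)
        return max_align(SIZEOF_HEAP_TUPLE_HEADER + bitmap_len(natts));
    return max_align(SIZEOF_HEAP_TUPLE_HEADER);
}
```

Note how the nulls bitmap only widens t_hoff once it spills past the alignment padding: 8 attributes fit in the same 24-byte header as none.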
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value is retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in
+ hexadecimal, %d represents the sequence number. */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent sequence numbers. */
+ psprintf(_("toast chunk sequence number does not match the expected sequence number: %u vs. %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent sequence numbers. */
+ psprintf(_("toast chunk sequence number exceeds the end chunk sequence number: %u vs. %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent a chunk size. */
+ psprintf(_("toast chunk size differs from expected size: %u vs. %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
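The expected_size formula above is easy to get off by one, so here is the same arithmetic as a standalone sketch (the chunk-size constant is a stand-in, since the real TOAST_MAX_CHUNK_SIZE depends on the build's page size):

```c
#include <stdint.h>

#define CHUNK_MAX 1996 /* stand-in for TOAST_MAX_CHUNK_SIZE */

/* Total number of chunks needed to store attrsize bytes. */
static int32_t
total_chunks(int32_t attrsize)
{
    return (attrsize + CHUNK_MAX - 1) / CHUNK_MAX;
}

/* Expected size of chunk number curchunk (0-based): every chunk is
 * full-sized except the last, which holds the remainder. */
static int32_t
expected_chunk_size(int32_t attrsize, int32_t curchunk)
{
    int32_t total = total_chunks(attrsize);

    return curchunk < total - 1 ? CHUNK_MAX
        : attrsize - (total - 1) * CHUNK_MAX;
}
```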
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second and
+ third %u represent a length. */
+ psprintf(_("attribute starts at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset, ctx->lp_len,
+ thisatt->attlen));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second
+ and third %u represent a length. */
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len, thisatt->attlen));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents an enumeration value. */
+ psprintf(_("toasted attribute has unexpected TOAST tag: %u"),
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: First %u represents an offset, second and
+ third %u represent a length. */
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len, thisatt->attlen));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ pstrdup(_("attribute is external but tuple header flag HEAP_HASEXTERNAL not set")));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ pstrdup(_("attribute is external but relation has no toast relation")));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /* translator: Both %u represent a chunk number. */
+ psprintf(_("final toast chunk number differs from expected value: %u vs. %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ pstrdup(_("toasted value missing from toast table")));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent a size. */
+ psprintf(_("line pointer length is less than the minimum tuple header size: %u vs. %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ if (TransactionIdIsNormal(xmin))
+ {
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are transaction IDs. */
+ psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
+ xmin, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a transaction ID. */
+ psprintf(_("inserting transaction ID is in the future: %u"),
+ xmin));
+ fatal = true;
+ }
+ }
+
+ /* If xmax is a multixact, it should be within valid range */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
+ {
+ if (MultiXactIdPrecedes(xmax, ctx->relminmxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are multitransaction IDs. */
+ psprintf(_("multitransaction ID is from before relation cutoff: %u vs. %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ }
+ else if (MultiXactIdPrecedes(xmax, ctx->oldest_mxact))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are multitransaction IDs. */
+ psprintf(_("multitransaction ID is from before cutoff: %u vs. %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ }
+ else if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, xmax))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a multitransaction ID. */
+ psprintf(_("multitransaction ID is in the future: %u"),
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /* If xmax is normal, it should be within valid range */
+ if (TransactionIdIsNormal(xmax))
+ {
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+ TransactionIdPrecedes(xmax, ctx->relfrozenxid))
+ {
+ report_corruption(ctx,
+ /* translator: Both %u are transaction IDs. */
+ psprintf(_("updating transaction ID is from before relation cutoff: %u vs. %u"),
+ xmax, ctx->relfrozenxid));
+ fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmax, ctx))
+ {
+ report_corruption(ctx,
+ /* translator: %u is a transaction ID. */
+ psprintf(_("updating transaction ID is in the future: %u"),
+ xmax));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /* translator: Both %u represent a number of attributes. */
+ psprintf(_("number of attributes exceeds maximum expected for table: %u vs. %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return the commit status for a TransactionId. The cached range of
+ * valid transaction IDs may be updated as a side effect.
+ */
+static XidCommitStatus
+get_xid_status(TransactionId xid, HeapCheckContext *ctx)
+{
+ XidCommitStatus result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ Assert(TransactionIdIsValid(xid));
+
+ /* If we just checked this xid, return the cached status */
+ if (xid == ctx->cached_xid)
+ return ctx->cached_status;
+
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the cached
+ * range to be sure, and since we relied on the cached range when we
+ * performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ {
+ ctx->cached_xid = xid;
+ ctx->cached_status = XID_TOO_NEW;
+ return XID_TOO_NEW;
+ }
+ if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid) ||
+ FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ {
+ ctx->cached_xid = xid;
+ ctx->cached_status = XID_TOO_OLD;
+ return XID_TOO_OLD;
+ }
+ }
+
+ result = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ result = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ result = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ result = XID_ABORTED;
+ else
+ result = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = result;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..159b1e63bf 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -187,6 +187,221 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+---------------------------------------------------------------------------------------------------
+ 0 | 1 | | inserting transaction ID is from before freeze cutoff: 3 vs. 524
+ 0 | 2 | | inserting transaction ID is from before freeze cutoff: 4026531839 vs. 524
+ 0 | 3 | | updating transaction ID is from before relation cutoff: 4026531839 vs. 524
+ 0 | 4 | | data begins at offset beyond the tuple length: 152 vs. 58
+ 0 | 4 | | data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+ 0 | 5 | | data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+ 0 | 6 | | data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+ 0 | 7 | | data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+ 0 | 8 | | number of attributes exceeds maximum expected for table: 2047 vs. 3
+ 0 | 9 | | data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+ 0 | 10 | | number of attributes exceeds maximum expected for table: 67 vs. 3
+ 0 | 11 | 1 | attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+ 0 | 12 | 2 | final toast chunk number differs from expected value: 0 vs. 6
+ 0 | 12 | 2 | toasted value missing from toast table
+ 0 | 13 | | updating transaction ID marked incompatibly as keys updated and locked only
+ 0 | 14 | | multitransaction ID is from before relation cutoff: 0 vs. 1
+
+(16 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to <link
+ linkend="catalogs-overview">catalog tables</link> is performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ An experimental option, <literal>check_toast</literal>, exists to
+ reconcile the target table against entries in its corresponding toast
+ table. This option may change in future, is disabled by default, and is
+ known to be slow. It is also unsafe under some conditions. If the
+ target relation's corresponding toast table or toast index are corrupt,
+ reconciling the target table against toast values may be unsafe. If the
+ catalogs, toast table and toast index are uncorrupted, and remain so
+ during the check of the target table, reconciling the target table
+ against its toast table should be safe.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If this experimental option is true, toasted values are checked against
+ the corresponding TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b8bedca04a..5f0c622ad8 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 6d729008c6..f67f52057c 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -108,6 +108,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d990463ce..c48d453793 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1019,6 +1019,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2282,6 +2283,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
--
2.21.1 (Apple Git-122.3)
v15-0002-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From df0e9e4dfc3b1c8e849c76c2ffadb4270f736b8b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 24 Aug 2020 15:02:57 -0700
Subject: [PATCH v15 2/2] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1279 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 426 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 229 ++++
src/tools/pgindent/typedefs.list | 2 +
14 files changed, 2327 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d415b6..ed5589d97b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -29,6 +29,7 @@ SUBDIRS = \
oid2name \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..07ad380105
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+/pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..a017052478
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1279 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single character values for each relkind, such as 'm' for
+ * materialized views, which we'd rather not embed here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char password[100];
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user settable options. Note that these default to
+ * doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without risking
+ * any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt. We
+ * can optionally check the toast table and then the toast index prior to
+ * checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+	 * the main table can crash the backend.  The onus is on any caller who
+	 * enables this option to make certain the environment is sufficiently
+	 * stable that concurrent corruption of the toast table is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ simple_prompt("Password: ", password, sizeof(password), false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ simple_prompt(password_prompt, password, sizeof(password), false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+	 * All information about corrupt indexes is reported via ereport, not as
+	 * tuples.  We want all the details to report if corruption exists.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * Otherwise, the query buffer is extended with an AND clause that keeps only
+ * those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+		 * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents of
+ * an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any.  This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+	 * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out the command line usage text
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+	 * The loop below runs multiple SELECTs, which might sometimes result in
+	 * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching the
+ * given list of patterns but not included in the given list of excluded Oids
+ * nor in one of the given excluded namespaces.  The relations are further
+ * filtered by the given relkind_quals, allowing the caller to restrict the
+ * relations to just indexes or tables.  The missing_errtext should be a
+ * message for use in error
+ * messages if no matching relations are found and strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+	 * This might sometimes result in duplicate entries in the Oid list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query. This function should only be used for queries
+ * which are not expected to fail under normal circumstances, as failures will
+ * result in the printing of error messages, which will look a bit messy when
+ * interleaved with corruption reports.
+ *
+ * On error, *error is set to a copy of the error message from the database
+ * connection; the caller is responsible for printing and freeing it.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
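As an aside for reviewers: the query construction in get_table_check_list can be sketched compactly. This is a minimal Python sketch, not the patch's C code; the `build_check_query` name and the `IN`/`NOT IN` composition are illustrative assumptions about what the include_filter/exclude_filter helpers emit, and it omits the `OPERATOR(pg_catalog.=)` schema qualification the real code uses.

```python
def build_check_query(include_nsp, exclude_nsp, include_tbl, exclude_tbl):
    """Return SQL selecting the OIDs of checkable tables, narrowed by
    namespace/table OID filters, roughly as get_table_check_list does."""
    parts = [
        "SELECT c.oid",
        "FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n",
        "WHERE n.oid = c.relnamespace",
        # Table relkinds: RELKIND_RELATION, RELKIND_MATVIEW, RELKIND_PARTITIONED_TABLE
        "  AND c.relkind = ANY(array['r', 'm', 'p'])",
    ]
    # Include filters restrict to the listed OIDs; exclude filters remove them.
    if include_nsp:
        parts.append("  AND n.oid IN (%s)" % ", ".join(map(str, include_nsp)))
    if exclude_nsp:
        parts.append("  AND n.oid NOT IN (%s)" % ", ".join(map(str, exclude_nsp)))
    if include_tbl:
        parts.append("  AND c.oid IN (%s)" % ", ".join(map(str, include_tbl)))
    if exclude_tbl:
        parts.append("  AND c.oid NOT IN (%s)" % ", ".join(map(str, exclude_tbl)))
    return "\n".join(parts)
```

The result of the real query is then appended OID-by-OID to a SimpleOidList that drives the checking loop.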
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77\x77\x77\x77" x 125) or die "syswrite failed: $!";
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
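The corruption helpers in these tests boil down to overwriting a fixed range of bytes inside the first page of a relation file while the server is stopped. Here is a minimal Python sketch of the same idea, under the assumption made by the test that offset 32 lands past the fields of the page header we want to keep intact; the file here is a scratch file standing in for a heap segment, not a real relation.

```python
import os
import tempfile

def corrupt_first_page(path, offset=32, nbytes=500):
    """Overwrite nbytes of the file at offset with a repeating junk pattern,
    mirroring what the test's corrupt_first_page does with syswrite."""
    junk = (b"\x77\x77\x77\x77" * ((nbytes + 3) // 4))[:nbytes]
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(junk)

# Demo on a scratch file pretending to be one 8kB heap page.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\x00" * 8192)
corrupt_first_page(path)
with open(path, "rb") as f:
    data = f.read()
os.remove(path)

assert data[32:532] == b"\x77" * 500    # junk written where we asked
assert data[:32] == b"\x00" * 32        # bytes before the offset untouched
```

The important operational detail, as in the Perl version, is that the node must be stopped before writing and restarted afterwards, so the server never sees a half-modified page through its buffer cache.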
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..e397e37d7d
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,426 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 39;
+
+# This regression test demonstrates that the verify_heapam() function supplied
+# with the amcheck contrib module and depended upon by this pg_amcheck contrib
+# module correctly identifies specific kinds of corruption within pages. To
+# test this, we need a mechanism to create corrupt pages with predictable,
+# repeatable corruption. The postgres backend cannot be expected to help us
+# with this, as its design is not consistent with the goal of intentionally
+# corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that verify_heapam
+# reports the corruption, and that it runs without crashing. Note that the
+# backend cannot simply be started to run queries against the corrupt table, as
+# the backend will crash, at least for some of the corruption types we
+# generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the table's columns, and to
+# insert its rows, in ways that give them predictable sizes and locations
+# within the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Create the test table with precisely the schema that our
+# corruption function expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+use constant ROWCOUNT => 14;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-x', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ if ($tupidx == 0)
+ {
+ # Corruptly set xmin < relfrozenxid
+ $tup->{t_xmin} = 3;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 1)
+ {
+ # Corruptly set xmin < relfrozenxid, further back
+ $tup->{t_xmin} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+ }
+ elsif ($tupidx == 2)
+ {
+ # Corruptly set xmax < relfrozenxid
+ $tup->{t_xmax} = 4026531839; # Note circularity of xid comparison
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+ }
+ elsif ($tupidx == 3)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+ }
+ elsif ($tupidx == 4)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+ }
+ elsif ($tupidx == 5)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+ }
+ elsif ($tupidx == 6)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+ }
+ elsif ($tupidx == 7)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ }
+ elsif ($tupidx == 8)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+ }
+ elsif ($tupidx == 9)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+ }
+ elsif ($tupidx == 10)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+ }
+ elsif ($tupidx == 11)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+ }
+ elsif ($tupidx == 12)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+ }
+ elsif ($tupidx == 13)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+
+# Run verify_heapam on the corrupted file
+$node->start;
+
+my $result = $node->safe_psql(
+ 'postgres',
+ q(SELECT * FROM verify_heapam('test', check_toast := true)));
+is ($result,
+"0|1||inserting transaction ID is from before freeze cutoff: 3 vs. $relfrozenxid
+0|2||inserting transaction ID is from before freeze cutoff: 4026531839 vs. $relfrozenxid
+0|3||updating transaction ID is from before relation cutoff: 4026531839 vs. $relfrozenxid
+0|4||data begins at offset beyond the tuple length: 152 vs. 58
+0|4||data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+0|5||data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+0|6||data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+0|7||data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+0|8||number of attributes exceeds maximum expected for table: 2047 vs. 3
+0|9||data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+0|10||number of attributes exceeds maximum expected for table: 67 vs. 3
+0|11|1|attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+0|12|2|final toast chunk number differs from expected value: 0 vs. 6
+0|12|2|toasted value missing from toast table
+0|13||updating transaction ID marked incompatibly as keys updated and locked only
+0|14||multitransaction ID is from before relation cutoff: 0 vs. 1",
+"Expected verify_heapam output");
+
+# Each table corruption message is returned with a standard header, and we can
+# check for those headers to verify that corruption is being reported. We can
+# also check for each individual corruption that we would expect to see.
+my @corruption_re = (
+
+ # standard header
+ qr/relname=test,blkno=\d*,offnum=\d*,attnum=\d*/,
+
+ # individual detected corruptions
+ qr/attribute ends at offset beyond total tuple length: \d+ vs. \d+ \(attribute length \d+\)/,
+ qr/data begins at offset beyond the tuple length: \d+ vs. \d+/,
+ qr/data offset differs from expected: \d+ vs. \d+ \(\d+ attributes, has nulls\)/,
+ qr/data offset differs from expected: \d+ vs. \d+ \(\d+ attributes, no nulls\)/,
+ qr/final toast chunk number differs from expected value: \d+ vs. \d+/,
+ qr/inserting transaction ID is from before freeze cutoff: \d+ vs. \d+/,
+ qr/multitransaction ID is from before relation cutoff: \d+ vs. \d+/,
+ qr/number of attributes exceeds maximum expected for table: \d+ vs. \d+/,
+ qr/toasted value missing from toast table/,
+ qr/updating transaction ID is from before relation cutoff: \d+ vs. \d+/,
+ qr/updating transaction ID marked incompatibly as keys updated and locked only/,
+);
+
+$node->command_like(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'], $_,
+ "pg_amcheck reports: $_"
+ ) for(@corruption_re);
+
+$node->teardown_node;
+$node->clean_node;
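The fixed 58-byte tuple layout that 004_verify_heapam.pl relies on can be cross-checked independently of Perl. This sketch translates the test's pack code to Python's struct module (L→L, S→H, C→B, c→b, q→q, with `<` forcing little-endian and no alignment padding); the fabricated field values are arbitrary illustrations, not bytes from a real page.

```python
import struct

# Perl 'LLLSSSSSCCqCcccccccSSSSSSSSS' translated to Python struct codes.
HEAPTUPLE_FMT = "<LLLHHHHHBBqB7b9H"

# 23-byte fixed header + 1 byte t_bits (t_hoff = MAXALIGN(24) = 24), then
# column a (bigint, 8 bytes) + column b (1-byte varlena header + 7 chars) +
# column c (18-byte toast pointer read as 9 shorts) = 58 bytes total.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Round-trip a fabricated tuple the way read_tuple/write_tuple do.
fields = (2, 0, 0,            # t_xmin, t_xmax, t_field3
          0, 0, 1,            # bi_hi, bi_lo, ip_posid
          3, 0x0800,          # t_infomask2 (natts = 3), t_infomask (XMAX_INVALID)
          24, 0,              # t_hoff, t_bits
          12345678,           # column a
          0x11, *[ord(ch) for ch in "abcdefg"],  # column b: header + body
          *([0] * 9))         # column c: toast pointer bytes (zeroed here)
raw = struct.pack(HEAPTUPLE_FMT, *fields)
assert len(raw) == 58
assert struct.unpack(HEAPTUPLE_FMT, raw) == fields
```

A round-trip check like this is a cheap way to catch a miscounted pack code before it silently shifts every field the test reads and writes.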
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
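The failure mode 005_opclass_damage.pl provokes, a btree built under one comparison function and then searched under another, can be sketched with a plain sorted list. The `bsearch` helper below is an illustration of the principle, not PostgreSQL's btree descent code: a search that trusts a comparator inconsistent with the data's actual order can walk right past the key it is looking for.

```python
def bsearch(sorted_vals, target, cmp):
    """Binary search that trusts cmp to describe the list's order."""
    lo, hi = 0, len(sorted_vals)
    while lo < hi:
        mid = (lo + hi) // 2
        c = cmp(sorted_vals[mid], target)
        if c == 0:
            return mid
        if c < 0:
            lo = mid + 1
        else:
            hi = mid
    return -1

asc = lambda a, b: (a > b) - (a < b)    # like int4_asc_cmp
desc = lambda a, b: (b > a) - (b < a)   # like int4_desc_cmp

vals = list(range(1, 1001))             # "index" built under ascending order
assert bsearch(vals, 700, asc) == 699   # consistent comparator: found
assert bsearch(vals, 700, desc) == -1   # swapped comparator: lookup fails
```

amcheck's bt_index_check detects exactly this kind of inconsistency by walking the index and verifying the item-order invariant with the opclass's current comparator, which is why the UPDATE of pg_amproc above makes "item order invariant violated" appear.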
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559e81..4babcbb39c 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -118,6 +118,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
 &ltree;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0070..2e8588b879 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -131,6 +131,7 @@
<!ENTITY oid2name SYSTEM "oid2name.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..affe2abf8e
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,229 @@
+<!-- doc/src/sgml/pg_amcheck.sgml -->
+
+<sect1 id="pg_amcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pg_amcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires superuser privileges.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all frozen
+ -v, --skip-all-visible do NOT check blocks marked as all visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked all frozen or all visible, respectively.
+ </para>
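+
+ <para>
+ For example, a periodic recheck that skips all-frozen blocks and index
+ checking might be invoked as follows (the database name is illustrative):
+ </para>
+
+<screen>
+% pg_amcheck --skip-all-frozen --skip-indexes mydb
+</screen>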
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines:
+ the first shows the location, and the second shows a message describing
+ the problem.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=0,offnum=1,attnum=)
+inserting transaction ID is from before freeze cutoff: 3 vs. 524
+(relname=mytable,blkno=0,offnum=2,attnum=)
+inserting transaction ID is from before freeze cutoff: 4026531839 vs. 524
+(relname=mytable,blkno=0,offnum=3,attnum=)
+updating transaction ID is from before relation cutoff: 4026531839 vs. 524
+(relname=mytable,blkno=0,offnum=4,attnum=)
+data begins at offset beyond the tuple length: 152 vs. 58
+(relname=mytable,blkno=0,offnum=4,attnum=)
+data offset differs from expected: 152 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=5,attnum=)
+data offset differs from expected: 27 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=6,attnum=)
+data offset differs from expected: 16 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=7,attnum=)
+data offset differs from expected: 21 vs. 24 (3 attributes, no nulls)
+(relname=mytable,blkno=0,offnum=8,attnum=)
+number of attributes exceeds maximum expected for table: 2047 vs. 3
+(relname=mytable,blkno=0,offnum=9,attnum=)
+data offset differs from expected: 24 vs. 280 (2047 attributes, has nulls)
+(relname=mytable,blkno=0,offnum=10,attnum=)
+number of attributes exceeds maximum expected for table: 67 vs. 3
+(relname=mytable,blkno=0,offnum=11,attnum=1)
+attribute ends at offset beyond total tuple length: 416848000 vs. 58 (attribute length 4294967295)
+(relname=mytable,blkno=0,offnum=12,attnum=2)
+final toast chunk number differs from expected value: 0 vs. 6
+(relname=mytable,blkno=0,offnum=12,attnum=2)
+toasted value missing from toast table
+(relname=mytable,blkno=0,offnum=13,attnum=)
+updating transaction ID marked incompatibly as keys updated and locked only
+(relname=mytable,blkno=0,offnum=14,attnum=)
+multitransaction ID is from before relation cutoff: 0 vs. 1
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form, and may span more than
+ one line per corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt index,
+ "corrupt_index", with corruption in the page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c48d453793..426da02784 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -402,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
On Aug 25, 2020, at 19:36, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
Hi Mark!
Thanks for working on this important feature.
I was experimenting a bit with our internal heapcheck and found out that it does not help with a truncated CLOG at all.
Will your module be able to gather the TIDs of similar corruptions?
server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
ERROR: 58P01: could not access status of transaction 636558742
DETAIL: Could not open file "pg_xact/025F": No such file or directory.
LOCATION: SlruReportIOError, slru.c:913
Time: 3439.915 ms (00:03.440)
Thanks!
Best regards, Andrey Borodin.
On Fri, Aug 28, 2020 at 1:07 AM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
I was experimenting a bit with our internal heapcheck and found out that it's not helping with truncated CLOG anyhow.
Will your module be able to gather tid's of similar corruptions?
server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
ERROR: 58P01: could not access status of transaction 636558742
DETAIL: Could not open file "pg_xact/025F": No such file or directory.
LOCATION: SlruReportIOError, slru.c:913
Time: 3439.915 ms (00:03.440)
This kind of thing gets really tricky. PostgreSQL uses errors in tons
of places to report problems, and if you want to accumulate a list of
errors and report them all rather than just letting the first one
cancel the operation, you need special handling for each individual
error you want to bypass. A tool like this naturally wants to use as
much PostgreSQL infrastructure as possible, to avoid duplicating a ton
of code and creating a bloated monstrosity, but all that code can
throw errors. I think the code in its current form is trying to be
resilient against problems on the table pages that it is actually
checking, but it can't necessarily handle gracefully corruption in
other parts of the system. For instance:
- CLOG could be truncated, as in your example
- the disk files could have had their permissions changed so that they
can't be accessed
- the PageIsVerified() check might fail when pages are read
- the TOAST table's metadata in pg_class/pg_attribute/etc. could be corrupted
- ...or the files for those system catalogs could've had their
permissions changed
- ....or they could contain invalid pages
- ...or their indexes could be messed up
I think there are probably a bunch more, and I don't think it's
practical to allow this tool to continue after arbitrary stuff goes
wrong. It'll be too much code and impossible to maintain. In the case
you mention, I think we should view that as a problem with clog rather
than a problem with the table, and thus out of scope.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Aug 27, 2020, at 10:07 PM, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On Aug 25, 2020, at 19:36, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
Hi Mark!
Thanks for working on this important feature.
I was experimenting a bit with our internal heapcheck and found out that it's not helping with truncated CLOG anyhow.
Will your module be able to gather tid's of similar corruptions?
server/db M # select * from heap_check('pg_toast.pg_toast_4848601');
ERROR: 58P01: could not access status of transaction 636558742
DETAIL: Could not open file "pg_xact/025F": No such file or directory.
LOCATION: SlruReportIOError, slru.c:913
Time: 3439.915 ms (00:03.440)
The design principle for verify_heapam.c is, if the rest of the system is not corrupt, corruption in the table being checked should not cause a crash during the table check. This is a very limited principle. Even corruption in the associated toast table or toast index could cause a crash. That is why checking against the toast table is optional, and false by default.
Perhaps a more extensive effort could be made later. I think it is out of scope for this release cycle. It is a very interesting area for further research, though.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Aug 28, 2020, at 18:58, Robert Haas <robertmhaas@gmail.com> wrote:
In the case
you mention, I think we should view that as a problem with clog rather
than a problem with the table, and thus out of scope.
I don't think so. ISTM it's actually the same xmax < relfrozenxid problem, just hidden behind detoasting.
Our regular heap_check was checking the xmin/xmax invariants for tables, but failed to recognise the problem in the toast table (while the toast data was accessible until the CLOG truncation).
Best regards, Andrey Borodin.
On Aug 28, 2020, at 11:10 AM, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On Aug 28, 2020, at 18:58, Robert Haas <robertmhaas@gmail.com> wrote:
In the case
you mention, I think we should view that as a problem with clog rather
than a problem with the table, and thus out of scope.
I don't think so. ISTM It's the same problem of xmax<relfrozenxid actually, just hidden behind detoasing.
Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
Best regards, Andrey Borodin.
If you lock the relations involved, check the toast table first, the toast index second, and the main table third, do you still get the problem? Look at how pg_amcheck handles this and let me know if you still see a problem. There is the ever present problem that external forces, like a rogue process deleting backend files, will strike at precisely the wrong moment, but barring that kind of concurrent corruption, I think the toast table being checked prior to the main table being checked solves some of the issues you are worried about.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
I don't think so. ISTM It's the same problem of xmax<relfrozenxid actually, just hidden behind detoasing.
Our regular heap_check was checking xmin\xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
The code can (and should, and I think does) refrain from looking up
XIDs that are out of the range thought to be valid -- but how do you
propose that it avoid looking up XIDs that ought to have clog data
associated with them despite being >= relfrozenxid and < nextxid?
TransactionIdDidCommit() does not have a suppress-errors flag, adding
one would be quite invasive, yet we cannot safely perform a
significant number of checks without knowing whether the inserting
transaction committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Aug 29, 2020, at 00:56, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
I don't think so. ISTM It's the same problem of xmax<relfrozenxid actually, just hidden behind detoasing.
Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
The code can (and should, and I think does) refrain from looking up
XIDs that are out of the range thought to be valid -- but how do you
propose that it avoid looking up XIDs that ought to have clog data
associated with them despite being >= relfrozenxid and < nextxid?
TransactionIdDidCommit() does not have a suppress-errors flag, adding
one would be quite invasive, yet we cannot safely perform a
significant number of checks without knowing whether the inserting
transaction committed.
What you write seems completely correct to me. I agree that the CLOG threshold lookup seems unnecessary.
But I have a real corruption at hand (on a testing site). I have the heapcheck proposed here, and pg_surgery from the thread nearby, yet I cannot fix the problem, because I cannot list the affected tuples. These tools do not solve a problem that has been neglected for long enough. It would be supercool if they could.
This corruption, like caries, had 3 stages:
1. an incorrect VM flag saying the page does not need vacuum
2. xmin and xmax < relfrozenxid
3. CLOG truncated
Stage 2 is curable with the proposed toolset; stage 3 is not. But they are not that different.
Thanks!
Best regards, Andrey Borodin.
On Aug 29, 2020, at 3:27 AM, Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
On Aug 29, 2020, at 00:56, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 28, 2020 at 2:10 PM Andrey M. Borodin <x4mmm@yandex-team.ru> wrote:
I don't think so. ISTM It's the same problem of xmax<relfrozenxid actually, just hidden behind detoasing.
Our regular heap_check was checking xmin/xmax invariants for tables, but failed to recognise the problem in toast (while toast was accessible until CLOG truncation).
The code can (and should, and I think does) refrain from looking up
XIDs that are out of the range thought to be valid -- but how do you
propose that it avoid looking up XIDs that ought to have clog data
associated with them despite being >= relfrozenxid and < nextxid?
TransactionIdDidCommit() does not have a suppress-errors flag, adding
one would be quite invasive, yet we cannot safely perform a
significant number of checks without knowing whether the inserting
transaction committed.
What you write seems completely correct to me. I agree that CLOG thresholds lookup seems unnecessary.
But I have a real corruption at hand (on testing site). If I have proposed here heapcheck. And I have pg_surgery from the thread nearby. Yet I cannot fix the problem, because cannot list affected tuples. These tools do not solve the problem neglected for long enough. It would be supercool if they could.
This corruption like a caries had 3 stages:
1. incorrect VM flag that page do not need vacuum
2. xmin and xmax < relfrozenxid
3. CLOG truncated
Stage 2 is curable with proposed toolset, stage 3 is not. But they are not that different.
I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog. Ultimately, I ripped that out. My reasoning was that a simpler patch submission was more likely to be acceptable to the community.
If you want to submit a separate patch that creates a non-throwing version of the clog interface, and get the community to accept and commit it, I would seriously consider using that from verify_heapam. If it gets committed in time, I might even do so for this release cycle. But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Thanks for the review!
+ msg OUT text
+ )
Looks like atypical formatting.
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
This too.
+-- Don't want this to be available to public
Add "by default, but superusers can grant access" or so?
I think there should be a call to pg_class_aclcheck() here, just like
the one in pg_prewarm, so that if the superuser does choose to grant
access, users given access can check tables they anyway have
permission to access, but not others. Maybe put that in
check_relation_relkind_and_relam() and rename it. Might want to look
at the pg_surgery precedent, too. Oh, and that functions header
comment is also wrong.
I think that the way the checks on the block range are performed could
be improved. Generally, we want to avoid reporting the same problem
with a variety of different message strings, because it adds burden
for translators and is potentially confusing for users. You've got two
message strings that are only going to be used for empty relations and
a third message string that is only going to be used for non-empty
relations. What stops you from just ripping off the way that this is
done in pg_prewarm, which requires only 2 messages? Then you'd be
adding a net total of 0 new messages instead of 3, and in my view they
would be clearer than your third message, "block range is out of
bounds for relation with block count %u: " INT64_FORMAT " .. "
INT64_FORMAT, which doesn't say very precisely what the problem is,
and also falls afoul of our usual practice of avoiding the use of
INT64_FORMAT in error messages that are subject to translation. I
notice that pg_prewarm just silently does nothing if the start and end
blocks are swapped, rather than generating an error. We could choose
to do differently here, but I'm not sure why we should bother.
+ all_frozen = mapbits & VISIBILITYMAP_ALL_FROZEN;
+ all_visible = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+
+ if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) ||
+     (all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE))
+ {
+     continue;
+ }
This isn't horrible style, but why not just get rid of the local
variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if
((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... }
Typically no braces around a block containing only one line.
+ * table contains corrupt all frozen bits, a concurrent vacuum might skip the
all-frozen?
+ * relfrozenxid beyond xid.) Reporting the xid as valid under such conditions
+ * seems acceptable, since if we had checked it earlier in our scan it would
+ * have truly been valid at that time, and we break no MVCC guarantees by
+ * failing to notice the concurrent change in its status.
I agree with the first half of this sentence, but I don't know what
MVCC guarantees have to do with anything. I'd just delete the second
part, or make it a lot clearer.
+ * Some kinds of tuple header corruption make it unsafe to check the tuple
+ * attributes, for example when the tuple is foreshortened and such checks
+ * would read beyond the end of the line pointer (and perhaps the page). In
I think of foreshortening mostly as an art term, though I guess it has
other meanings. Maybe it would be clearer to say something like "Some
kinds of corruption make it unsafe to check the tuple attributes, for
example when the line pointer refers to a range of bytes outside the
page"?
+ * Other kinds of tuple header corruption do not bare on the question of
bear
+ pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
+ pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));
"updating transaction ID" might scare somebody who thinks that you are
telling them that you changed something. That's not what it means, but
it might not be totally clear. Maybe:
tuple is marked as only locked, but also claims key columns were updated
multixact should not be marked committed
+ psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),
For these, how about:
tuple data should begin at byte %u, but actually begins at byte %u (1
attribute, has nulls)
etc.
+ psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
+ psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
+ psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),
old-style VACUUM FULL transaction ID %u is in the future
old-style VACUUM FULL transaction ID %u precedes freeze threshold %u
old-style VACUUM FULL transaction ID %u out of range %u..%u
Doesn't the second of these overlap with the third?
Similarly in other places, e.g.
+ psprintf(_("inserting transaction ID is in the future: %u"),
I think this should change to: inserting transaction ID %u is in the future
+ else if (VARATT_IS_SHORT(chunk))
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
Add braces here, since there are multiple lines.
+ psprintf(_("toast chunk sequence number not the expected sequence number: %u vs. %u"),
toast chunk sequence number %u does not match expected sequence number %u
There are more instances of this kind of thing.
+ psprintf(_("toasted attribute has unexpected TOAST tag: %u"),
Remove colon.
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
Let's try to specify the attribute number in the attribute messages
where we can, e.g.
+ psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),
How about: attribute %u with length %u should end at offset %u, but
the tuple length is only %u
+ if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+     TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+ {
+     report_corruption(ctx,
+         /* translator: Both %u are transaction IDs. */
+         psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
+                  xmin, ctx->relfrozenxid));
+     fatal = true;
+ }
+ else if (!xid_valid_in_rel(xmin, ctx))
+ {
+     report_corruption(ctx,
+         /* translator: %u is a transaction ID. */
+         psprintf(_("inserting transaction ID is in the future: %u"),
+                  xmin));
+     fatal = true;
+ }
This seems like good evidence that xid_valid_in_rel needs some
rethinking. As far as I can see, every place where you call
xid_valid_in_rel, you have checks beforehand that duplicate some of
what it does, so that you can give a more accurate error message.
That's not good. Either the message should be adjusted so that it
covers all the cases "e.g. tuple xmin %u is outside acceptable range
%u..%u" or we should just get rid of xid_valid_in_rel() and have
separate error messages for each case, e.g. tuple xmin %u precedes
relfrozenxid %u". I think it's OK to use terms like xmin and xmax in
these messages, rather than inserting transaction ID etc. We have
existing instances of that, and while someone might judge it
user-unfriendly, I disagree. A person who is qualified to interpret
this output must know what 'tuple xmin' means immediately, but whether
they can understand that 'inserting transaction ID' means the same
thing is questionable, I think.
This is not a full review, but in general I think that this is getting
pretty close to being committable. The error messages seem to still
need some polishing and I wouldn't be surprised if there are a few
more bugs lurking yet, but I think it's come a long way.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sep 21, 2020, at 2:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I think there should be a call to pg_class_aclcheck() here, just like
the one in pg_prewarm, so that if the superuser does choose to grant
access, users given access can check tables they anyway have
permission to access, but not others. Maybe put that in
check_relation_relkind_and_relam() and rename it. Might want to look
at the pg_surgery precedent, too.
In the presence of corruption, verify_heapam() reports to the user (in other words, leaks) metadata about the corrupted rows. Reasoning about the attack vectors this creates is hard, but a conservative approach is to assume that an attacker can cause corruption in order to benefit from the leakage, and make sure the leakage does not violate any reasonable security expectations.
Basing the security decision on whether the user has access to read the table seems insufficient, as it ignores row level security. Perhaps that is ok if row level security is not enabled for the table or if the user has been granted BYPASSRLS. There is another problem, though. There is no grantable privilege to read dead rows. In the case of corruption, verify_heapam() may well report metadata about dead rows.
pg_surgery also appears to leak information about dead rows. Owners of tables can probe whether supplied TIDs refer to dead rows. If a table containing sensitive information has rows deleted prior to ownership being transferred, the new owner of the table could probe each page of deleted data to determine something of the content that was there. Information about the number of deleted rows is already available through the pg_stat_* views, but those views don't give such a fine-grained approach to figuring out how large each deleted row was. For a table with fixed content options, the content can sometimes be completely inferred from the length of the row. (Consider a table with a single text column containing either "approved" or "denied".)
But pg_surgery is understood to be a collection of sharp tools only to be used under fairly exceptional conditions. amcheck, on the other hand, is something that feels safer and more reasonable to use on a regular basis, perhaps from a cron job executed by a less trusted user. Forcing the user to be superuser makes it clearer that this feeling of safety is not justified.
I am inclined to just restrict verify_heapam() to superusers and be done. What do you think?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 22, 2020 at 10:55 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I am inclined to just restrict verify_heapam() to superusers and be done. What do you think?
The existing amcheck functions were designed to have execute privilege
granted to non-superusers, though we never actually advertised that
fact. Maybe now would be a good time to start doing so.
--
Peter Geoghegan
On Tue, Sep 22, 2020 at 1:55 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I am inclined to just restrict verify_heapam() to superusers and be done. What do you think?
I think that's an old and largely failed approach. If you want to use
pg_class_ownercheck here rather than pg_class_aclcheck or something
like that, seems fair enough. But I don't think there should be an
is-superuser check in the code, because we've been trying really hard
to get rid of those in most places. And I also don't think there
should be no secondary permissions check, because if somebody does
grant execute permission on these functions, it's unlikely that they
want the person getting that permission to be able to check every
relation in the system even those on which they have no other
privileges at all.
But now I see that there's no secondary permission check in the
verify_nbtree.c code. Is that intentional? Peter, what's the
justification for that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
But now I see that there's no secondary permission check in the
verify_nbtree.c code. Is that intentional? Peter, what's the
justification for that?
As noted by comments in contrib/amcheck/sql/check_btree.sql (the
verify_nbtree.c tests), this is intentional. Note that we explicitly
test that a non-superuser role can perform verification following
GRANT EXECUTE ON FUNCTION ... .
As I mentioned earlier, this is supported (or at least it is supported
in my interpretation of things). It just isn't documented anywhere
outside the test itself.
--
Peter Geoghegan
On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;
This too.
Do we really want to use a cstring as an enum-like argument?
I think that I see a bug at this point in check_tuple() (in
v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch):
+ /* If xmax is a multixact, it should be within valid range */
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+ if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
+ {
*** SNIP ***
+ }
+
+ /* If xmax is normal, it should be within valid range */
+ if (TransactionIdIsNormal(xmax))
+ {
Why should it be okay to call TransactionIdIsNormal(xmax) at this
point? It isn't certain that xmax is an XID at all (could be a
MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the
value in the first place). Don't you need to check "(infomask &
HEAP_XMAX_IS_MULTI) == 0" here?
This does look like it's shaping up. Thanks for working on it, Mark.
--
Peter Geoghegan
On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog. Ultimately, I ripped that out. My reasoning was that a simpler patch submission was more likely to be acceptable to the community.
Isn't some kind of pragmatic compromise possible?
But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
Fair enough, but if you're alluding to what I said then about
check_tuphdr_xids()/clog checking a while back then FWIW I didn't
intend to block progress on clog/xact status verification at all. I
just don't think that it is sensible to impose an iron clad guarantee
about having no assertion failures with corrupt clog data -- that
leads to far too much code duplication. But why should you need to
provide an absolute guarantee of that?
I for one would be fine with making the clog checks an optional extra,
that rescinds the no crash guarantee that you're keen on -- just like
with the TOAST checks that you have already in v15. It might make
sense to review how often crashes occur with simulated corruption, and
then to minimize the number of occurrences in the real world. Maybe we
could tolerate a usually-no-crash interface to clog -- if it could
still have assertion failures. Making a strong guarantee about
assertions seems unnecessary.
I don't see how verify_heapam will avoid raising an error during basic
validation from PageIsVerified(), which will violate the guarantee
about not throwing errors. I don't see that as a problem myself, but
presumably you will.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;

This too.
Do we really want to use a cstring as an enum-like argument?
Ugh. We should not be using cstring as a SQL-exposed datatype
unless there really is no alternative. Why wasn't this argument
declared "text"?
regards, tom lane
Greetings,
* Peter Geoghegan (pg@bowt.ie) wrote:
On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
But now I see that there's no secondary permission check in the
verify_nbtree.c code. Is that intentional? Peter, what's the
justification for that?As noted by comments in contrib/amcheck/sql/check_btree.sql (the
verify_nbtree.c tests), this is intentional. Note that we explicitly
test that a non-superuser role can perform verification following
GRANT EXECUTE ON FUNCTION ... .
As I mentioned earlier, this is supported (or at least it is supported
in my interpretation of things). It just isn't documented anywhere
outside the test itself.
Would certainly be good to document this but I tend to agree with the
comments that ideally-
a) it'd be nice if a relatively low-privileged user/process could run
the tests in an ongoing manner
b) we don't want to add more is-superuser checks
c) users shouldn't really be given the ability to see rows they're not
supposed to have access to
In other places in the code, when an error is generated and the user
doesn't have access to the underlying table or doesn't have BYPASSRLS,
we don't include the details or the actual data in the error. Perhaps
that approach would make sense here (or perhaps not, but it doesn't seem
entirely crazy to me, anyway). In other words:
a) keep the ability for someone who has EXECUTE on the function to be
able to run the function against any relation
b) when we detect an issue, perform a permissions check to see if the
user calling the function has rights to read the rows of the table
and, if RLS is enabled on the table, if they have BYPASSRLS
c) if the user has appropriate privileges, log the detailed error, if
not, return a generic error with a HINT that details weren't
available due to lack of privileges on the relation
I can appreciate the concerns regarding dead rows ending up being
visible to someone who wouldn't normally be able to see them but I'd
argue we could simply document that fact rather than try to build
something to address it, for this particular case. If there's push back
on that then I'd suggest we have a "can read dead rows" or some such
capability that can be GRANT'd (in the form of a default role, I would
think) which a user would also have to have in order to get detailed
error reports from this function.
Thanks,
Stephen
On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote:
Removed.
This patch is failing to compile on Windows:
C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error
C1083: Cannot open include file: 'libpq-fe.h': No such file or
directory [C:\projects\postgresql\pg_amcheck.vcxproj]
It looks like you forgot to tweak the scripts in src/tools/msvc/.
--
Michael
Robert, Peter, Andrey, Stephen, and Michael,
Attached is a new version based in part on your review comments, quoted and responded to below as necessary.
There remain a few open issues and/or things I did not implement:
- This version follows Robert's suggestion of using pg_class_aclcheck() to check that the caller has permission to select from the table being checked. This is inconsistent with the btree checking logic, which does no such check. These two approaches should be reconciled, but there was apparently no agreement on this issue.
- The public facing documentation, currently live at https://www.postgresql.org/docs/13/amcheck.html, claims "amcheck functions may only be used by superusers." The docs on master still say the same. This patch replaces that language with alternate language explaining that execute permissions may be granted to non-superusers, along with a warning about the risk of data leakage. Perhaps some portion of that language in this patch should be back-patched?
- Stephen's comments about restricting how much information goes into the returned corruption report depending on the permissions of the caller has not been implemented. I may implement some of this if doing so is consistent with whatever we decide to do for the aclcheck issue, above, though probably not. It seems overly complicated.
- This version does not change clog handling, which leaves Andrey's concern unaddressed. Peter also showed some support for (or perhaps just a lack of opposition to) doing more of what Andrey suggests. I may come back to this issue, depending on time available and further feedback.
Moving on to Michael's review....
On Sep 28, 2020, at 10:56 PM, Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Aug 25, 2020 at 07:36:53AM -0700, Mark Dilger wrote:
Removed.
This patch is failing to compile on Windows:
C:\projects\postgresql\src\include\fe_utils/print.h(18): fatal error
C1083: Cannot open include file: 'libpq-fe.h': No such file or
directory [C:\projects\postgresql\pg_amcheck.vcxproj]

It looks like you forgot to tweak the scripts in src/tools/msvc/.
Fixed, I think. I have not tested on windows.
Moving on to Stephen's review....
On Sep 23, 2020, at 6:46 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Peter Geoghegan (pg@bowt.ie) wrote:
On Tue, Sep 22, 2020 at 12:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
But now I see that there's no secondary permission check in the
verify_nbtree.c code. Is that intentional? Peter, what's the
justification for that?

As noted by comments in contrib/amcheck/sql/check_btree.sql (the
verify_nbtree.c tests), this is intentional. Note that we explicitly
test that a non-superuser role can perform verification following
GRANT EXECUTE ON FUNCTION ... .

As I mentioned earlier, this is supported (or at least it is supported
in my interpretation of things). It just isn't documented anywhere
outside the test itself.

Would certainly be good to document this but I tend to agree with the
comments that ideally-

a) it'd be nice if a relatively low-privileged user/process could run
the tests in an ongoing manner
b) we don't want to add more is-superuser checks
c) users shouldn't really be given the ability to see rows they're not
supposed to have access to

In other places in the code, when an error is generated and the user
doesn't have access to the underlying table or doesn't have BYPASSRLS,
we don't include the details or the actual data in the error. Perhaps
that approach would make sense here (or perhaps not, but it doesn't seem
entirely crazy to me, anyway). In other words:

a) keep the ability for someone who has EXECUTE on the function to be
able to run the function against any relation
b) when we detect an issue, perform a permissions check to see if the
user calling the function has rights to read the rows of the table
and, if RLS is enabled on the table, if they have BYPASSRLS
c) if the user has appropriate privileges, log the detailed error, if
not, return a generic error with a HINT that details weren't
available due to lack of privileges on the relation

I can appreciate the concerns regarding dead rows ending up being
visible to someone who wouldn't normally be able to see them but I'd
argue we could simply document that fact rather than try to build
something to address it, for this particular case. If there's push back
on that then I'd suggest we have a "can read dead rows" or some such
capability that can be GRANT'd (in the form of a default role, I would
think) which a user would also have to have in order to get detailed
error reports from this function.
There wasn't enough agreement on the thread about how this should work, so I left this idea unimplemented.
I'm a bit concerned that restricting the results for non-superusers would create a perverse incentive to use a superuser role to connect and check tables. On the other hand, there would not be any difference in the output in the common case that no corruption exists, so maybe the perverse incentive would not be too significant.
Implementing the idea you outline would complicate the patch a fair amount, as we'd need to tailor all the reports in this way, and extend the tests to verify we're not leaking any information to non-superusers. I would prefer to find a simpler solution.
Moving on to Robert's review....
On Sep 21, 2020, at 2:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 25, 2020 at 10:36 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

Thanks for the review!
+    msg OUT text
+    )

Looks like atypical formatting.
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;

This too.
Changed in this next version.
+-- Don't want this to be available to public
Add "by default, but superusers can grant access" or so?
Hmm. I borrowed the verbiage from elsewhere.
contrib/pg_buffercache/pg_buffercache--1.2.sql:-- Don't want these to be available to public.
contrib/pg_freespacemap/pg_freespacemap--1.1.sql:-- Don't want these to be available to public.
contrib/pg_visibility/pg_visibility--1.1.sql:-- Don't want these to be available to public.
I think there should be a call to pg_class_aclcheck() here, just like
the one in pg_prewarm, so that if the superuser does choose to grant
access, users given access can check tables they anyway have
permission to access, but not others. Maybe put that in
check_relation_relkind_and_relam() and rename it. Might want to look
at the pg_surgery precedent, too.
I don't think there are any great options here, but for this next version I've done it with pg_class_aclcheck().
Oh, and that functions header
comment is also wrong.
Changed in this next version.
I think that the way the checks on the block range are performed could
be improved. Generally, we want to avoid reporting the same problem
with a variety of different message strings, because it adds burden
for translators and is potentially confusing for users. You've got two
message strings that are only going to be used for empty relations and
a third message string that is only going to be used for non-empty
relations. What stops you from just ripping off the way that this is
done in pg_prewarm, which requires only 2 messages? Then you'd be
adding a net total of 0 new messages instead of 3, and in my view they
would be clearer than your third message, "block range is out of
bounds for relation with block count %u: " INT64_FORMAT " .. "
INT64_FORMAT, which doesn't say very precisely what the problem is,
and also falls afoul of our usual practice of avoiding the use of
INT64_FORMAT in error messages that are subject to translation. I
notice that pg_prewarm just silently does nothing if the start and end
blocks are swapped, rather than generating an error. We could choose
to do differently here, but I'm not sure why we should bother.
This next version borrows pg_prewarm's messages as you suggest, except that pg_prewarm embeds INT64_FORMAT in the message strings, which are replaced with %u in this next patch. Also, there is no good way to report an invalid block range for empty tables using these messages, so the patch now just exits early without throwing an error when given an invalid range on an empty table. This is a little bit non-orthogonal with how invalid block ranges are handled on non-empty tables, but perhaps that's OK.
+	all_frozen = mapbits & VISIBILITYMAP_ALL_VISIBLE;
+	all_visible = mapbits & VISIBILITYMAP_ALL_FROZEN;
+
+	if ((all_frozen && skip_option == SKIP_PAGES_ALL_FROZEN) ||
+		(all_visible && skip_option == SKIP_PAGES_ALL_VISIBLE))
+	{
+		continue;
+	}

This isn't horrible style, but why not just get rid of the local
variables? e.g. if (skip_option == SKIP_PAGES_ALL_FROZEN) { if
((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0) continue; } else { ... }

Typically no braces around a block containing only one line.
Changed in this next version.
+ * table contains corrupt all frozen bits, a concurrent vacuum might skip the
all-frozen?
Changed in this next version.
+ * relfrozenxid beyond xid.)  Reporting the xid as valid under such conditions
+ * seems acceptable, since if we had checked it earlier in our scan it would
+ * have truly been valid at that time, and we break no MVCC guarantees by
+ * failing to notice the concurrent change in its status.

I agree with the first half of this sentence, but I don't know what
MVCC guarantees have to do with anything. I'd just delete the second
part, or make it a lot clearer.
Changed in this next version to simply omit the MVCC related language.
+ * Some kinds of tuple header corruption make it unsafe to check the tuple
+ * attributes, for example when the tuple is foreshortened and such checks
+ * would read beyond the end of the line pointer (and perhaps the page).  In

I think of foreshortening mostly as an art term, though I guess it has
other meanings. Maybe it would be clearer to say something like "Some
kinds of corruption make it unsafe to check the tuple attributes, for
example when the line pointer refers to a range of bytes outside the
page"?

+ * Other kinds of tuple header corruption do not bare on the question of
bear
Changed.
+			pstrdup(_("updating transaction ID marked incompatibly as keys updated and locked only")));
+			pstrdup(_("updating transaction ID marked incompatibly as committed and as a multitransaction ID")));

"updating transaction ID" might scare somebody who thinks that you are
telling them that you changed something. That's not what it means, but
it might not be totally clear. Maybe:

tuple is marked as only locked, but also claims key columns were updated
multixact should not be marked committed
Changed to use your verbiage.
+			psprintf(_("data offset differs from expected: %u vs. %u (1 attribute, has nulls)"),

For these, how about:
tuple data should begin at byte %u, but actually begins at byte %u (1
attribute, has nulls)
etc.
Is it ok to embed interpolated values into the message string like that? I thought that made it harder for translators. I agree that your language is easier to understand, and have used it in this next version of the patch. Many of your comments that follow raise the same issue, but I'm using your verbiage anyway.
+			psprintf(_("old-style VACUUM FULL transaction ID is in the future: %u"),
+			psprintf(_("old-style VACUUM FULL transaction ID precedes freeze threshold: %u"),
+			psprintf(_("old-style VACUUM FULL transaction ID is invalid in this relation: %u"),

old-style VACUUM FULL transaction ID %u is in the future
old-style VACUUM FULL transaction ID %u precedes freeze threshold %u
old-style VACUUM FULL transaction ID %u out of range %u..%u

Doesn't the second of these overlap with the third?
Good point. If the second one reports, so will the third. I've changed it to use if/else if logic to avoid that, and to use your suggested verbiage.
Similarly in other places, e.g.
+			psprintf(_("inserting transaction ID is in the future: %u"),

I think this should change to: inserting transaction ID %u is in the future
Changed, along with similarly formatted messages.
+		else if (VARATT_IS_SHORT(chunk))
+			/*
+			 * could happen due to heap_form_tuple doing its thing
+			 */
+			chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;

Add braces here, since there are multiple lines.
Changed.
+			psprintf(_("toast chunk sequence number not the expected sequence number: %u vs. %u"),

toast chunk sequence number %u does not match expected sequence number %u
There are more instances of this kind of thing.
Changed.
+			psprintf(_("toasted attribute has unexpected TOAST tag: %u"),

Remove colon.
Changed.
+			psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),

Let's try to specify the attribute number in the attribute messages
where we can, e.g.

+			psprintf(_("attribute ends at offset beyond total tuple length: %u vs. %u (attribute length %u)"),

How about: attribute %u with length %u should end at offset %u, but
the tuple length is only %u
I had omitted the attribute numbers from the attribute corruption messages because attnum is one of the OUT parameters from verify_heapam. I'm including attnum in the message text for this next version, as you request.
+	if (TransactionIdIsNormal(ctx->relfrozenxid) &&
+		TransactionIdPrecedes(xmin, ctx->relfrozenxid))
+	{
+		report_corruption(ctx,
+				/* translator: Both %u are transaction IDs. */
+				psprintf(_("inserting transaction ID is from before freeze cutoff: %u vs. %u"),
+						 xmin, ctx->relfrozenxid));
+		fatal = true;
+	}
+	else if (!xid_valid_in_rel(xmin, ctx))
+	{
+		report_corruption(ctx,
+				/* translator: %u is a transaction ID. */
+				psprintf(_("inserting transaction ID is in the future: %u"),
+						 xmin));
+		fatal = true;
+	}

This seems like good evidence that xid_valid_in_rel needs some
rethinking. As far as I can see, every place where you call
xid_valid_in_rel, you have checks beforehand that duplicate some of
what it does, so that you can give a more accurate error message.
That's not good. Either the message should be adjusted so that it
covers all the cases "e.g. tuple xmin %u is outside acceptable range
%u..%u" or we should just get rid of xid_valid_in_rel() and have
separate error messages for each case, e.g. tuple xmin %u precedes
relfrozenxid %u".
This next version is refactored, removing the function xid_valid_in_rel entirely, and structuring get_xid_status differently.
I think it's OK to use terms like xmin and xmax in
these messages, rather than inserting transaction ID etc. We have
existing instances of that, and while someone might judge it
user-unfriendly, I disagree. A person who is qualified to interpret
this output must know what 'tuple xmin' means immediately, but whether
they can understand that 'inserting transaction ID' means the same
thing is questionable, I think.
Done.
This is not a full review, but in general I think that this is getting
pretty close to being committable. The error messages seem to still
need some polishing and I wouldn't be surprised if there are a few
more bugs lurking yet, but I think it's come a long way.
This next version has some other message rewording. While testing, I found it odd to report an xid as out of bounds (in the future, or before the freeze threshold, etc.), without mentioning the xid value against which it is being compared unfavorably. We don't normally need to think about the epoch when comparing two xids against each other, as they must both make sense relative to the current epoch; but for corruption, you can't assume the corrupt xid was written relative to any particular epoch, and only the 32-bit xid value can be printed since the epoch is unknown. The other xid value (freeze threshold, etc) can be printed with the epoch information, but printing the epoch+xid merely as xid8out does (in other words, as a UINT64) makes the messages thoroughly confusing. I went with the equivalent of sprintf("%u:%u", epoch, xid), which follows the precedent from pg_controldata.c, gistdesc.c, and elsewhere.
Moving on to Peter's reviews....
On Sep 22, 2020, at 4:18 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Sep 21, 2020 at 2:09 PM Robert Haas <robertmhaas@gmail.com> wrote:
+REVOKE ALL ON FUNCTION
+verify_heapam(regclass, boolean, boolean, cstring, bigint, bigint)
+FROM PUBLIC;

This too.
Do we really want to use a cstring as an enum-like argument?
Perhaps not. This next version has that as text.
I think that I see a bug at this point in check_tuple() (in
v15-0001-Adding-function-verify_heapam-to-amcheck-module.patch):

+	/* If xmax is a multixact, it should be within valid range */
+	xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+	if ((infomask & HEAP_XMAX_IS_MULTI) && !mxid_valid_in_rel(xmax, ctx))
+	{

*** SNIP ***

+	}
+
+	/* If xmax is normal, it should be within valid range */
+	if (TransactionIdIsNormal(xmax))
+	{

Why should it be okay to call TransactionIdIsNormal(xmax) at this
point? It isn't certain that xmax is an XID at all (could be a
MultiXactId, since you called HeapTupleHeaderGetRawXmax() to get the
value in the first place). Don't you need to check "(infomask &
HEAP_XMAX_IS_MULTI) == 0" here?
I think you are right. This check you suggest is used in this next version.
On Sep 22, 2020, at 5:16 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Aug 29, 2020 at 10:48 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

I had an earlier version of the verify_heapam patch that included a non-throwing interface to clog. Ultimately, I ripped that out. My reasoning was that a simpler patch submission was more likely to be acceptable to the community.
Isn't some kind of pragmatic compromise possible?
But I don't want to make this patch dependent on that hypothetical patch getting written and accepted.
Fair enough, but if you're alluding to what I said then about
check_tuphdr_xids()/clog checking a while back then FWIW I didn't
intend to block progress on clog/xact status verification at all.
I don't recall your comments factoring into my thinking on this specific issue, but rather a conversation I had off-list with Robert. The clog interface may be a hot enough code path that adding a flag for non-throwing behavior merely to support a contrib module might be resisted. If folks generally like such a change to the clog interface, I could consider adding that as a third patch in this set.
I
just don't think that it is sensible to impose an iron clad guarantee
about having no assertion failures with corrupt clog data -- that
leads to far too much code duplication. But why should you need to
provide an absolute guarantee of that?

I for one would be fine with making the clog checks an optional extra,
that rescinds the no crash guarantee that you're keen on -- just like
with the TOAST checks that you have already in v15. It might make
sense to review how often crashes occur with simulated corruption, and
then to minimize the number of occurrences in the real world. Maybe we
could tolerate a usually-no-crash interface to clog -- if it could
still have assertion failures. Making a strong guarantee about
assertions seems unnecessary.

I don't see how verify_heapam will avoid raising an error during basic
validation from PageIsVerified(), which will violate the guarantee
about not throwing errors. I don't see that as a problem myself, but
presumably you will.
My concern is not so much that verify_heapam will stop with an error, but rather that it might trigger a panic that stops all backends. Stopping with an error merely because it hits corruption is not ideal, as I would rather it completed the scan and reported all corruptions found, but that's minor compared to the damage done if verify_heapam creates downtime in a production environment offering high availability guarantees. That statement might seem nuts, given that the corrupt table itself would be causing downtime, but that analysis depends on assumptions about table access patterns, and there is no a priori reason to think that corrupt pages are necessarily ever being accessed, or accessed in a way that causes crashes (rather than merely wrong results) outside verify_heapam scanning the whole table.
Attachments:
v16-0001-Adding-function-verify_heapam-to-amcheck-module.patchapplication/octet-stream; name=v16-0001-Adding-function-verify_heapam-to-amcheck-module.patch; x-unix-mode=0644Download
From 683e5b9da3552020cfc06099abb5aaa968516ee2 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:42:18 -0700
Subject: [PATCH v16 1/2] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 30 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 200 +++
contrib/amcheck/sql/check_heap.sql | 123 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1537 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 237 +++-
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 4 +
12 files changed, 2404 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..7237ab738c
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,30 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass,
+ boolean,
+ boolean,
+ text,
+ bigint,
+ bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..41cdc6435c
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,200 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+CREATE ROLE regress_heaptest_role;
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for function verify_heapam
+RESET ROLE;
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for table heaptest
+RESET ROLE;
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+RESET ROLE;
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+ERROR: ending block number must be between 0 and 0
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+ERROR: starting block number must be between 0 and 0
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c8397a46f0
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,123 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+CREATE ROLE regress_heaptest_role;
+
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+	open($fh, '+<', $relpath)
+		or die "open failed for $relpath: $!";
+ binmode $fh;
+
+	# If we corrupt the header, postgres won't allow the page into the buffer
+	# pool.  Note the double quotes: single-quoted '\xFF' would write the
+	# literal four characters backslash-x-F-F, not the byte 0xFF.
+	syswrite($fh, "\xFF" x 8, 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+	sysseek($fh, 32, 0);
+	syswrite($fh, "\x77" x 500, 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..b46cce798e
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1537 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ *	  Functions to check PostgreSQL heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+typedef enum XidBoundsViolation
+{
+ XID_INVALID,
+ XID_IN_FUTURE,
+ XID_PRECEDES_DATMIN,
+ XID_PRECEDES_RELMIN,
+ XID_BOUNDS_OK
+} XidBoundsViolation;
+
+typedef enum XidCommitStatus
+{
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam() execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values
+ * from them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void sanity_check_relation(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status);
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation:
+ *     The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ *
+ * startblock:
+ *     The first block of the relation to check, or NULL to start at the
+ *     beginning.
+ *
+ * endblock:
+ *     The last block of the relation to check, or NULL to continue through
+ *     the last block.
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * Note that if check_toast is true, it is the caller's responsibility to
+ * ensure that the toast table and index are not corrupt, and that they
+ * do not become corrupt while this function is running.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ BlockNumber first_block;
+ BlockNumber last_block;
+ BlockNumber nblocks;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = text_to_cstring(PG_GETARG_TEXT_PP(3));
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option"),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /* Open relation, check relkind and access method, and check privileges */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ sanity_check_relation(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ if (!nblocks)
+ {
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* Validate block numbers, or handle nulls. */
+ if (PG_ARGISNULL(4))
+ first_block = 0;
+ else
+ {
+ int64 fb = PG_GETARG_INT64(4);
+
+ if (fb < 0 || fb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block number must be between 0 and %u",
+ nblocks - 1)));
+ first_block = (BlockNumber) fb;
+ }
+ if (PG_ARGISNULL(5))
+ last_block = nblocks - 1;
+ else
+ {
+ int64 lb = PG_GETARG_INT64(5);
+
+ if (lb < 0 || lb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block number must be between 0 and %u",
+ nblocks - 1)));
+ last_block = (BlockNumber) lb;
+ }
+
+ /* Optionally open the toast relation, if any. */
+	if (check_toast && OidIsValid(ctx.rel->rd_rel->reltoastrelid))
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ int32 mapbits;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_option == SKIP_PAGES_ALL_FROZEN)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ if (skip_option == SKIP_PAGES_ALL_VISIBLE)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to item at offset %u exceeds maximum offset %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to unused item at offset %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Check that a relation's relkind and access method are both supported,
+ * and that the caller has select privilege on the relation.
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ AclResult aclresult;
+
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /*------
+ translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+ aclresult = pg_class_aclcheck(rel->rd_id, GetUserId(), ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * check the tuple attributes.
+ *
+ * Some kinds of corruption make it unsafe to check the tuple attributes, for
+ * example when the line pointer refers to a range of bytes outside the page.
+ * In such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u is an offset, second %u is a length */
+ psprintf(_("data begins at offset %u beyond the tuple length %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("tuple is marked as only locked, but also claims key columns were updated")));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("multixact should not be marked committed")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ switch (get_xid_status(xvac, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u equals or exceeds next valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes relation freeze threshold %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes oldest valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ else
+ {
+ XidCommitStatus status;
+
+ switch (get_xid_status(raw_xmin, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("raw xmin is invalid")));
+ return false;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes relation freeze threshold %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes oldest valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_COMMITTED:
+ break;
+ case XID_IN_PROGRESS:
+ return true; /* insert or delete in progress */
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ switch (get_xid_status(xmax, ctx, &status))
+ {
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("xmax is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD */
+ }
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value are retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in hexadecimal, %d represents
+ the sequence number */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u does not match the expected sequence number %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u exceeds the end chunk sequence number %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk size */
+ psprintf(_("toast chunk size %u differs from the expected size %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u starts at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and
+ fourth %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number, second %u
+ represents an enumeration value */
+ psprintf(_("toasted attribute %u has unexpected TOAST tag %u"),
+ ctx->attnum,
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but tuple header flag HEAP_HASEXTERNAL not set"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but relation has no toast relation"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk number */
+ psprintf(_("final toast chunk number %u differs from expected value %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("toasted value for attribute %u missing from toast table"),
+ ctx->attnum));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents a length, second %u represents a size
+ */
+ psprintf(_("line pointer length %u is less than the minimum tuple header size %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ switch (get_xid_status(xmin, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes oldest valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes relation freeze threshold %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ break;
+ }
+
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+
+ /* If xmax is a multixact, it should be within valid range */
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ switch (check_mxid_valid_in_rel(xmax, ctx))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("multitransaction ID is invalid")));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes relation minimum multitransaction ID threshold %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: Both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes oldest valid multitransaction ID threshold %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a multitransaction ID */
+ psprintf(_("multitransaction ID %u equals or exceeds next valid multitransaction ID %u"),
+ xmax,
+ ctx->next_mxact));
+ fatal = true;
+ break;
+ case XID_BOUNDS_OK:
+ break;
+ }
+ }
+ /* If xmax is not a multixact and is normal, it should be within valid range */
+ else
+ {
+ switch (get_xid_status(xmax, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u are a number */
+ psprintf(_("number of attributes %u exceeds maximum expected for table %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Checks whether a multitransaction ID is in the cached valid range, returning
+ * the nature of the range violation, if any.
+ */
+static XidBoundsViolation
+check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (!TransactionIdIsValid(mxid))
+ return XID_INVALID;
+ if (MultiXactIdPrecedes(mxid, ctx->relminmxid))
+ return XID_PRECEDES_RELMIN;
+ if (MultiXactIdPrecedes(mxid, ctx->oldest_mxact))
+ return XID_PRECEDES_DATMIN;
+ if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, mxid))
+ return XID_IN_FUTURE;
+ return XID_BOUNDS_OK;
+}
+
+/*
+ * Checks whether the given mxid is valid to appear in the heap being checked,
+ * returning the nature of the range violation, if any.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static XidBoundsViolation
+check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ XidBoundsViolation result;
+
+ result = check_mxid_in_range(mxid, ctx);
+ if (result == XID_BOUNDS_OK)
+ return XID_BOUNDS_OK;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ return check_mxid_in_range(mxid, ctx);
+}
+
+/*
+ * Checks whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked, or whether it is too old or too new to appear in
+ * the relation, returning information about the nature of the bounds violation.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the table
+ * contains corrupt all-frozen bits, a concurrent vacuum might skip the page(s)
+ * containing the xid and then truncate clog and advance the relfrozenxid
+ * beyond xid.) Reporting the xid as valid under such conditions seems
+ * acceptable, since if we had checked it earlier in our scan it would have
+ * truly been valid at that time.
+ *
+ * If the status argument is not NULL, and if and only if the transaction ID
+ * appears to be valid in this relation, clog will be consulted and the commit
+ * status argument will be set with the status of the transaction ID.
+ */
+static XidBoundsViolation
+get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status)
+{
+ XidBoundsViolation result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ /* Quick check for special xids */
+ if (!TransactionIdIsValid(xid))
+ result = XID_INVALID;
+ else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
+ result = XID_BOUNDS_OK;
+ else
+ {
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ }
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ result = XID_IN_FUTURE;
+ else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ result = XID_PRECEDES_DATMIN;
+ else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ result = XID_PRECEDES_RELMIN;
+ else
+ result = XID_BOUNDS_OK;
+ }
+
+ /*
+ * Early return if the caller does not request clog checking, or if the
+ * xid is already known to be out of bounds. We dare not check clog for
+ * out of bounds transaction IDs.
+ */
+ if (status == NULL || result != XID_BOUNDS_OK)
+ return result;
+
+ /* Early return if we just checked this xid in a prior call */
+ if (xid == ctx->cached_xid)
+ {
+ *status = ctx->cached_status;
+ return result;
+ }
+
+ *status = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ *status = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ *status = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ *status = XID_ABORTED;
+ else
+ *status = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = *status;
+ return result;
+}
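For readers following the bounds logic above, here is a minimal standalone sketch of the classification that get_xid_status performs. Plain 64-bit integers and an explicit range struct stand in for FullTransactionId and the cached range in HeapCheckContext; the special-xid values and the clog lookup are simplified, so treat the names and constants as illustrative rather than PostgreSQL's own:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the XidBoundsViolation outcomes in the patch. */
typedef enum
{
    XID_INVALID,
    XID_BOUNDS_OK,
    XID_IN_FUTURE,
    XID_PRECEDES_DATMIN,
    XID_PRECEDES_RELMIN
} XidBoundsViolation;

/* Stand-in for the cached range kept in HeapCheckContext. */
typedef struct
{
    uint64_t oldest_fxid;    /* oldest valid xid in the cluster */
    uint64_t relfrozenfxid;  /* relation's freeze threshold */
    uint64_t next_fxid;      /* next xid to be assigned */
} XidRange;

static XidBoundsViolation
classify_xid(uint64_t fxid, const XidRange *r)
{
    if (fxid == 0)              /* InvalidTransactionId */
        return XID_INVALID;
    if (fxid == 1 || fxid == 2) /* Bootstrap/Frozen special xids */
        return XID_BOUNDS_OK;
    /* Same ordering as the patch: future, then datmin, then relmin. */
    if (fxid >= r->next_fxid)
        return XID_IN_FUTURE;
    if (fxid < r->oldest_fxid)
        return XID_PRECEDES_DATMIN;
    if (fxid < r->relfrozenfxid)
        return XID_PRECEDES_RELMIN;
    return XID_BOUNDS_OK;
}
```

A real check must also refresh the cached range when the xid falls outside it, as the patch does via update_cached_xid_range, since the bounds can advance concurrently.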
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..a57781992a 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@
<para>
The <filename>amcheck</filename> module provides functions that allow you to
- verify the logical consistency of the structure of relations. If the
- structure appears to be valid, no error is raised.
+ verify the logical consistency of the structure of relations.
</para>
<para>
- The functions verify various <emphasis>invariants</emphasis> in the
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
structure of the representation of particular relations. The
correctness of the access method functions behind index scans and
other important operations relies on these invariants always
@@ -24,7 +23,7 @@
collated lexical order). If that particular invariant somehow fails
to hold, we can expect binary searches on the affected page to
incorrectly guide index scans, resulting in wrong answers to SQL
- queries.
+ queries. If the structure appears to be valid, no error is raised.
</para>
<para>
Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
functions.
</para>
<para>
- <filename>amcheck</filename> functions may only be used by superusers.
+ Unlike the B-Tree checking functions, which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. However, if facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
</para>
<sect2>
@@ -187,12 +201,223 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+ 17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+ 960 | 4 | | data begins at offset 152 beyond the tuple length 58
+ 960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+ 960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+ 960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+ 960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+ 1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
+ 1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+ 1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
+ 1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+ 1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
+ 1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
+ 1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
+ 1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high-risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Accesses to <link
+ linkend="catalogs-overview">catalog tables</link> are performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ An experimental option, <literal>check_toast</literal>, exists to
+ reconcile the target table against entries in its corresponding toast
+ table. This option may change in future, is disabled by default, and is
+ known to be slow. It is also unsafe under some conditions. If the
+ target relation's corresponding toast table or toast index are corrupt,
+ reconciling the target table against toast values may be unsafe. If the
+ catalogs, toast table and toast index are uncorrupted, and remain so
+ during the check of the target table, reconciling the target table
+ against its toast table should be safe.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If this experimental option is true, toasted values are checked against
+ the corresponding TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
<para>
- When the <parameter>heapallindexed</parameter> argument to
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
verification functions is <literal>true</literal>, an additional
phase of verification is performed against the table associated with
the target index relation. This consists of a <quote>dummy</quote>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
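The two combinations rejected by these assertions can be sketched as a standalone predicate. The flag values below are restated for illustration (they match the definitions in htup_details.h at the time of writing, but treat them as assumptions rather than authoritative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative restatements of the infomask flags involved. */
#define HEAP_XMAX_LOCK_ONLY   0x0080   /* t_infomask */
#define HEAP_XMAX_COMMITTED   0x0400   /* t_infomask */
#define HEAP_XMAX_IS_MULTI    0x1000   /* t_infomask */
#define HEAP_KEYS_UPDATED     0x2000   /* t_infomask2 */

static bool
infomask_combo_is_valid(uint16_t infomask, uint16_t infomask2)
{
    /* A tuple cannot be only locked yet claim key columns were updated. */
    if ((infomask & HEAP_XMAX_LOCK_ONLY) && (infomask2 & HEAP_KEYS_UPDATED))
        return false;
    /* xmax cannot be both a committed plain xid and a multixact. */
    if ((infomask & HEAP_XMAX_COMMITTED) && (infomask & HEAP_XMAX_IS_MULTI))
        return false;
    return true;
}
```

These are the same two combinations verify_heapam reports as corruption when it finds them on disk, which is why the assertions and the checker must be kept in sync.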
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a2ce617c8c..81752b68eb 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
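The clamping done by ReadMultiXactIdRange can be modeled without the shared-memory state: values below FirstMultiXactId are rounded up so callers never see the invalid multixact ID. FirstMultiXactId is taken to be 1 here, matching multixact.h, but the MultiXactGenLock acquisition is deliberately omitted from this sketch:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MultiXactId;
#define FirstMultiXactId ((MultiXactId) 1)

/* Sketch of ReadMultiXactIdRange's clamping, minus the locking.  The
 * raw_* arguments stand in for the values read from MultiXactState. */
static void
read_mxid_range(MultiXactId raw_oldest, MultiXactId raw_next,
                MultiXactId *oldest, MultiXactId *next)
{
    *oldest = raw_oldest;
    *next = raw_next;
    if (*oldest < FirstMultiXactId)
        *oldest = FirstMultiXactId;
    if (*next < FirstMultiXactId)
        *next = FirstMultiXactId;
}
```

The clamp matters because a freshly initialized cluster can have state below FirstMultiXactId, and corruption checks comparing against an invalid lower bound would misclassify legitimate multixact IDs.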
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 58c42ffe1f..9a30380901 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,6 +109,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4191f94869..bca30f3dde 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1020,6 +1020,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2287,6 +2288,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
@@ -2788,6 +2790,8 @@ XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
+XidBoundsViolation
+XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
v16-0002-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v16-0002-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 4dd7e0f44d9d29dcf7b7531d3000d4f82c92fa59 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:43:00 -0700
Subject: [PATCH v16 2/2] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 2 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1281 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 489 ++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 228 ++++
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 2 +
15 files changed, 2393 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..f8eecf70bf
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,2 @@
+/pg_amcheck
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..324cf1cfc8
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1281 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single character values for each relkind, such as 'm' for
+ * materialized views, which we'd rather not embed here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples, so raise the error verbosity to capture full detail in any
+ * corruption reports.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids. For a null or empty oids list argument, no
+ * filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in
+ * a state where we can append an AND clause; no checking of this requirement
+ * is done here.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if the toast table check
+ * detects any corruption, the main table checks below must not
+ * reconcile toasted attributes against the toast table, as such toast
+ * table accesses might crash the backend.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the Oid of the associated toast
+ * table, or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print the help message describing the command line options
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching
+ * the given list of patterns, excluding relations in the given list of
+ * excluded Oids and relations belonging to one of the excluded namespaces.
+ * The relations are further filtered by the given relkind_quals, allowing
+ * the caller to restrict the results to just indexes or tables. The
+ * missing_errtext should be a message for use in error messages if no
+ * matching relations are found and strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * This might sometimes result in duplicate entries in the Oid list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query. Unlike ExecuteSqlQueryOrDie, this does not
+ * abort on failure. Instead, on error, *error is set to a copy of the
+ * message from the database connection, and the caller is responsible for
+ * freeing it. This lets the caller report query failures alongside
+ * corruption reports rather than dying mid-run.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. A failure exit code would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..1cc36b25b7
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,489 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# this contrib module correctly identifies specific kinds of corruption within
+# pages.  To test this, we need a mechanism to create corrupt pages with
+# predictable, repeatable corruption.  The postgres backend cannot be expected
+# to help us with this, as its design is not consistent with the goal of
+# intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt and, with careful consideration of how
+# PostgreSQL lays out heap pages, seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the table's columns, and the
+# rows we insert, to give predictable sizes and locations within the table
+# page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8-byte inline value.  We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+#	L = unsigned 32-bit long,
+#	S = unsigned 16-bit short,
+#	C = unsigned 8-bit octet,
+#	c = signed 8-bit octet,
+#	q = signed 64-bit quadword
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
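The 58-byte layout above can be cross-checked mechanically. The sketch below mirrors the Perl pack template using Python's struct module (an illustration only, not part of the patch; the format string and sample field values are assumptions chosen to match the documented offsets):

```python
import struct

# Same layout as HEAPTUPLE_PACK_CODE above, expressed in struct codes:
# '<' = little-endian, no padding; 3L = t_xmin/t_xmax/t_field3;
# 5H = bi_hi/bi_lo/ip_posid/t_infomask2/t_infomask; 2B = t_hoff/t_bits;
# q = column 'a'; B7b = 1-byte varlena header plus 7 chars of 'b';
# 9H = the 18-byte toast pointer for 'c'.
HEAPTUPLE_FMT = '<3L5H2Bq B7b 9H'

# Total size must agree with HEAPTUPLE_PACK_LENGTH.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Round-trip a dummy tuple: t_hoff (field index 8) survives pack/unpack.
fields = [2, 0, 0, 0, 0, 1, 3, 0x0902, 24, 0, 12345678, 17]
fields += [ord(c) for c in 'abcdefg'] + [0] * 9
packed = struct.pack(HEAPTUPLE_FMT, *fields)
assert len(packed) == 58
assert struct.unpack(HEAPTUPLE_FMT, packed)[8] == 24  # t_hoff
```

The point is only that the documented field widths sum to the declared 58-byte total; the Perl test performs the equivalent packing with pack() and unpack().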
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+	sysseek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+	sysseek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-x', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
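The corruption cases below repeatedly clear or set individual infomask bits before writing the tuple back. A minimal sketch of that bit arithmetic, using the same constants (Python, for illustration only; the starting t_infomask value is a hypothetical frozen-tuple mask, not taken from a real page):

```python
# Constants as defined above (from access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
HEAP_XMAX_INVALID   = 0x0800

# Hypothetical t_infomask for a frozen tuple: both xmin hint bits set
# (their combination is HEAP_XMIN_FROZEN), and xmax marked invalid.
t_infomask = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID | HEAP_XMAX_INVALID

# Clearing both xmin hint bits, as the offnum == 1 case does, means the
# checker must evaluate the (corrupted) xmin value itself instead of
# trusting the hint bits.
t_infomask &= ~HEAP_XMIN_COMMITTED
t_infomask &= ~HEAP_XMIN_INVALID

assert t_infomask == HEAP_XMAX_INVALID
assert t_infomask & (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID) == 0
```

The Perl test applies the same `&= ~MASK` and `|= MASK` operations directly to the unpacked t_infomask and t_infomask2 fields.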
+
+# Helper functions
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ qr/\(relname=test,blkno=$blkno,offnum=$offnum,attnum=$attnum\)\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath) or die "open failed: $!";
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, '');
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+	elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+			qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+			qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+		# Corruptly set xmax >= next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+			qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+			qr/${header}data begins at offset 152 beyond the tuple length 58/,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+			qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+			qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+		# As above, set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI, but
+		# with a multixact ID preceding the relation's minimum multixact ID
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 4e833d79ef..1efca8adc4 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 47271addc1..30b37b07d8 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -132,6 +132,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..fc36447dda
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,228 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privilege on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions, as well as privileges to access the relations being checked.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all-frozen
+ -v, --skip-all-visible do NOT check blocks marked as all-visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked in the visibility map as all-frozen or all-visible,
+ respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines, the
+ first line shows the location and the second line shows a message describing
+ the problem.
+ </para>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=17,offnum=12,attnum=)
+xmin 4294967295 precedes relation freeze threshold 17:1134217582
+(relname=mytable,blkno=960,offnum=4,attnum=)
+data begins at offset 152 beyond the tuple length 58
+(relname=mytable,blkno=960,offnum=4,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=5,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=6,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=7,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+(relname=mytable,blkno=1147,offnum=2,attnum=)
+number of attributes 2047 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=10,attnum=)
+tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+(relname=mytable,blkno=1147,offnum=15,attnum=)
+number of attributes 67 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=16,attnum=1)
+attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+(relname=mytable,blkno=1147,offnum=18,attnum=2)
+final toast chunk number 0 differs from expected value 6
+(relname=mytable,blkno=1147,offnum=19,attnum=2)
+toasted value for attribute 2 missing from toast table
+(relname=mytable,blkno=1147,offnum=21,attnum=)
+tuple is marked as only locked, but also claims key columns were updated
+(relname=mytable,blkno=1147,offnum=22,attnum=)
+multitransaction ID 1775655 is from before relation cutoff 2355572
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form and may span a varying
+ number of lines per corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database that contains one corrupt index,
+ "corrupt_index", with corruption in the page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 89e1b39036..8cf0554823 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bca30f3dde..369b8e7c6f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -403,6 +404,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
On Oct 5, 2020, at 5:24 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
- This version does not change clog handling, which leaves Andrey's concern unaddressed. Peter also showed some support for (or perhaps just a lack of opposition to) doing more of what Andrey suggests. I may come back to this issue, depending on time available and further feedback.
Attached is a patch set that includes the clog handling as discussed. Patches 0001 and 0002 are effectively unchanged since version 16, posted yesterday, but this set now includes 0003, which creates a non-throwing interface to clog, and 0004, which uses that interface from within amcheck's heap checking functions.
I think this is a pretty good sketch for discussion, though I am unsatisfied with the lack of regression test coverage of verify_heapam in the presence of clog truncation. I was hoping to have that as part of v17, but since it is taking a bit longer than I anticipated, I'll have to come back with that in a later patch.
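The 0003 patch itself is not shown in this excerpt, but the idea of a non-throwing clog interface can be sketched in miniature: instead of a lookup that raises an error when the requested xid's clog page has been truncated away, return a result code and let the caller decide what to do. All names here (`get_xid_status_nothrow`, `XidStatusLookupResult`, the toy in-memory "clog") are hypothetical stand-ins for illustration, not the patch's actual API:

```c
#include <assert.h>

typedef unsigned int TransactionId;

typedef enum XidStatusLookupResult
{
    XID_LOOKUP_OK,          /* status successfully retrieved */
    XID_LOOKUP_TRUNCATED    /* the xid's clog page was truncated away */
} XidStatusLookupResult;

/* Toy stand-in for clog: statuses kept only for xids >= oldest_kept_xid. */
static TransactionId oldest_kept_xid = 100;
static int clog_status[1000];   /* indexed by xid - oldest_kept_xid */

/*
 * Non-throwing lookup: truncation is reported through the return value
 * rather than by raising an error, so a corruption checker can record
 * the stale xid as one more finding and keep scanning.
 */
static XidStatusLookupResult
get_xid_status_nothrow(TransactionId xid, int *status)
{
    if (xid < oldest_kept_xid)
        return XID_LOOKUP_TRUNCATED;
    *status = clog_status[xid - oldest_kept_xid];
    return XID_LOOKUP_OK;
}
```

The motivation is the same as elsewhere in verify_heapam: a corrupt page can carry an xid older than the truncation horizon, and a throwing interface would abort the whole scan on the first such tuple.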
Attachments:
v17-0001-Adding-function-verify_heapam-to-amcheck-module.patch (application/octet-stream)
From 19f858fb20f29cd716a0ce55f65193a0bde7c0c8 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:42:18 -0700
Subject: [PATCH v17 1/4] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 30 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 200 +++
contrib/amcheck/sql/check_heap.sql | 123 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1537 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 237 +++-
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 4 +
12 files changed, 2404 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..7237ab738c
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,30 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass,
+ boolean,
+ boolean,
+ text,
+ bigint,
+ bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..41cdc6435c
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,200 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+CREATE ROLE regress_heaptest_role;
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for function verify_heapam
+RESET ROLE;
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for table heaptest
+RESET ROLE;
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+RESET ROLE;
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+ERROR: ending block number must be between 0 and 0
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+ERROR: starting block number must be between 0 and 0
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c8397a46f0
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,123 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+CREATE ROLE regress_heaptest_role;
+
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+
+ # If we corrupt the header, postgres won't allow the page into the buffer.
+ syswrite($fh, "\xFF" x 8, 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500, 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
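Writing 0x77 bytes starting at offset 32 lands in the page's line pointer array. PostgreSQL packs each line pointer (ItemIdData) into 4 bytes: lp_off (15 bits), lp_flags (2 bits), and lp_len (15 bits). The sketch below decodes such a word by hand, assuming the little-endian layout common on mainstream platforms; the real struct uses C bitfields, whose ordering is compiler-dependent:

```c
#include <assert.h>
#include <stdint.h>

/* Decoded form of a 4-byte ItemIdData line pointer. */
typedef struct
{
    uint32_t lp_off;    /* offset of tuple from start of page */
    uint32_t lp_flags;  /* unused / normal / redirect / dead */
    uint32_t lp_len;    /* length of tuple in bytes */
} DecodedItemId;

static DecodedItemId
decode_item_id(uint32_t word)
{
    DecodedItemId d;

    d.lp_off = word & 0x7FFF;          /* low 15 bits */
    d.lp_flags = (word >> 15) & 0x3;   /* next 2 bits */
    d.lp_len = (word >> 17) & 0x7FFF;  /* high 15 bits */
    return d;
}
```

Under this layout, a 0x77777777 word decodes to lp_off = 30583 and lp_len = 15291, both far beyond the default 8192-byte page, which is exactly the kind of inconsistency verify_heapam is meant to flag.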
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..b46cce798e
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1537 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/acl.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+typedef enum XidBoundsViolation
+{
+ XID_INVALID,
+ XID_IN_FUTURE,
+ XID_PRECEDES_DATMIN,
+ XID_PRECEDES_RELMIN,
+ XID_BOUNDS_OK
+} XidBoundsViolation;
+
+typedef enum XidCommitStatus
+{
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during the
+ * lifetime of a single verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values
+ * from them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void sanity_check_relation(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status);
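The context caches next_fxid and next_xid so that 32-bit xids found in tuple headers can be compared as 64-bit values. The widening trick can be sketched as follows: any xid already written to disk must precede next_fxid, so if its low 32 bits exceed those of next_fxid it must belong to the previous epoch. This is a hypothetical simplification; the patch's FullTransactionIdFromXidAndCtx may differ in detail, and special xids such as FrozenTransactionId are deliberately ignored here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widen a 32-bit xid to 64 bits relative to the next full transaction
 * id.  Any on-disk xid must precede next_fxid, so it belongs to the
 * current epoch if its low 32 bits are <= those of next_fxid, and to
 * the previous epoch otherwise.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_fxid)
{
    uint32_t next_xid = (uint32_t) next_fxid;   /* low 32 bits */
    uint64_t epoch = next_fxid >> 32;

    if (xid > next_xid)
        epoch--;    /* assumes epoch > 0 for any wrapped-around xid */
    return (epoch << 32) | xid;
}
```

Caching next_fxid once per scan, as the struct above does, keeps this comparison cheap in the per-tuple path.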
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * startblock:
+ * The first block of the relation to check, or NULL to start at the
+ * first block.
+ *
+ * endblock:
+ * The last block of the relation to check, or NULL to stop at the
+ * last block.
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * Note that if check_toast is true, it is the caller's responsibility to
+ * ensure that the toast table and index are not corrupt, and that they
+ * do not become corrupt while this function is running.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ BlockNumber first_block;
+ BlockNumber last_block;
+ BlockNumber nblocks;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = text_to_cstring(PG_GETARG_TEXT_PP(3));
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option \"%s\"", skip),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /* Open relation, check relkind and access method, and check privileges */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ sanity_check_relation(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ if (!nblocks)
+ {
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* Validate block numbers, or handle nulls. */
+ if (PG_ARGISNULL(4))
+ first_block = 0;
+ else
+ {
+ int64 fb = PG_GETARG_INT64(4);
+
+ if (fb < 0 || fb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block number must be between 0 and %u",
+ nblocks - 1)));
+ first_block = (BlockNumber) fb;
+ }
+ if (PG_ARGISNULL(5))
+ last_block = nblocks - 1;
+ else
+ {
+ int64 lb = PG_GETARG_INT64(5);
+
+ if (lb < 0 || lb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block number must be between 0 and %u",
+ nblocks - 1)));
+ last_block = (BlockNumber) lb;
+ }
+
+ /* Optionally open the toast relation, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ int32 mapbits;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_option == SKIP_PAGES_ALL_FROZEN)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ if (skip_option == SKIP_PAGES_ALL_VISIBLE)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to item at offset %u exceeds maximum offset %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to unused item at offset %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Check that a relation's relkind and access method are both supported,
+ * and that the caller has select privilege on the relation.
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ AclResult aclresult;
+
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /*------
+ translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+ aclresult = pg_class_aclcheck(rel->rd_id, GetUserId(), ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * to check the tuple attributes.
+ *
+ * Some kinds of corruption make it unsafe to check the tuple attributes, for
+ * example when the line pointer refers to a range of bytes outside the page.
+ * In such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is whether a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u is an offset, second %u is a length */
+ psprintf(_("data begins at offset %u beyond the tuple length %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("tuple is marked as only locked, but also claims key columns were updated")));
+ header_garbled = true;
+ }
+
+ if ((infomask & HEAP_XMAX_COMMITTED) &&
+ (infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("multixact should not be marked committed")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ switch (get_xid_status(xvac, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u equals or exceeds next valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes relation freeze threshold %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes oldest valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ else
+ {
+ XidCommitStatus status;
+
+ switch (get_xid_status(raw_xmin, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("raw xmin is invalid")));
+ return false;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes relation freeze threshold %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes oldest valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_COMMITTED:
+ break;
+ case XID_IN_PROGRESS:
+ return true; /* insert or delete in progress */
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ switch (get_xid_status(xmax, ctx, &status))
+ {
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("xmax is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD */
+ }
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value are retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * This can happen if heap_form_tuple stored the chunk in the short
+ * varlena format.
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in hexadecimal, %d represents
+ the sequence number */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u does not match the expected sequence number %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u exceeds the end chunk sequence number %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk size */
+ psprintf(_("toast chunk size %u differs from the expected size %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u starts at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and
+ fourth %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number, second %u
+ represents an enumeration value */
+ psprintf(_("toasted attribute %u has unexpected TOAST tag %u"),
+ ctx->attnum,
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but tuple header flag HEAP_HASEXTERNAL not set"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but relation has no toast relation"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk number */
+ psprintf(_("final toast chunk number %u differs from expected value %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("toasted value for attribute %u missing from toast table"),
+ ctx->attnum));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents a length, second %u represents a size
+ */
+ psprintf(_("line pointer length %u is less than the minimum tuple header size %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ switch (get_xid_status(xmin, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes oldest valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes relation freeze threshold %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ break;
+ }
+
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+
+ /* If xmax is a multixact, it should be within valid range */
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ switch (check_mxid_valid_in_rel(xmax, ctx))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("multitransaction ID is invalid")));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes relation minimum multitransaction ID threshold %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes oldest valid multitransaction ID threshold %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a multitransaction ID */
+ psprintf(_("multitransaction ID %u equals or exceeds next valid multitransaction ID %u"),
+ xmax,
+ ctx->next_mxact));
+ fatal = true;
+ break;
+ case XID_BOUNDS_OK:
+ break;
+ }
+ }
+ /* If xmax is not a multixact and is normal, it should be within valid range */
+ else
+ {
+ switch (get_xid_status(xmax, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u are a number */
+ psprintf(_("number of attributes %u exceeds maximum expected for table %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Checks whether a multitransaction ID is in the cached valid range, returning
+ * the nature of the range violation, if any.
+ */
+static XidBoundsViolation
+check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (!TransactionIdIsValid(mxid))
+ return XID_INVALID;
+ if (MultiXactIdPrecedes(mxid, ctx->relminmxid))
+ return XID_PRECEDES_RELMIN;
+ if (MultiXactIdPrecedes(mxid, ctx->oldest_mxact))
+ return XID_PRECEDES_DATMIN;
+ if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, mxid))
+ return XID_IN_FUTURE;
+ return XID_BOUNDS_OK;
+}
+
+/*
+ * Checks whether the given mxid is valid to appear in the heap being checked,
+ * returning the nature of the range violation, if any.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static XidBoundsViolation
+check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ XidBoundsViolation result;
+
+ result = check_mxid_in_range(mxid, ctx);
+ if (result == XID_BOUNDS_OK)
+ return XID_BOUNDS_OK;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ return check_mxid_in_range(mxid, ctx);
+}
+
+/*
+ * Checks whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked, or whether it is too old or too new to appear in
+ * the relation, returning information about the nature of the bounds violation.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the table
+ * contains corrupt all-frozen bits, a concurrent vacuum might skip the page(s)
+ * containing the xid and then truncate clog and advance the relfrozenxid
+ * beyond xid.) Reporting the xid as valid under such conditions seems
+ * acceptable, since if we had checked it earlier in our scan it would have
+ * truly been valid at that time.
+ *
+ * If the status argument is not NULL, and if and only if the transaction ID
+ * appears to be valid in this relation, clog will be consulted and the commit
+ * status argument will be set with the status of the transaction ID.
+ */
+static XidBoundsViolation
+get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status)
+{
+ XidBoundsViolation result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ /* Quick check for special xids */
+ if (!TransactionIdIsValid(xid))
+ result = XID_INVALID;
+ else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
+ result = XID_BOUNDS_OK;
+ else
+ {
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ }
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ result = XID_IN_FUTURE;
+ else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ result = XID_PRECEDES_DATMIN;
+ else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ result = XID_PRECEDES_RELMIN;
+ else
+ result = XID_BOUNDS_OK;
+ }
+
+ /*
+ * Early return if the caller does not request clog checking, or if the
+ * xid is already known to be out of bounds. We dare not check clog for
+ * out of bounds transaction IDs.
+ */
+ if (status == NULL || result != XID_BOUNDS_OK)
+ return result;
+
+ /* Early return if we just checked this xid in a prior call */
+ if (xid == ctx->cached_xid)
+ {
+ *status = ctx->cached_status;
+ return result;
+ }
+
+ *status = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ *status = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ *status = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ *status = XID_ABORTED;
+ else
+ *status = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = *status;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..a57781992a 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@
<para>
The <filename>amcheck</filename> module provides functions that allow you to
- verify the logical consistency of the structure of relations. If the
- structure appears to be valid, no error is raised.
+ verify the logical consistency of the structure of relations.
</para>
<para>
- The functions verify various <emphasis>invariants</emphasis> in the
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
structure of the representation of particular relations. The
correctness of the access method functions behind index scans and
other important operations relies on these invariants always
@@ -24,7 +23,7 @@
collated lexical order). If that particular invariant somehow fails
to hold, we can expect binary searches on the affected page to
incorrectly guide index scans, resulting in wrong answers to SQL
- queries.
+ queries. If the structure appears to be valid, no error is raised.
</para>
<para>
Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
functions.
</para>
<para>
- <filename>amcheck</filename> functions may only be used by superusers.
+ Unlike the B-Tree checking functions, which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. However, if the facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
</para>
<sect2>
@@ -187,12 +201,223 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+ 17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+ 960 | 4 | | data begins at offset 152 beyond the tuple length 58
+ 960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+ 960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+ 960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+ 960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+ 1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
+ 1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+ 1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
+ 1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+ 1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
+ 1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
+ 1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
+ 1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high-risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to <link
+ linkend="catalogs-overview">catalog tables</link> is performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ An experimental option, <literal>check_toast</literal>, exists to
+ reconcile the target table against entries in its corresponding toast
+ table. This option may change in the future, is disabled by default, and is
+ known to be slow. It is also unsafe under some conditions. If the
+ target relation's corresponding toast table or toast index are corrupt,
+ reconciling the target table against toast values may be unsafe. If the
+ catalogs, toast table and toast index are uncorrupted, and remain so
+ during the check of the target table, reconciling the target table
+ against its toast table should be safe.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If this experimental option is true, toasted values are checked against
+ the corresponding TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
<para>
- When the <parameter>heapallindexed</parameter> argument to
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
verification functions is <literal>true</literal>, an additional
phase of verification is performed against the table associated with
the target index relation. This consists of a <quote>dummy</quote>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a2ce617c8c..81752b68eb 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 58c42ffe1f..9a30380901 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,6 +109,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4191f94869..bca30f3dde 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1020,6 +1020,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2287,6 +2288,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
@@ -2788,6 +2790,8 @@ XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
+XidBoundsViolation
+XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
Attachment: v17-0002-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 2f4b08a77308c003db5a49020649d6fc61c2352e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:43:00 -0700
Subject: [PATCH v17 2/4] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 2 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1281 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 489 ++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 228 ++++
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 2 +
15 files changed, 2393 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..f8eecf70bf
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,2 @@
+/pg_amcheck
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..324cf1cfc8
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1281 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single-character value for each relkind, such as 'm' for
+ * materialized views, which we'd rather avoid hard-coding here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user-settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast table is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples. We want full detail available for reporting any corruption.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * Print the command-line help text.
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
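As a standalone illustration (not part of the patch itself), the WHERE-clause fragment that apply_filter appends can be sketched with plain C string building; apply_filter_sketch and its fixed-size buffer are hypothetical stand-ins for the PQExpBuffer logic above:

```c
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;

/*
 * Append "\nAND <lval> OPERATOR(pg_catalog.=) ANY(array[o1, o2, ...]::OID[])"
 * (or the != ALL form for an exclusion filter) to buf.  A null or empty OID
 * list appends nothing, matching apply_filter's early return.  Returns buf.
 */
static char *
apply_filter_sketch(char *buf, size_t bufsize, const char *lval,
					const Oid *oids, int noids, int include)
{
	size_t		len;

	if (oids == NULL || noids == 0)
		return buf;

	len = strlen(buf);
	snprintf(buf + len, bufsize - len,
			 "\nAND %s OPERATOR(pg_catalog.%s) %s(array[",
			 lval, include ? "=" : "!=", include ? "ANY" : "ALL");

	/* comma-separated OIDs, as in append_csv_oids */
	for (int i = 0; i < noids; i++)
	{
		len = strlen(buf);
		snprintf(buf + len, bufsize - len, "%s%u",
				 i > 0 ? ", " : "", oids[i]);
	}

	len = strlen(buf);
	snprintf(buf + len, bufsize - len, "]::OID[])");
	return buf;
}
```

With oids = {16384, 16401} and lval = "c.oid", the include form yields `\nAND c.oid OPERATOR(pg_catalog.=) ANY(array[16384, 16401]::OID[])`.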
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result
+ * in duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching the
+ * given list of patterns but not included in the given list of excluded Oids
+ * nor in one of the given excluded namespaces. The relations are further
+ * filtered by the given relkind_quals, allowing the caller to restrict the
+ * relations to just indexes or tables. The missing_errtext should be a
+ * message for use in error reports if no matching relations are found and
+ * strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the Oid list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query.
+ *
+ * This variant prints nothing on failure, as error messages would look
+ * messy interleaved with corruption reports.  Instead, on failure a
+ * pstrdup'd copy of the connection's error message is stored in *error
+ * for the caller to report and free.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+ or BAIL_OUT("open failed: $!");
+ binmode $fh;
+ sysseek($fh, 32, 0)
+ or BAIL_OUT("sysseek failed: $!");
+ syswrite($fh, "\x77\x77\x77\x77" x 125)
+ or BAIL_OUT("syswrite failed: $!");
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. A nonzero exit status would mean
+# that the pg_amcheck command itself failed, for example because a
+# connection to the database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..1cc36b25b7
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,489 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-X', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper functions
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ qr/\(relname=test,blkno=$blkno,offnum=$offnum,attnum=$attnum\)\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath)
+  or BAIL_OUT("open failed: $!");
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, '');
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID. As with
+ # xmin above, with epoch = 0 the value is interpreted as in the future.
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI with a multitransaction
+ # ID that circularly precedes the relation's relminmxid
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 4e833d79ef..1efca8adc4 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..fc36447dda
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,228 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privilege on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions, as well as privileges to access the relations being checked.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all-frozen
+ -v, --skip-all-visible do NOT check blocks marked as all-visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables that are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked in the visibility map as all-frozen or all-visible,
+ respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected problem is reported on two lines: the
+ first line shows the location, and the second line shows a message describing
+ the problem.
+ </para>
+
+ <para>
+ Checking an entire database that contains one corrupt table, "mytable",
+ with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=17,offnum=12,attnum=)
+xmin 4294967295 precedes relation freeze threshold 17:1134217582
+(relname=mytable,blkno=960,offnum=4,attnum=)
+data begins at offset 152 beyond the tuple length 58
+(relname=mytable,blkno=960,offnum=4,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=5,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=6,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=7,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+(relname=mytable,blkno=1147,offnum=2,attnum=)
+number of attributes 2047 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=10,attnum=)
+tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+(relname=mytable,blkno=1147,offnum=15,attnum=)
+number of attributes 67 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=16,attnum=1)
+attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+(relname=mytable,blkno=1147,offnum=18,attnum=2)
+final toast chunk number 0 differs from expected value 6
+(relname=mytable,blkno=1147,offnum=19,attnum=2)
+toasted value for attribute 2 missing from toast table
+(relname=mytable,blkno=1147,offnum=21,attnum=)
+tuple is marked as only locked, but also claims key columns were updated
+(relname=mytable,blkno=1147,offnum=22,attnum=)
+multitransaction ID 1775655 is from before relation cutoff 2355572
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form, and may span a varying
+ number of lines per corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database that contains one corrupt index,
+ "corrupt_index", whose page header is corrupt, with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index and then corrupting its contents,
+ with the resulting output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 89e1b39036..8cf0554823 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bca30f3dde..369b8e7c6f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -403,6 +404,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
Attachment: v17-0003-Creating-non-throwing-interface-to-clog-and-slru.patch
From f34c48af2746b752d11c3d55ce7604785dc44068 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 6 Oct 2020 11:28:18 -0700
Subject: [PATCH v17 3/4] Creating non-throwing interface to clog and slru.
---
src/backend/access/transam/clog.c | 21 +++---
src/backend/access/transam/commit_ts.c | 4 +-
src/backend/access/transam/multixact.c | 16 ++---
src/backend/access/transam/slru.c | 23 +++---
src/backend/access/transam/subtrans.c | 4 +-
src/backend/access/transam/transam.c | 98 +++++++++-----------------
src/backend/commands/async.c | 4 +-
src/backend/storage/lmgr/predicate.c | 4 +-
src/include/access/clog.h | 18 +----
src/include/access/clogdefs.h | 33 +++++++++
src/include/access/slru.h | 6 +-
src/include/access/transam.h | 3 +
12 files changed, 122 insertions(+), 112 deletions(-)
create mode 100644 src/include/access/clogdefs.h
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349aa7b..a2eb3e2983 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -357,7 +357,7 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* write-busy, since we don't care if the update reaches disk sooner than
* we think.
*/
- slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid, true);
/*
* Set the main transaction id, if any.
@@ -631,7 +631,7 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
* for most uses; TransactionLogFetch() in transam.c is the intended caller.
*/
XidStatus
-TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
+TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn, bool throwError)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
@@ -643,13 +643,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid);
- byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
+ slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid, throwError);
+ if (slotno == InvalidSlotNo)
+ status = TRANSACTION_STATUS_UNKNOWN;
+ else
+ {
+ byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
- status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
+ status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
- lsnindex = GetLSNIndex(slotno, xid);
- *lsn = XactCtl->shared->group_lsn[lsnindex];
+ lsnindex = GetLSNIndex(slotno, xid);
+ *lsn = XactCtl->shared->group_lsn[lsnindex];
+ }
LWLockRelease(XactSLRULock);
@@ -796,7 +801,7 @@ TrimCLOG(void)
int slotno;
char *byteptr;
- slotno = SimpleLruReadPage(XactCtl, pageno, false, xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, false, xid, true);
byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
/* Zero so-far-unused positions in the current byte */
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index cb8a968801..98c685405c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -237,7 +237,7 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid, true);
TransactionIdSetCommitTs(xid, ts, nodeid, slotno);
for (i = 0; i < nsubxids; i++)
@@ -342,7 +342,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
}
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid, true);
memcpy(&entry,
CommitTsCtl->shared->page_buffer[slotno] +
SizeOfCommitTimestampEntry * entryno,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 81752b68eb..5c0213b06d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -881,7 +881,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* enough that a MultiXactId is really involved. Perhaps someday we'll
* take the trouble to generalize the slru.c error reporting code.
*/
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -914,7 +914,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -1345,7 +1345,7 @@ retry:
pageno = MultiXactIdToOffsetPage(multi);
entryno = MultiXactIdToOffsetEntry(multi);
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
@@ -1377,7 +1377,7 @@ retry:
entryno = MultiXactIdToOffsetEntry(tmpMXact);
if (pageno != prev_pageno)
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -1418,7 +1418,7 @@ retry:
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -2063,7 +2063,7 @@ TrimMultiXact(void)
int slotno;
MultiXactOffset *offptr;
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -2095,7 +2095,7 @@ TrimMultiXact(void)
int memberoff;
memberoff = MXOffsetToMemberOffset(offset);
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset, true);
xidptr = (TransactionId *)
(MultiXactMemberCtl->shared->page_buffer[slotno] + memberoff);
@@ -2749,7 +2749,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return false;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
+ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 16a7898697..daa145eeff 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -385,14 +385,15 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is InvalidSlotNo.
+ * Otherwise, the return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must be held at entry, and will be held at exit.
*/
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid)
+ TransactionId xid, bool throwError)
{
SlruShared shared = ctl->shared;
@@ -465,7 +466,11 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
/* Now it's okay to ereport if we failed */
if (!ok)
- SlruReportIOError(ctl, pageno, xid);
+ {
+ if (throwError)
+ SlruReportIOError(ctl, pageno, xid);
+ return InvalidSlotNo;
+ }
SlruRecentlyUsed(shared, slotno);
@@ -484,14 +489,16 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is InvalidSlotNo.
+ * Otherwise, the return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must NOT be held at entry, but will be held at exit.
* It is unspecified whether the lock will be shared or exclusive.
*/
int
-SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
+SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid,
+ bool throwError)
{
SlruShared shared = ctl->shared;
int slotno;
@@ -520,7 +527,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
- return SimpleLruReadPage(ctl, pageno, true, xid);
+ return SimpleLruReadPage(ctl, pageno, true, xid, throwError);
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e867c7..353b946731 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -83,7 +83,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
@@ -123,7 +123,7 @@ SubTransGetParent(TransactionId xid)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index a28918657c..88f867e5ef 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -35,7 +35,8 @@ static XidStatus cachedFetchXidStatus;
static XLogRecPtr cachedCommitLSN;
/* Local functions */
-static XidStatus TransactionLogFetch(TransactionId transactionId);
+static XidStatus TransactionLogFetch(TransactionId transactionId,
+ bool throwError);
/* ----------------------------------------------------------------
@@ -49,7 +50,7 @@ static XidStatus TransactionLogFetch(TransactionId transactionId);
* TransactionLogFetch --- fetch commit status of specified transaction id
*/
static XidStatus
-TransactionLogFetch(TransactionId transactionId)
+TransactionLogFetch(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
XLogRecPtr xidlsn;
@@ -76,14 +77,16 @@ TransactionLogFetch(TransactionId transactionId)
/*
* Get the transaction status.
*/
- xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);
+ xidstatus = TransactionIdGetStatus(transactionId, &xidlsn, throwError);
/*
* Cache it, but DO NOT cache status for unfinished or sub-committed
* transactions! We only cache status that is guaranteed not to change.
+ * Likewise, DO NOT cache when the status is unknown.
*/
if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
- xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
+ xidstatus != TRANSACTION_STATUS_SUB_COMMITTED &&
+ xidstatus != TRANSACTION_STATUS_UNKNOWN)
{
cachedFetchXid = transactionId;
cachedFetchXidStatus = xidstatus;
@@ -96,6 +99,7 @@ TransactionLogFetch(TransactionId transactionId)
/* ----------------------------------------------------------------
* Interface functions
*
+ * TransactionIdResolveStatus
* TransactionIdDidCommit
* TransactionIdDidAbort
* ========
@@ -115,24 +119,17 @@ TransactionLogFetch(TransactionId transactionId)
*/
/*
- * TransactionIdDidCommit
- * True iff transaction associated with the identifier did commit.
- *
- * Note:
- * Assumes transaction identifier is valid and exists in clog.
+ * TransactionIdResolveStatus
+ * Returns the status of the transaction associated with the identifier,
+ * recursively resolving sub-committed transaction status by checking
+ * the parent transaction.
*/
-bool /* true if given transaction committed */
-TransactionIdDidCommit(TransactionId transactionId)
+XidStatus
+TransactionIdResolveStatus(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked committed, it's committed.
- */
- if (xidstatus == TRANSACTION_STATUS_COMMITTED)
- return true;
+ xidstatus = TransactionLogFetch(transactionId, throwError);
/*
* If it's marked subcommitted, we have to check the parent recursively.
@@ -153,21 +150,31 @@ TransactionIdDidCommit(TransactionId transactionId)
TransactionId parentXid;
if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return false;
+ return TRANSACTION_STATUS_ABORTED;
parentXid = SubTransGetParent(transactionId);
if (!TransactionIdIsValid(parentXid))
{
elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
transactionId);
- return false;
+ return TRANSACTION_STATUS_ABORTED;
}
- return TransactionIdDidCommit(parentXid);
+ return TransactionIdResolveStatus(parentXid, throwError);
}
+ return xidstatus;
+}
- /*
- * It's not committed.
- */
- return false;
+/*
+ * TransactionIdDidCommit
+ * True iff transaction associated with the identifier did commit.
+ *
+ * Note:
+ * Assumes transaction identifier is valid and exists in clog.
+ */
+bool /* true if given transaction committed */
+TransactionIdDidCommit(TransactionId transactionId)
+{
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_COMMITTED);
}
/*
@@ -180,43 +187,8 @@ TransactionIdDidCommit(TransactionId transactionId)
bool /* true if given transaction aborted */
TransactionIdDidAbort(TransactionId transactionId)
{
- XidStatus xidstatus;
-
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked aborted, it's aborted.
- */
- if (xidstatus == TRANSACTION_STATUS_ABORTED)
- return true;
-
- /*
- * If it's marked subcommitted, we have to check the parent recursively.
- * However, if it's older than TransactionXmin, we can't look at
- * pg_subtrans; instead assume that the parent crashed without cleaning up
- * its children.
- */
- if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED)
- {
- TransactionId parentXid;
-
- if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return true;
- parentXid = SubTransGetParent(transactionId);
- if (!TransactionIdIsValid(parentXid))
- {
- /* see notes in TransactionIdDidCommit */
- elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
- transactionId);
- return true;
- }
- return TransactionIdDidAbort(parentXid);
- }
-
- /*
- * It's not aborted.
- */
- return false;
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_ABORTED);
}
/*
@@ -419,7 +391,7 @@ TransactionIdGetCommitLSN(TransactionId xid)
/*
* Get the transaction status.
*/
- (void) TransactionIdGetStatus(xid, &result);
+ (void) TransactionIdGetStatus(xid, &result, true);
return result;
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace3f9..a49126dba0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1477,7 +1477,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
slotno = SimpleLruZeroPage(NotifyCtl, pageno);
else
slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
- InvalidTransactionId);
+ InvalidTransactionId, true);
/* Note we mark the page dirty before writing in it */
NotifyCtl->shared->page_dirty[slotno] = true;
@@ -2010,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
* part of the page we will actually inspect.
*/
slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
- InvalidTransactionId);
+ InvalidTransactionId, true);
if (curpage == QUEUE_POS_PAGE(head))
{
/* we only want to read as far as head */
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b400c..6cf12e46f6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -904,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
}
else
- slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+ slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid, true);
SerialValue(slotno, xid) = minConflictCommitSeqNo;
SerialSlruCtl->shared->page_dirty[slotno] = true;
@@ -946,7 +946,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
* but will return with that lock held, which must then be released.
*/
slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
- SerialPage(xid), xid);
+ SerialPage(xid), xid, true);
val = SerialValue(slotno, xid);
LWLockRelease(SerialSLRULock);
return val;
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index 6c840cbf29..cf299cd8f6 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -11,24 +11,11 @@
#ifndef CLOG_H
#define CLOG_H
+#include "access/clogdefs.h"
#include "access/xlogreader.h"
#include "storage/sync.h"
#include "lib/stringinfo.h"
-/*
- * Possible transaction statuses --- note that all-zeroes is the initial
- * state.
- *
- * A "subcommitted" transaction is a committed subtransaction whose parent
- * hasn't committed or aborted yet.
- */
-typedef int XidStatus;
-
-#define TRANSACTION_STATUS_IN_PROGRESS 0x00
-#define TRANSACTION_STATUS_COMMITTED 0x01
-#define TRANSACTION_STATUS_ABORTED 0x02
-#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
-
typedef struct xl_clog_truncate
{
int pageno;
@@ -38,7 +25,8 @@ typedef struct xl_clog_truncate
extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
-extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
+extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn,
+ bool throwError);
extern Size CLOGShmemBuffers(void);
extern Size CLOGShmemSize(void);
diff --git a/src/include/access/clogdefs.h b/src/include/access/clogdefs.h
new file mode 100644
index 0000000000..0f9996bb08
--- /dev/null
+++ b/src/include/access/clogdefs.h
@@ -0,0 +1,33 @@
+/*
+ * clogdefs.h
+ *
+ * PostgreSQL transaction-commit-log manager
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/clogdefs.h
+ */
+#ifndef CLOGDEFS_H
+#define CLOGDEFS_H
+
+/*
+ * Possible transaction statuses --- note that all-zeroes is the initial
+ * state.
+ *
+ * A "subcommitted" transaction is a committed subtransaction whose parent
+ * hasn't committed or aborted yet.
+ *
+ * An "unknown" status indicates an error condition, such as when the clog has
+ * been erroneously truncated and the commit status of a transaction cannot be
+ * determined.
+ */
+typedef enum XidStatus {
+ TRANSACTION_STATUS_IN_PROGRESS = 0x00,
+ TRANSACTION_STATUS_COMMITTED = 0x01,
+ TRANSACTION_STATUS_ABORTED = 0x02,
+ TRANSACTION_STATUS_SUB_COMMITTED = 0x03,
+ TRANSACTION_STATUS_UNKNOWN = 0x04 /* error condition */
+} XidStatus;
+
+#endif							/* CLOGDEFS_H */
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b43504d..0b6a5669d8 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -133,6 +133,8 @@ typedef struct SlruCtlData
typedef SlruCtlData *SlruCtl;
+#define InvalidSlotNo ((int) -1)
+
extern Size SimpleLruShmemSize(int nslots, int nlsns);
extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
@@ -140,9 +142,9 @@ extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
SyncRequestHandler sync_handler);
extern int SimpleLruZeroPage(SlruCtl ctl, int pageno);
extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 2f1f144db4..7d5e2f614d 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -14,6 +14,7 @@
#ifndef TRANSAM_H
#define TRANSAM_H
+#include "access/clogdefs.h"
#include "access/xlogdefs.h"
@@ -264,6 +265,8 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
/*
* prototypes for functions in transam/transam.c
*/
+extern XidStatus TransactionIdResolveStatus(TransactionId transactionId,
+ bool throwError);
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
--
2.21.1 (Apple Git-122.3)
v17-0004-Using-non-throwing-clog-interface-from-amcheck.patchapplication/octet-stream; name=v17-0004-Using-non-throwing-clog-interface-from-amcheck.patch; x-unix-mode=0644Download
From 574d7cb2d8a169fcd50653ec36a7ad919540d808 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 6 Oct 2020 13:33:23 -0700
Subject: [PATCH v17 4/4] Using non-throwing clog interface from amcheck
Convert the heap checking functions to use the recently introduced
non-throwing clog interface when checking transaction commit status,
and report missing clog as corruption rather than aborting.
---
contrib/amcheck/verify_heapam.c | 84 ++++++++++++++++++++------------
src/tools/pgindent/typedefs.list | 1 -
2 files changed, 54 insertions(+), 31 deletions(-)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index b46cce798e..9885a9d55c 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -10,6 +10,7 @@
*/
#include "postgres.h"
+#include "access/clogdefs.h"
#include "access/detoast.h"
#include "access/genam.h"
#include "access/heapam.h"
@@ -40,13 +41,6 @@ typedef enum XidBoundsViolation
XID_BOUNDS_OK
} XidBoundsViolation;
-typedef enum XidCommitStatus
-{
- XID_COMMITTED,
- XID_IN_PROGRESS,
- XID_ABORTED
-} XidCommitStatus;
-
typedef enum SkipPages
{
SKIP_PAGES_ALL_FROZEN,
@@ -80,7 +74,7 @@ typedef struct HeapCheckContext
* Cached copies of the most recently checked xid and its status.
*/
TransactionId cached_xid;
- XidCommitStatus cached_status;
+ XidStatus cached_status;
/* Values concerning the heap relation being checked */
Relation rel;
@@ -138,7 +132,7 @@ static void update_cached_xid_range(HeapCheckContext *ctx);
static void update_cached_mxid_range(HeapCheckContext *ctx);
static XidBoundsViolation check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx);
static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx);
-static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status);
+static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidStatus *status);
/*
* Scan and report corruption in heap pages, optionally reconciling toasted
@@ -642,7 +636,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
else if (infomask & HEAP_MOVED_OFF ||
infomask & HEAP_MOVED_IN)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
switch (get_xid_status(xvac, ctx, &status))
@@ -686,17 +680,27 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("old-style VACUUM FULL transaction ID %u commit status is lost"),
+ xvac));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
else
{
- XidCommitStatus status;
+ XidStatus status;
switch (get_xid_status(raw_xmin, ctx, &status))
{
@@ -737,12 +741,22 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_COMMITTED:
+ case TRANSACTION_STATUS_COMMITTED:
break;
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* insert or delete in progress */
- case XID_ABORTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("raw xmin %u commit status is lost"),
+ raw_xmin));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
@@ -752,7 +766,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
{
if (infomask & HEAP_XMAX_IS_MULTI)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
switch (get_xid_status(xmax, ctx, &status))
@@ -795,12 +809,22 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_RECENTLY_DEAD or
* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("xmax %u commit status is lost"),
+ xmax));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
@@ -1260,7 +1284,11 @@ check_tuple(HeapCheckContext *ctx)
break;
}
}
- /* If xmax is not a multixact and is normal, it should be within valid range */
+
+ /*
+ * If xmax is not a multixact and is normal, it should be within valid
+ * range
+ */
else
{
switch (get_xid_status(xmax, ctx, NULL))
@@ -1465,7 +1493,7 @@ check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
* status argument will be set with the status of the transaction ID.
*/
static XidBoundsViolation
-get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status)
+get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidStatus *status)
{
XidBoundsViolation result;
FullTransactionId fxid;
@@ -1516,19 +1544,15 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status
return result;
}
- *status = XID_COMMITTED;
+ *status = TRANSACTION_STATUS_COMMITTED;
LWLockAcquire(XactTruncationLock, LW_SHARED);
clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
{
if (TransactionIdIsCurrentTransactionId(xid))
- *status = XID_IN_PROGRESS;
- else if (TransactionIdDidCommit(xid))
- *status = XID_COMMITTED;
- else if (TransactionIdDidAbort(xid))
- *status = XID_ABORTED;
+ *status = TRANSACTION_STATUS_IN_PROGRESS;
else
- *status = XID_IN_PROGRESS;
+ *status = TransactionIdResolveStatus(xid, false);
}
LWLockRelease(XactTruncationLock);
ctx->cached_xid = xid;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 369b8e7c6f..6ca1bac21a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2793,7 +2793,6 @@ XactCallbackItem
XactEvent
XactLockTableWaitInfo
XidBoundsViolation
-XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
On Oct 7, 2020, at 04:20, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 5, 2020, at 5:24 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
- This version does not change clog handling, which leaves Andrey's concern unaddressed. Peter also showed some support for (or perhaps just a lack of opposition to) doing more of what Andrey suggests. I may come back to this issue, depending on time available and further feedback.
Attached is a patch set that includes the clog handling as discussed. The 0001 and 0002 are effectively unchanged since version 16 posted yesterday, but this now includes 0003 which creates a non-throwing interface to clog, and 0004 which uses the non-throwing interface from within amcheck's heap checking functions.
I think this is a pretty good sketch for discussion, though I am unsatisfied with the lack of regression test coverage of verify_heapam in the presence of clog truncation. I was hoping to have that as part of v17, but since it is taking a bit longer than I anticipated, I'll have to come back with that in a later patch.
Many thanks, Mark! I really appreciate this functionality. It could save me many hours of recreating clogs.
I'm not entirely sure this message is correct: psprintf(_("xmax %u commit status is lost")
It seems to me that this is not a commit status, but rather a transaction status.
Thanks!
Best regards, Andrey Borodin.
On Oct 6, 2020, at 11:27 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
On Oct 7, 2020, at 04:20, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 5, 2020, at 5:24 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
- This version does not change clog handling, which leaves Andrey's concern unaddressed. Peter also showed some support for (or perhaps just a lack of opposition to) doing more of what Andrey suggests. I may come back to this issue, depending on time available and further feedback.
Attached is a patch set that includes the clog handling as discussed. The 0001 and 0002 are effectively unchanged since version 16 posted yesterday, but this now includes 0003 which creates a non-throwing interface to clog, and 0004 which uses the non-throwing interface from within amcheck's heap checking functions.
I think this is a pretty good sketch for discussion, though I am unsatisfied with the lack of regression test coverage of verify_heapam in the presence of clog truncation. I was hoping to have that as part of v17, but since it is taking a bit longer than I anticipated, I'll have to come back with that in a later patch.
Many thanks, Mark! I really appreciate this functionality. It could save me many hours of recreating clogs.
You are quite welcome, though the thanks may be premature. I posted 0003 and 0004 patches mostly as concrete implementation examples that can be criticized.
I'm not entirely sure this message is correct: psprintf(_("xmax %u commit status is lost")
It seems to me that this is not a commit status, but rather a transaction status.
I have changed several such messages to say "transaction status" rather than "commit status". I'll be posting it in a separate email, shortly.
Thanks for reviewing!
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
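[Editorial aside, not part of the thread: the control flow that 0004 adds to get_xid_status — a one-entry cache of the last xid checked, plus a non-throwing lookup that yields an "unknown" status when the clog has been truncated away — can be sketched language-neutrally in Python. Here `resolve_status` stands in for `TransactionIdResolveStatus`, and a truncated clog is modeled as a plain dictionary miss; all names are illustrative, not PostgreSQL APIs.]

```python
from enum import Enum

class XidStatus(Enum):
    """Mirrors the XidStatus enum from clogdefs.h in patch 0003."""
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2
    SUB_COMMITTED = 3
    UNKNOWN = 4      # clog erroneously truncated; status unrecoverable

def resolve_status(xid, clog):
    """Non-throwing lookup: return UNKNOWN instead of raising an error
    when the clog segment holding xid is gone (a dict miss here)."""
    return clog.get(xid, XidStatus.UNKNOWN)

class XidChecker:
    """Mimics get_xid_status's caching of the most recently checked xid."""
    def __init__(self, clog):
        self.clog = clog
        self.cached_xid = None
        self.cached_status = None

    def get_xid_status(self, xid):
        if xid == self.cached_xid:
            return self.cached_status
        status = resolve_status(xid, self.clog)
        self.cached_xid = xid
        self.cached_status = status
        return status

checker = XidChecker({100: XidStatus.COMMITTED, 101: XidStatus.ABORTED})
print(checker.get_xid_status(100).name)  # COMMITTED
print(checker.get_xid_status(999).name)  # UNKNOWN
```

The point of the sketch is the last line: where the pre-0004 code would abort with an error on a truncated clog, the checker instead receives UNKNOWN and can emit a corruption report and continue scanning.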
On Oct 5, 2020, at 5:24 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
There remain a few open issues and/or things I did not implement:
- This version follows Robert's suggestion of using pg_class_aclcheck() to check that the caller has permission to select from the table being checked. This is inconsistent with the btree checking logic, which does no such check. These two approaches should be reconciled, but there was apparently no agreement on this issue.
This next version, attached, has the acl checking and associated documentation changes split out into patch 0005, making it easier to review in isolation from the rest of the patch series.
Independently of acl considerations, this version also has some verbiage changes in 0004, in response to Andrey's review upthread.
Attachments:
v18-0001-Adding-function-verify_heapam-to-amcheck-module.patchapplication/octet-stream; name=v18-0001-Adding-function-verify_heapam-to-amcheck-module.patch; x-unix-mode=0644Download
From 547c160ed0cd1a61bd0686ce62c349962fef9309 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:42:18 -0700
Subject: [PATCH v18 1/5] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 30 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 194 +++
contrib/amcheck/sql/check_heap.sql | 116 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1529 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 237 +++-
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 4 +
12 files changed, 2383 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..7237ab738c
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,30 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass,
+ boolean,
+ boolean,
+ text,
+ bigint,
+ bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..882f853d56
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,194 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+CREATE ROLE regress_heaptest_role;
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for function verify_heapam
+RESET ROLE;
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+RESET ROLE;
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+ERROR: ending block number must be between 0 and 0
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+ERROR: starting block number must be between 0 and 0
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c10a25f21c
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,116 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+CREATE ROLE regress_heaptest_role;
+
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+
+ # If we corrupt the header, postgres won't allow the page into the buffer.
+ syswrite($fh, "\xFF" x 8, 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77\x77\x77\x77" x 125, 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..7d7230a9c9
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1529 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+typedef enum XidBoundsViolation
+{
+ XID_INVALID,
+ XID_IN_FUTURE,
+ XID_PRECEDES_DATMIN,
+ XID_PRECEDES_RELMIN,
+ XID_BOUNDS_OK
+} XidBoundsViolation;
+
+typedef enum XidCommitStatus
+{
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values
+ * from them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
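The cached next_fxid and oldest_fxid fields above exist so that xids read from tuple headers can be compared without 32-bit wraparound ambiguity. A minimal standalone sketch of that widening step (widen_xid is a hypothetical name; the patch's FullTransactionIdFromXidAndCtx additionally special-cases permanent xids below FirstNormalTransactionId, which this omits):

```c
#include <stdint.h>

/*
 * Widen a 32-bit xid to 64 bits relative to the cached 64-bit next
 * transaction ID.  Assumes the xid lies within one epoch of next_fxid,
 * which holds for any xid legitimately present on disk.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_fxid)
{
	uint32_t	next_xid = (uint32_t) next_fxid;		/* low 32 bits */
	uint32_t	epoch = (uint32_t) (next_fxid >> 32);	/* current epoch */

	/* An xid numerically above next_xid must be from the prior epoch. */
	if (xid > next_xid)
		epoch--;
	return ((uint64_t) epoch << 32) | xid;
}
```

Comparisons on the widened values then reduce to plain integer comparisons, which is what the bounds checks below rely on.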
+
+/* Internal implementation */
+static void sanity_check_relation(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx);
+static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status);
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * Note that if check_toast is true, it is the caller's responsibility to
+ * ensure that the toast table and index are not corrupt, and that they
+ * do not become corrupt while this function is running.
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ BlockNumber first_block;
+ BlockNumber last_block;
+ BlockNumber nblocks;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = text_to_cstring(PG_GETARG_TEXT_PP(3));
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option \"%s\"", skip),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /* Open relation, check relkind and access method, and check privileges */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ sanity_check_relation(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ if (!nblocks)
+ {
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* Validate block numbers, or handle nulls. */
+ if (PG_ARGISNULL(4))
+ first_block = 0;
+ else
+ {
+ int64 fb = PG_GETARG_INT64(4);
+
+ if (fb < 0 || fb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block number must be between 0 and %u",
+ nblocks - 1)));
+ first_block = (BlockNumber) fb;
+ }
+ if (PG_ARGISNULL(5))
+ last_block = nblocks - 1;
+ else
+ {
+ int64 lb = PG_GETARG_INT64(5);
+
+ if (lb < 0 || lb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block number must be between 0 and %u",
+ nblocks - 1)));
+ last_block = (BlockNumber) lb;
+ }
+
+ /* Optionally open the toast relation, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ int32 mapbits;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_option == SKIP_PAGES_ALL_FROZEN)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ if (skip_option == SKIP_PAGES_ALL_VISIBLE)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /*------
+ translator: both %u are item offset numbers */
+ psprintf(_("line pointer redirection to item at offset %u exceeds maximum offset %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to unused item at offset %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
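Once widened to 64 bits, an xid's bounds check reduces to ordered comparisons against the cached limits. A hedged sketch of the ordering implied by the XidBoundsViolation enum and the corruption messages in this file (illustrative XB_* names, not the patch's identifiers; the real get_xid_status additionally looks up commit status for in-bounds xids, which this omits):

```c
#include <stdint.h>

/* Illustrative mirror of XidBoundsViolation. */
typedef enum
{
	XB_INVALID,
	XB_IN_FUTURE,
	XB_PRECEDES_DATMIN,
	XB_PRECEDES_RELMIN,
	XB_BOUNDS_OK
} XidBounds;

/*
 * Classify an already-widened 64-bit xid against the cached limits.
 * Assumes fxid 0 stands in for InvalidTransactionId.
 */
static XidBounds
classify_fxid(uint64_t fxid, uint64_t oldest_fxid,
			  uint64_t relfrozen_fxid, uint64_t next_fxid)
{
	if (fxid == 0)
		return XB_INVALID;
	if (fxid >= next_fxid)		/* "equals or exceeds next valid" */
		return XB_IN_FUTURE;
	if (fxid < oldest_fxid)		/* "precedes oldest valid" */
		return XB_PRECEDES_DATMIN;
	if (fxid < relfrozen_fxid)	/* "precedes relation freeze threshold" */
		return XB_PRECEDES_RELMIN;
	return XB_BOUNDS_OK;
}
```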
+
+/*
+ * Check that a relation's relkind and access method are both supported,
+ * and that the caller has select privilege on the relation.
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /*------
+ translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * check the tuple attributes.
+ *
+ * Some kinds of corruption make it unsafe to check the tuple attributes, for
+ * example when the line pointer refers to a range of bytes outside the page.
+ * In such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u is an offset, second %u is a length */
+ psprintf(_("data begins at offset %u beyond the tuple length %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("tuple is marked as only locked, but also claims key columns were updated")));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("multixact should not be marked committed")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ switch (get_xid_status(xvac, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u equals or exceeds next valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes relation freeze threshold %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes oldest valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ else
+ {
+ XidCommitStatus status;
+
+ switch (get_xid_status(raw_xmin, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("raw xmin is invalid")));
+ return false;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes relation freeze threshold %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes oldest valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_COMMITTED:
+ break;
+ case XID_IN_PROGRESS:
+ return true; /* insert or delete in progress */
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ switch (get_xid_status(xmax, ctx, &status))
+ {
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("xmax is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD */
+ }
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
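The expected_hoff check above recomputes where tuple data should begin from the fixed header size plus, when HEAP_HASNULL is set, the length of the nulls bitmap. A standalone sketch of that computation, with the server's MAXALIGN and BITMAPLEN macros reimplemented here as assumptions (8-byte maximum alignment and a 23-byte SizeofHeapTupleHeader, both the common values but not guaranteed on every build):

```c
#include <stdint.h>

/* Hedged stand-ins for the macros in access/htup_details.h and c.h. */
#define MY_MAXALIGN(len)	(((uintptr_t) (len) + 7) & ~(uintptr_t) 7)
#define MY_BITMAPLEN(natts)	(((natts) + 7) / 8)

/* Assumed size of the fixed part of a heap tuple header. */
#define MY_SIZEOF_HEAPTUPLEHEADER 23

/* Where tuple data should begin, per the expected_hoff logic above. */
static unsigned
expected_hoff(int natts, int has_nulls)
{
	if (has_nulls)
		return (unsigned) MY_MAXALIGN(MY_SIZEOF_HEAPTUPLEHEADER +
									  MY_BITMAPLEN(natts));
	return (unsigned) MY_MAXALIGN(MY_SIZEOF_HEAPTUPLEHEADER);
}
```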
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value are retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in hexadecimal, %d represents
+ the sequence number */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u does not match the expected sequence number %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u exceeds the end chunk sequence number %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk size */
+ psprintf(_("toast chunk size %u differs from the expected size %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
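The expected-size rule in check_toast_tuple is that every chunk but the last holds exactly TOAST_MAX_CHUNK_SIZE bytes, with the final chunk holding the remainder. A sketch of that rule, assuming a 1996-byte chunk size (the server derives TOAST_MAX_CHUNK_SIZE from the block size; 1996 is the value for the default 8kB pages):

```c
#include <stdint.h>

/* Assumed chunk payload size for default 8kB pages. */
#define MY_TOAST_MAX_CHUNK_SIZE 1996

/* Expected size of chunk `chunkno` of a toasted value of attrsize bytes. */
static int32_t
expected_chunk_size(int32_t chunkno, int32_t totalchunks, int32_t attrsize)
{
	if (chunkno < totalchunks - 1)
		return MY_TOAST_MAX_CHUNK_SIZE;	/* all but the last chunk are full */
	return attrsize - (totalchunks - 1) * MY_TOAST_MAX_CHUNK_SIZE;
}
```

This mirrors the ternary computing expected_size above; a chunk of any other size indicates a torn or mismatched toast value.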
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u starts at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and
+ fourth %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number, second %u
+ represents an enumeration value */
+ psprintf(_("toasted attribute %u has unexpected TOAST tag %u"),
+ ctx->attnum,
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but tuple header flag HEAP_HASEXTERNAL not set"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but relation has no toast relation"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk number */
+ psprintf(_("final toast chunk number %u differs from expected value %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("toasted value for attribute %u missing from toast table"),
+ ctx->attnum));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents a length, second %u represents a size
+ */
+ psprintf(_("line pointer length %u is less than the minimum tuple header size %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ switch (get_xid_status(xmin, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes oldest valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes relation freeze threshold %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ break;
+ }
+
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+
+ /* If xmax is a multixact, it should be within valid range */
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ switch (check_mxid_valid_in_rel(xmax, ctx))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("multitransaction ID is invalid")));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes relation minimum multitransaction ID threshold %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes oldest valid multitransaction ID threshold %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a multitransaction ID */
+ psprintf(_("multitransaction ID %u equals or exceeds next valid multitransaction ID %u"),
+ xmax,
+ ctx->next_mxact));
+ fatal = true;
+ break;
+ case XID_BOUNDS_OK:
+ break;
+ }
+ }
+ /* If xmax is not a multixact and is normal, it should be within valid range */
+ else
+ {
+ switch (get_xid_status(xmax, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_DATMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u are a number */
+ psprintf(_("number of attributes %u exceeds maximum expected for table %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Checks whether a multitransaction ID is in the cached valid range, returning
+ * the nature of the range violation, if any.
+ */
+static XidBoundsViolation
+check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (!TransactionIdIsValid(mxid))
+ return XID_INVALID;
+ if (MultiXactIdPrecedes(mxid, ctx->relminmxid))
+ return XID_PRECEDES_RELMIN;
+ if (MultiXactIdPrecedes(mxid, ctx->oldest_mxact))
+ return XID_PRECEDES_DATMIN;
+ if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, mxid))
+ return XID_IN_FUTURE;
+ return XID_BOUNDS_OK;
+}
+
+/*
+ * Checks whether the given mxid is valid to appear in the heap being checked,
+ * returning the nature of the range violation, if any.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static XidBoundsViolation
+check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ XidBoundsViolation result;
+
+ result = check_mxid_in_range(mxid, ctx);
+ if (result == XID_BOUNDS_OK)
+ return XID_BOUNDS_OK;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ return check_mxid_in_range(mxid, ctx);
+}
+
+/*
+ * Checks whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked, or whether it is too old or too new to appear in
+ * the relation, returning information about the nature of the bounds violation.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the table
+ * contains corrupt all-frozen bits, a concurrent vacuum might skip the page(s)
+ * containing the xid and then truncate clog and advance the relfrozenxid
+ * beyond xid.) Reporting the xid as valid under such conditions seems
+ * acceptable, since if we had checked it earlier in our scan it would have
+ * truly been valid at that time.
+ *
+ * If the status argument is not NULL, and if and only if the transaction ID
+ * appears to be valid in this relation, clog will be consulted and the commit
+ * status argument will be set with the status of the transaction ID.
+ */
+static XidBoundsViolation
+get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status)
+{
+ XidBoundsViolation result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ /* Quick check for special xids */
+ if (!TransactionIdIsValid(xid))
+ result = XID_INVALID;
+ else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
+ result = XID_BOUNDS_OK;
+ else
+ {
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ }
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ result = XID_IN_FUTURE;
+ else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ result = XID_PRECEDES_DATMIN;
+ else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ result = XID_PRECEDES_RELMIN;
+ else
+ result = XID_BOUNDS_OK;
+ }
+
+ /*
+ * Early return if the caller does not request clog checking, or if the
+ * xid is already known to be out of bounds. We dare not check clog for
+ * out of bounds transaction IDs.
+ */
+ if (status == NULL || result != XID_BOUNDS_OK)
+ return result;
+
+ /* Early return if we just checked this xid in a prior call */
+ if (xid == ctx->cached_xid)
+ {
+ *status = ctx->cached_status;
+ return result;
+ }
+
+ *status = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ *status = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ *status = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ *status = XID_ABORTED;
+ else
+ *status = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = *status;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..a57781992a 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@
<para>
The <filename>amcheck</filename> module provides functions that allow you to
- verify the logical consistency of the structure of relations. If the
- structure appears to be valid, no error is raised.
+ verify the logical consistency of the structure of relations.
</para>
<para>
- The functions verify various <emphasis>invariants</emphasis> in the
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
structure of the representation of particular relations. The
correctness of the access method functions behind index scans and
other important operations relies on these invariants always
@@ -24,7 +23,7 @@
collated lexical order). If that particular invariant somehow fails
to hold, we can expect binary searches on the affected page to
incorrectly guide index scans, resulting in wrong answers to SQL
- queries.
+ queries. If the structure appears to be valid, no error is raised.
</para>
<para>
Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
functions.
</para>
<para>
- <filename>amcheck</filename> functions may only be used by superusers.
+ Unlike the B-Tree checking functions, which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. Despite this, if facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
</para>
<sect2>
@@ -187,12 +201,223 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+ 17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+ 960 | 4 | | data begins at offset 152 beyond the tuple length 58
+ 960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+ 960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+ 960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+ 960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+ 1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
+ 1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+ 1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
+ 1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+ 1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
+ 1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
+ 1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
+ 1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. As such, this kind of corruption cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high-risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Accesses to <link
+ linkend="catalogs-overview">catalog tables</link> are performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ An experimental option, <literal>check_toast</literal>, exists to
+ reconcile the target table against entries in its corresponding toast
+ table. This option may change in future, is disabled by default, and is
+ known to be slow. It is also unsafe under some conditions. If the
+ target relation's corresponding toast table or toast index are corrupt,
+ reconciling the target table against toast values may be unsafe. If the
+ catalogs, toast table and toast index are uncorrupted, and remain so
+ during the check of the target table, reconciling the target table
+ against its toast table should be safe.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If this experimental option is true, toasted values are checked against
+ the corresponding TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
<para>
- When the <parameter>heapallindexed</parameter> argument to
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
verification functions is <literal>true</literal>, an additional
phase of verification is performed against the table associated with
the target index relation. This consists of a <quote>dummy</quote>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index a2ce617c8c..81752b68eb 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 58c42ffe1f..9a30380901 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,6 +109,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4191f94869..bca30f3dde 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1020,6 +1020,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2287,6 +2288,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
@@ -2788,6 +2790,8 @@ XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
+XidBoundsViolation
+XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
v18-0002-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v18-0002-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644
From 09c7f6392075d40b7f2cc90a2dca78002cf2d336 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 5 Oct 2020 15:43:00 -0700
Subject: [PATCH v18 2/5] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 2 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1281 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 489 ++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 228 ++++
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 2 +
15 files changed, 2393 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..f8eecf70bf
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,2 @@
+/pg_amcheck
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..324cf1cfc8
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1281 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should each
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require hard-coding the
+ * single-character value for each relkind, such as 'm' for
+ * materialized views, which we'd rather not duplicate here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user-settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf(_("Password for user %s: "),
+ realusername);
+ else
+ password_prompt = pg_strdup(_("Password: "));
+ PQfinish(settings.db);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples. We want all the details to report if corruption exists.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print the help text describing pg_amcheck's command line options
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBufferStr(querybuf, "]::OID[])");
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching the
+ * given list of patterns but not included in the given list of excluded Oids
+ * nor in one of the given excluded namespaces. The relations are filtered by
+ * the given schema_quals. They are further filtered by the given
+ * relkind_quals, allowing the caller to restrict the relations to just indexes
+ * or tables. The missing_errtext argument supplies the error message to
+ * report when no matching relations are found and strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The multiple SELECTs run by this loop might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query.
+ *
+ * On error, set *error to a copy of the error message returned from the
+ * database connection. The caller is responsible for reporting and freeing
+ * it, which keeps error output under the caller's control so that it does
+ * not get messily interleaved with corruption reports.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500); # overwrite 500 bytes with garbage
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..1cc36b25b7
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,489 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-X', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper functions
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ qr/\(relname=test,blkno=$blkno,offnum=$offnum,attnum=$attnum\)\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, '');
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ if ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # As above, but with an xmax value that precedes the relation's
+ # minimum multitransaction ID
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 4e833d79ef..1efca8adc4 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..3e059e7753
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,228 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all-frozen
+ -v, --skip-all-visible do NOT check blocks marked as all-visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked in the visibility map as all-frozen or all-visible,
+ respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines:
+ the first line shows the location, and the second line shows a message
+ describing the problem.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=17,offnum=12,attnum=)
+xmin 4294967295 precedes relation freeze threshold 17:1134217582
+(relname=mytable,blkno=960,offnum=4,attnum=)
+data begins at offset 152 beyond the tuple length 58
+(relname=mytable,blkno=960,offnum=4,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=5,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=6,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=7,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+(relname=mytable,blkno=1147,offnum=2,attnum=)
+number of attributes 2047 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=10,attnum=)
+tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+(relname=mytable,blkno=1147,offnum=15,attnum=)
+number of attributes 67 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=16,attnum=1)
+attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+(relname=mytable,blkno=1147,offnum=18,attnum=2)
+final toast chunk number 0 differs from expected value 6
+(relname=mytable,blkno=1147,offnum=19,attnum=2)
+toasted value for attribute 2 missing from toast table
+(relname=mytable,blkno=1147,offnum=21,attnum=)
+tuple is marked as only locked, but also claims key columns were updated
+(relname=mytable,blkno=1147,offnum=22,attnum=)
+multitransaction ID 1775655 is from before relation cutoff 2355572
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form, and may span differing
+ numbers of lines per corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt index,
+ "corrupt_index", with corruption in the page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 89e1b39036..8cf0554823 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bca30f3dde..369b8e7c6f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -403,6 +404,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
v18-0003-Creating-non-throwing-interface-to-clog-and-slru.patch (application/octet-stream)
From a163a8f4b82395d829a1840a766a4aa8c2fb9b72 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 6 Oct 2020 11:28:18 -0700
Subject: [PATCH v18 3/5] Creating non-throwing interface to clog and slru.
---
src/backend/access/transam/clog.c | 21 +++---
src/backend/access/transam/commit_ts.c | 4 +-
src/backend/access/transam/multixact.c | 16 ++---
src/backend/access/transam/slru.c | 23 +++---
src/backend/access/transam/subtrans.c | 4 +-
src/backend/access/transam/transam.c | 98 +++++++++-----------------
src/backend/commands/async.c | 4 +-
src/backend/storage/lmgr/predicate.c | 4 +-
src/include/access/clog.h | 18 +----
src/include/access/clogdefs.h | 33 +++++++++
src/include/access/slru.h | 6 +-
src/include/access/transam.h | 3 +
12 files changed, 122 insertions(+), 112 deletions(-)
create mode 100644 src/include/access/clogdefs.h
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349aa7b..a2eb3e2983 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -357,7 +357,7 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* write-busy, since we don't care if the update reaches disk sooner than
* we think.
*/
- slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid, true);
/*
* Set the main transaction id, if any.
@@ -631,7 +631,7 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
* for most uses; TransactionLogFetch() in transam.c is the intended caller.
*/
XidStatus
-TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
+TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn, bool throwError)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
@@ -643,13 +643,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid);
- byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
+ slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid, throwError);
+ if (slotno == InvalidSlotNo)
+ status = TRANSACTION_STATUS_UNKNOWN;
+ else
+ {
+ byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
- status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
+ status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
- lsnindex = GetLSNIndex(slotno, xid);
- *lsn = XactCtl->shared->group_lsn[lsnindex];
+ lsnindex = GetLSNIndex(slotno, xid);
+ *lsn = XactCtl->shared->group_lsn[lsnindex];
+ }
LWLockRelease(XactSLRULock);
@@ -796,7 +801,7 @@ TrimCLOG(void)
int slotno;
char *byteptr;
- slotno = SimpleLruReadPage(XactCtl, pageno, false, xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, false, xid, true);
byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
/* Zero so-far-unused positions in the current byte */
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index cb8a968801..98c685405c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -237,7 +237,7 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid, true);
TransactionIdSetCommitTs(xid, ts, nodeid, slotno);
for (i = 0; i < nsubxids; i++)
@@ -342,7 +342,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
}
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid, true);
memcpy(&entry,
CommitTsCtl->shared->page_buffer[slotno] +
SizeOfCommitTimestampEntry * entryno,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 81752b68eb..5c0213b06d 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -881,7 +881,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* enough that a MultiXactId is really involved. Perhaps someday we'll
* take the trouble to generalize the slru.c error reporting code.
*/
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -914,7 +914,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -1345,7 +1345,7 @@ retry:
pageno = MultiXactIdToOffsetPage(multi);
entryno = MultiXactIdToOffsetEntry(multi);
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
@@ -1377,7 +1377,7 @@ retry:
entryno = MultiXactIdToOffsetEntry(tmpMXact);
if (pageno != prev_pageno)
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -1418,7 +1418,7 @@ retry:
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -2063,7 +2063,7 @@ TrimMultiXact(void)
int slotno;
MultiXactOffset *offptr;
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -2095,7 +2095,7 @@ TrimMultiXact(void)
int memberoff;
memberoff = MXOffsetToMemberOffset(offset);
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset, true);
xidptr = (TransactionId *)
(MultiXactMemberCtl->shared->page_buffer[slotno] + memberoff);
@@ -2749,7 +2749,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return false;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
+ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 16a7898697..daa145eeff 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -385,14 +385,15 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must be held at entry, and will be held at exit.
*/
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid)
+ TransactionId xid, bool throwError)
{
SlruShared shared = ctl->shared;
@@ -465,7 +466,11 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
/* Now it's okay to ereport if we failed */
if (!ok)
- SlruReportIOError(ctl, pageno, xid);
+ {
+ if (throwError)
+ SlruReportIOError(ctl, pageno, xid);
+ return InvalidSlotNo;
+ }
SlruRecentlyUsed(shared, slotno);
@@ -484,14 +489,16 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must NOT be held at entry, but will be held at exit.
* It is unspecified whether the lock will be shared or exclusive.
*/
int
-SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
+SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid,
+ bool throwError)
{
SlruShared shared = ctl->shared;
int slotno;
@@ -520,7 +527,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
- return SimpleLruReadPage(ctl, pageno, true, xid);
+ return SimpleLruReadPage(ctl, pageno, true, xid, throwError);
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e867c7..353b946731 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -83,7 +83,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
@@ -123,7 +123,7 @@ SubTransGetParent(TransactionId xid)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index a28918657c..88f867e5ef 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -35,7 +35,8 @@ static XidStatus cachedFetchXidStatus;
static XLogRecPtr cachedCommitLSN;
/* Local functions */
-static XidStatus TransactionLogFetch(TransactionId transactionId);
+static XidStatus TransactionLogFetch(TransactionId transactionId,
+ bool throwError);
/* ----------------------------------------------------------------
@@ -49,7 +50,7 @@ static XidStatus TransactionLogFetch(TransactionId transactionId);
* TransactionLogFetch --- fetch commit status of specified transaction id
*/
static XidStatus
-TransactionLogFetch(TransactionId transactionId)
+TransactionLogFetch(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
XLogRecPtr xidlsn;
@@ -76,14 +77,16 @@ TransactionLogFetch(TransactionId transactionId)
/*
* Get the transaction status.
*/
- xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);
+ xidstatus = TransactionIdGetStatus(transactionId, &xidlsn, throwError);
/*
* Cache it, but DO NOT cache status for unfinished or sub-committed
* transactions! We only cache status that is guaranteed not to change.
+ * Likewise, DO NOT cache when the status is unknown.
*/
if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
- xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
+ xidstatus != TRANSACTION_STATUS_SUB_COMMITTED &&
+ xidstatus != TRANSACTION_STATUS_UNKNOWN)
{
cachedFetchXid = transactionId;
cachedFetchXidStatus = xidstatus;
@@ -96,6 +99,7 @@ TransactionLogFetch(TransactionId transactionId)
/* ----------------------------------------------------------------
* Interface functions
*
+ * TransactionIdResolveStatus
* TransactionIdDidCommit
* TransactionIdDidAbort
* ========
@@ -115,24 +119,17 @@ TransactionLogFetch(TransactionId transactionId)
*/
/*
- * TransactionIdDidCommit
- * True iff transaction associated with the identifier did commit.
- *
- * Note:
- * Assumes transaction identifier is valid and exists in clog.
+ * TransactionIdResolveStatus
+ * Returns the status of the transaction associated with the identifier,
+ * recursively resolving sub-committed transaction status by checking
+ * the parent transaction.
*/
-bool /* true if given transaction committed */
-TransactionIdDidCommit(TransactionId transactionId)
+XidStatus
+TransactionIdResolveStatus(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked committed, it's committed.
- */
- if (xidstatus == TRANSACTION_STATUS_COMMITTED)
- return true;
+ xidstatus = TransactionLogFetch(transactionId, throwError);
/*
* If it's marked subcommitted, we have to check the parent recursively.
@@ -153,21 +150,31 @@ TransactionIdDidCommit(TransactionId transactionId)
TransactionId parentXid;
if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return false;
+ return TRANSACTION_STATUS_ABORTED;
parentXid = SubTransGetParent(transactionId);
if (!TransactionIdIsValid(parentXid))
{
elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
transactionId);
- return false;
+ return TRANSACTION_STATUS_ABORTED;
}
- return TransactionIdDidCommit(parentXid);
+ return TransactionIdResolveStatus(parentXid, throwError);
}
+ return xidstatus;
+}
- /*
- * It's not committed.
- */
- return false;
+/*
+ * TransactionIdDidCommit
+ * True iff transaction associated with the identifier did commit.
+ *
+ * Note:
+ * Assumes transaction identifier is valid and exists in clog.
+ */
+bool /* true if given transaction committed */
+TransactionIdDidCommit(TransactionId transactionId)
+{
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_COMMITTED);
}
/*
@@ -180,43 +187,8 @@ TransactionIdDidCommit(TransactionId transactionId)
bool /* true if given transaction aborted */
TransactionIdDidAbort(TransactionId transactionId)
{
- XidStatus xidstatus;
-
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked aborted, it's aborted.
- */
- if (xidstatus == TRANSACTION_STATUS_ABORTED)
- return true;
-
- /*
- * If it's marked subcommitted, we have to check the parent recursively.
- * However, if it's older than TransactionXmin, we can't look at
- * pg_subtrans; instead assume that the parent crashed without cleaning up
- * its children.
- */
- if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED)
- {
- TransactionId parentXid;
-
- if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return true;
- parentXid = SubTransGetParent(transactionId);
- if (!TransactionIdIsValid(parentXid))
- {
- /* see notes in TransactionIdDidCommit */
- elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
- transactionId);
- return true;
- }
- return TransactionIdDidAbort(parentXid);
- }
-
- /*
- * It's not aborted.
- */
- return false;
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_ABORTED);
}
/*
@@ -419,7 +391,7 @@ TransactionIdGetCommitLSN(TransactionId xid)
/*
* Get the transaction status.
*/
- (void) TransactionIdGetStatus(xid, &result);
+ (void) TransactionIdGetStatus(xid, &result, true);
return result;
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace3f9..a49126dba0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1477,7 +1477,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
slotno = SimpleLruZeroPage(NotifyCtl, pageno);
else
slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
- InvalidTransactionId);
+ InvalidTransactionId, true);
/* Note we mark the page dirty before writing in it */
NotifyCtl->shared->page_dirty[slotno] = true;
@@ -2010,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
* part of the page we will actually inspect.
*/
slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
- InvalidTransactionId);
+ InvalidTransactionId, true);
if (curpage == QUEUE_POS_PAGE(head))
{
/* we only want to read as far as head */
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b400c..6cf12e46f6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -904,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
}
else
- slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+ slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid, true);
SerialValue(slotno, xid) = minConflictCommitSeqNo;
SerialSlruCtl->shared->page_dirty[slotno] = true;
@@ -946,7 +946,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
* but will return with that lock held, which must then be released.
*/
slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
- SerialPage(xid), xid);
+ SerialPage(xid), xid, true);
val = SerialValue(slotno, xid);
LWLockRelease(SerialSLRULock);
return val;
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index 6c840cbf29..cf299cd8f6 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -11,24 +11,11 @@
#ifndef CLOG_H
#define CLOG_H
+#include "access/clogdefs.h"
#include "access/xlogreader.h"
#include "storage/sync.h"
#include "lib/stringinfo.h"
-/*
- * Possible transaction statuses --- note that all-zeroes is the initial
- * state.
- *
- * A "subcommitted" transaction is a committed subtransaction whose parent
- * hasn't committed or aborted yet.
- */
-typedef int XidStatus;
-
-#define TRANSACTION_STATUS_IN_PROGRESS 0x00
-#define TRANSACTION_STATUS_COMMITTED 0x01
-#define TRANSACTION_STATUS_ABORTED 0x02
-#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
-
typedef struct xl_clog_truncate
{
int pageno;
@@ -38,7 +25,8 @@ typedef struct xl_clog_truncate
extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
-extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
+extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn,
+ bool throwError);
extern Size CLOGShmemBuffers(void);
extern Size CLOGShmemSize(void);
diff --git a/src/include/access/clogdefs.h b/src/include/access/clogdefs.h
new file mode 100644
index 0000000000..0f9996bb08
--- /dev/null
+++ b/src/include/access/clogdefs.h
@@ -0,0 +1,33 @@
+/*
+ * clogdefs.h
+ *
+ * PostgreSQL transaction-commit-log manager
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/clogdefs.h
+ */
+#ifndef CLOGDEFS_H
+#define CLOGDEFS_H
+
+/*
+ * Possible transaction statuses --- note that all-zeroes is the initial
+ * state.
+ *
+ * A "subcommitted" transaction is a committed subtransaction whose parent
+ * hasn't committed or aborted yet.
+ *
+ * An "unknown" status indicates an error condition, such as when the clog has
+ * been erroneously truncated and the commit status of a transaction cannot be
+ * determined.
+ */
+typedef enum XidStatus {
+ TRANSACTION_STATUS_IN_PROGRESS = 0x00,
+ TRANSACTION_STATUS_COMMITTED = 0x01,
+ TRANSACTION_STATUS_ABORTED = 0x02,
+ TRANSACTION_STATUS_SUB_COMMITTED = 0x03,
+ TRANSACTION_STATUS_UNKNOWN = 0x04 /* error condition */
+} XidStatus;
+
+#endif							/* CLOGDEFS_H */
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b43504d..0b6a5669d8 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -133,6 +133,8 @@ typedef struct SlruCtlData
typedef SlruCtlData *SlruCtl;
+#define InvalidSlotNo ((int) -1)
+
extern Size SimpleLruShmemSize(int nslots, int nlsns);
extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
@@ -140,9 +142,9 @@ extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
SyncRequestHandler sync_handler);
extern int SimpleLruZeroPage(SlruCtl ctl, int pageno);
extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 2f1f144db4..7d5e2f614d 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -14,6 +14,7 @@
#ifndef TRANSAM_H
#define TRANSAM_H
+#include "access/clogdefs.h"
#include "access/xlogdefs.h"
@@ -264,6 +265,8 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
/*
* prototypes for functions in transam/transam.c
*/
+extern XidStatus TransactionIdResolveStatus(TransactionId transactionId,
+ bool throwError);
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
--
2.21.1 (Apple Git-122.3)
v18-0004-Using-non-throwing-clog-interface-from-amcheck.patch
From d46e827af91a461447974e3387c4b9fa276b7437 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 6 Oct 2020 13:33:23 -0700
Subject: [PATCH v18 4/5] Using non-throwing clog interface from amcheck
Converting the heap checking functions to use the recently introduced
non-throwing interface to clog when checking transaction commit status, and
adding corruption reports about missing clog rather than aborting.
---
contrib/amcheck/verify_heapam.c | 84 +++++++++------
contrib/pg_amcheck/t/006_clog_truncation.pl | 111 ++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 -
3 files changed, 165 insertions(+), 31 deletions(-)
create mode 100644 contrib/pg_amcheck/t/006_clog_truncation.pl
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 7d7230a9c9..db5eec2504 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -10,6 +10,7 @@
*/
#include "postgres.h"
+#include "access/clogdefs.h"
#include "access/detoast.h"
#include "access/genam.h"
#include "access/heapam.h"
@@ -39,13 +40,6 @@ typedef enum XidBoundsViolation
XID_BOUNDS_OK
} XidBoundsViolation;
-typedef enum XidCommitStatus
-{
- XID_COMMITTED,
- XID_IN_PROGRESS,
- XID_ABORTED
-} XidCommitStatus;
-
typedef enum SkipPages
{
SKIP_PAGES_ALL_FROZEN,
@@ -79,7 +73,7 @@ typedef struct HeapCheckContext
* Cached copies of the most recently checked xid and its status.
*/
TransactionId cached_xid;
- XidCommitStatus cached_status;
+ XidStatus cached_status;
/* Values concerning the heap relation being checked */
Relation rel;
@@ -137,7 +131,7 @@ static void update_cached_xid_range(HeapCheckContext *ctx);
static void update_cached_mxid_range(HeapCheckContext *ctx);
static XidBoundsViolation check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx);
static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx);
-static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status);
+static XidBoundsViolation get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidStatus *status);
/*
* Scan and report corruption in heap pages, optionally reconciling toasted
@@ -634,7 +628,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
else if (infomask & HEAP_MOVED_OFF ||
infomask & HEAP_MOVED_IN)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
switch (get_xid_status(xvac, ctx, &status))
@@ -678,17 +672,27 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("old-style VACUUM FULL transaction ID %u transaction status is lost"),
+ xvac));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
else
{
- XidCommitStatus status;
+ XidStatus status;
switch (get_xid_status(raw_xmin, ctx, &status))
{
@@ -729,12 +733,22 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_COMMITTED:
+ case TRANSACTION_STATUS_COMMITTED:
break;
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* insert or delete in progress */
- case XID_ABORTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("raw xmin %u transaction status is lost"),
+ raw_xmin));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
@@ -744,7 +758,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
{
if (infomask & HEAP_XMAX_IS_MULTI)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
switch (get_xid_status(xmax, ctx, &status))
@@ -787,12 +801,22 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_RECENTLY_DEAD or
* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a transaction identifier */
+ psprintf(_("xmax %u transaction status is lost"),
+ xmax));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
@@ -1252,7 +1276,11 @@ check_tuple(HeapCheckContext *ctx)
break;
}
}
- /* If xmax is not a multixact and is normal, it should be within valid range */
+
+ /*
+ * If xmax is not a multixact and is normal, it should be within valid
+ * range
+ */
else
{
switch (get_xid_status(xmax, ctx, NULL))
@@ -1457,7 +1485,7 @@ check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
* status argument will be set with the status of the transaction ID.
*/
static XidBoundsViolation
-get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status)
+get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidStatus *status)
{
XidBoundsViolation result;
FullTransactionId fxid;
@@ -1508,19 +1536,15 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx, XidCommitStatus *status
return result;
}
- *status = XID_COMMITTED;
+ *status = TRANSACTION_STATUS_COMMITTED;
LWLockAcquire(XactTruncationLock, LW_SHARED);
clog_horizon = FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid, ctx);
if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
{
if (TransactionIdIsCurrentTransactionId(xid))
- *status = XID_IN_PROGRESS;
- else if (TransactionIdDidCommit(xid))
- *status = XID_COMMITTED;
- else if (TransactionIdDidAbort(xid))
- *status = XID_ABORTED;
+ *status = TRANSACTION_STATUS_IN_PROGRESS;
else
- *status = XID_IN_PROGRESS;
+ *status = TransactionIdResolveStatus(xid, false);
}
LWLockRelease(XactTruncationLock);
ctx->cached_xid = xid;
diff --git a/contrib/pg_amcheck/t/006_clog_truncation.pl b/contrib/pg_amcheck/t/006_clog_truncation.pl
new file mode 100644
index 0000000000..f205ae7ede
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_clog_truncation.pl
@@ -0,0 +1,111 @@
+# This regression test checks the behavior of the heap validation in the
+# presence of clog corruption.
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 3;
+
+my ($node, $pgdata, $clogdir);
+
+sub count_clog_files
+{
+ my $result = 0;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ $result++ if (-f "$clogdir/$fname");
+ }
+ closedir(DIR);
+ return $result;
+}
+
+# Burn through enough xids that at least three clog files exist in pg_xact/
+sub create_three_clog_files
+{
+ print STDERR "Generating clog entries....\n";
+
+ $node->safe_psql('postgres', q(
+ CREATE PROCEDURE burn_xids ()
+ LANGUAGE plpgsql
+ AS $$
+ DECLARE
+ loopcnt BIGINT;
+ BEGIN
+ FOR loopcnt IN 1..32768
+ LOOP
+ PERFORM txid_current();
+ COMMIT;
+ END LOOP;
+ END;
+ $$;
+ ));
+
+ do {
+ $node->safe_psql('postgres', 'INSERT INTO test_0 (i) VALUES (0)');
+ $node->safe_psql('postgres', 'CALL burn_xids()');
+ print STDERR "Burned transaction ids...\n";
+ $node->safe_psql('postgres', 'INSERT INTO test_1 (i) VALUES (1)');
+ } while (count_clog_files() < 3);
+}
+
+# Of the clog files in pg_xact, remove the second one, sorted by name order.
+# This function, used along with create_three_clog_files(), is intended to
+# remove neither the newest nor the oldest clog file. Experimentation shows
+# that removing the newest clog file works ok, but for future-proofing, remove
+# one less likely to be checked at server startup.
+sub unlink_second_clog_file
+{
+ my @paths;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ my $path = "$clogdir/$fname";
+ next unless -f $path;
+ push @paths, $path;
+ }
+ closedir(DIR);
+
+ my @ordered = sort { $a cmp $b } @paths;
+ unlink $ordered[1];
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we corrupt clog, autovacuum workers visiting tables
+# could crash the backend. Disable autovacuum so that won't happen.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$pgdata = $node->data_dir;
+$clogdir = join('/', $pgdata, 'pg_xact');
+$node->start;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE TABLE test_0 (i INTEGER)");
+$node->safe_psql('postgres', "CREATE TABLE test_1 (i INTEGER)");
+$node->safe_psql('postgres', "VACUUM FREEZE");
+
+create_three_clog_files();
+
+# Corruptly delete a clog file
+$node->stop;
+unlink_second_clog_file();
+$node->start;
+
+my $port = $node->port;
+
+# Run pg_amcheck against the corrupt database, looking for clog related
+# corruption messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ qr/transaction status is lost/ ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 369b8e7c6f..6ca1bac21a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2793,7 +2793,6 @@ XactCallbackItem
XactEvent
XactLockTableWaitInfo
XidBoundsViolation
-XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
v18-0005-Adding-ACL-checks-for-verify_heapam.patch
From 59d20cc6af34150d0feb6ff2762d4473645054d3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 7 Oct 2020 17:05:17 -0700
Subject: [PATCH v18 5/5] Adding ACL checks for verify_heapam
Requiring select privileges on tables scanned by verify_heapam, in
addition to the already required execute privileges on the function.
---
contrib/amcheck/expected/check_heap.out | 6 ++++++
contrib/amcheck/sql/check_heap.sql | 7 +++++++
contrib/amcheck/verify_heapam.c | 8 ++++++++
doc/src/sgml/pgamcheck.sgml | 2 +-
4 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index 882f853d56..41cdc6435c 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -95,6 +95,12 @@ SELECT * FROM verify_heapam(relation := 'heaptest');
ERROR: permission denied for function verify_heapam
RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for table heaptest
+RESET ROLE;
+GRANT SELECT ON heaptest TO regress_heaptest_role;
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index c10a25f21c..c8397a46f0 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -41,6 +41,13 @@ RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index db5eec2504..1178c0cefe 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -23,6 +23,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
@@ -423,6 +424,8 @@ verify_heapam(PG_FUNCTION_ARGS)
static void
sanity_check_relation(Relation rel)
{
+ AclResult aclresult;
+
if (rel->rd_rel->relkind != RELKIND_RELATION &&
rel->rd_rel->relkind != RELKIND_MATVIEW &&
rel->rd_rel->relkind != RELKIND_TOASTVALUE)
@@ -436,6 +439,11 @@ sanity_check_relation(Relation rel)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
+ aclresult = pg_class_aclcheck(rel->rd_id, GetUserId(), ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
}
/*
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
index 3e059e7753..fc36447dda 100644
--- a/doc/src/sgml/pgamcheck.sgml
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -19,7 +19,7 @@
connecting as a user with sufficient privileges to check tables and indexes.
Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
<function>bt_index_parent_check</function> and <function>verify_heapam</function>
- functions.
+ functions, as well as privileges to access the relations being checked.
</para>
<synopsis>
--
2.21.1 (Apple Git-122.3)
On Mon, Oct 5, 2020 at 5:24 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I don't see how verify_heapam will avoid raising an error during basic
validation from PageIsVerified(), which will violate the guarantee
about not throwing errors. I don't see that as a problem myself, but
presumably you will.

My concern is not so much that verify_heapam will stop with an error, but rather that it might trigger a panic that stops all backends. Stopping with an error merely because it hits corruption is not ideal, as I would rather it completed the scan and reported all corruptions found, but that's minor compared to the damage done if verify_heapam creates downtime in a production environment offering high availability guarantees. That statement might seem nuts, given that the corrupt table itself would be causing downtime, but that analysis depends on assumptions about table access patterns, and there is no a priori reason to think that corrupt pages are necessarily ever being accessed, or accessed in a way that causes crashes (rather than merely wrong results) outside verify_heapam scanning the whole table.
That seems reasonable to me. I think that it makes sense to never take
down the server in a non-debug build with verify_heapam. That's not
what I took away from your previous remarks on the issue, but perhaps
it doesn't matter now.
--
Peter Geoghegan
On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
This next version, attached, has the ACL checking and associated documentation changes split out into patch 0005, making it easier to review in isolation from the rest of the patch series.
Independently of acl considerations, this version also has some verbiage changes in 0004, in response to Andrey's review upthread.
I was about to commit 0001, after making some cosmetic changes, when I
discovered that it won't link for me. I think there must be something
wrong with the NLS stuff. My version of 0001 is attached. The error I
got is:
ccache clang -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -Wno-unused-command-line-argument -g -O2 -Wall -Werror
-fno-omit-frame-pointer -bundle -multiply_defined suppress -o
amcheck.so verify_heapam.o verify_nbtree.o -L../../src/port
-L../../src/common -L/opt/local/lib -L/opt/local/lib
-L/opt/local/lib -L/opt/local/lib -L/opt/local/lib
-Wl,-dead_strip_dylibs -Wall -Werror -fno-omit-frame-pointer
-bundle_loader ../../src/backend/postgres
Undefined symbols for architecture x86_64:
"_libintl_gettext", referenced from:
_verify_heapam in verify_heapam.o
_check_tuple in verify_heapam.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [amcheck.so] Error 1
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
v19-0001-Extend-amcheck-to-check-heap-pages.patch
From 09bfac919ac5cc4efe8329f544399abed89854f6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 21 Oct 2020 15:28:44 -0400
Subject: [PATCH v19] Extend amcheck to check heap pages.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Mark Dilger, reviewed by Peter Geoghegan, Andres Freund, Álvaro Herrera,
Michael Paquier, Amul Sul, and by me. Some last-minute cosmetic
revisions by me.
Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 30 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 194 +++
contrib/amcheck/sql/check_heap.sql | 116 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1550 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 235 +++-
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 4 +
12 files changed, 2402 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..7237ab738c
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,30 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass,
+ boolean,
+ boolean,
+ text,
+ bigint,
+ bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..882f853d56
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,194 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+CREATE ROLE regress_heaptest_role;
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for function verify_heapam
+RESET ROLE;
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+RESET ROLE;
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+ERROR: ending block number must be between 0 and 0
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+ERROR: starting block number must be between 0 and 0
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c10a25f21c
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,116 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+CREATE ROLE regress_heaptest_role;
+
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+	die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+	return $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+	open($fh, '+<', $relpath) or BAIL_OUT("open $relpath failed: $!");
+ binmode $fh;
+
+ # If we corrupt the header, postgres won't allow the page into the buffer.
+	syswrite($fh, "\xFF\xFF\xFF\xFF" x 2, 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+	sysseek($fh, 32, 0);
+	syswrite($fh, "\x77\x77\x77\x77" x 125, 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..828ce3fd4c
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1550 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ *	  Functions to check PostgreSQL heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+/*
+ * Despite the name, we use this for reporting problems with both XIDs and
+ * MXIDs.
+ */
+typedef enum XidBoundsViolation
+{
+ XID_INVALID,
+ XID_IN_FUTURE,
+ XID_PRECEDES_CLUSTERMIN,
+ XID_PRECEDES_RELMIN,
+ XID_BOUNDS_OK
+} XidBoundsViolation;
+
+typedef enum XidCommitStatus
+{
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during
+ * the lifetime of a verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values
+ * from them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
+
+/* Internal implementation */
+static void sanity_check_relation(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr,
+ HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid,
+ const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_in_range(MultiXactId mxid,
+ HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid,
+ HeapCheckContext *ctx);
+static XidBoundsViolation get_xid_status(TransactionId xid,
+ HeapCheckContext *ctx,
+ XidCommitStatus *status);
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation:
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * This code goes to some trouble to avoid crashing the server even if the
+ * table pages are badly corrupted, but it's probably not perfect. If
+ * check_toast is true, we'll use regular index lookups to try to fetch TOAST
+ * tuples, which can certainly cause crashes if the right kind of corruption
+ * exists in the toast table or index. No matter what parameters you pass,
+ * we can't protect against crashes that might occur trying to look up the
+ * commit status of transaction IDs (though we avoid trying to do such lookups
+ * for transaction IDs that can't legally appear in the table).
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ BlockNumber first_block;
+ BlockNumber last_block;
+ BlockNumber nblocks;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = text_to_cstring(PG_GETARG_TEXT_PP(3));
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option"),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /* Open relation, check relkind and access method, and check privileges */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ sanity_check_relation(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ if (!nblocks)
+ {
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* Validate block numbers, or handle nulls. */
+ if (PG_ARGISNULL(4))
+ first_block = 0;
+ else
+ {
+ int64 fb = PG_GETARG_INT64(4);
+
+ if (fb < 0 || fb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block number must be between 0 and %u",
+ nblocks - 1)));
+ first_block = (BlockNumber) fb;
+ }
+ if (PG_ARGISNULL(5))
+ last_block = nblocks - 1;
+ else
+ {
+ int64 lb = PG_GETARG_INT64(5);
+
+ if (lb < 0 || lb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block number must be between 0 and %u",
+ nblocks - 1)));
+ last_block = (BlockNumber) lb;
+ }
+
+ /* Optionally open the toast relation, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ int32 mapbits;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_option == SKIP_PAGES_ALL_FROZEN)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ if (skip_option == SKIP_PAGES_ALL_VISIBLE)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ /*------
+							 translator: the first %u is an offset, the second is the maximum offset */
+ psprintf(_("line pointer redirection to item at offset %u exceeds maximum offset %u"),
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ /*------
+ translator: the %u is an offset */
+ psprintf(_("line pointer redirection to unused item at offset %u"),
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
+
+/*
+ * Check that a relation's relkind and access method are both supported,
+ * and that the caller has select privilege on the relation.
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ /*------
+ translator: %s is a user supplied object name */
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable to
+ * check the tuple attributes.
+ *
+ * Some kinds of corruption make it unsafe to check the tuple attributes, for
+ * example when the line pointer refers to a range of bytes outside the page.
+ * In such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+ unsigned expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u is an offset, second %u is a length */
+ psprintf(_("data begins at offset %u beyond the tuple length %u"),
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup(_("tuple is marked as only locked, but also claims key columns were updated")));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup(_("multixact should not be marked committed")));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent an offset */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else
+ report_corruption(ctx,
+ /*------
+ translator: first and second %u represent an offset, third %u
+ represents the number of attributes */
+ psprintf(_("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, no nulls)"),
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+ else if (infomask & HEAP_MOVED_OFF ||
+ infomask & HEAP_MOVED_IN)
+ {
+ XidCommitStatus status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ switch (get_xid_status(xvac, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("old-style VACUUM FULL transaction ID is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u equals or exceeds next valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes relation freeze threshold %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("old-style VACUUM FULL transaction ID %u precedes oldest valid transaction ID %u:%u"),
+ xvac,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ else
+ {
+ XidCommitStatus status;
+
+ switch (get_xid_status(raw_xmin, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("raw xmin is invalid")));
+ return false;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes relation freeze threshold %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("raw xmin %u precedes oldest valid transaction ID %u:%u"),
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_COMMITTED:
+ break;
+ case XID_IN_PROGRESS:
+ return true; /* insert or delete in progress */
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ switch (get_xid_status(xmax, ctx, &status))
+ {
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("xmax is invalid")));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction
+ identifier, second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD */
+ }
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value is retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk sequence number is null")));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup(_("toast chunk data is null")));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ /*------
+ translator: %0x represents a bit pattern in hexadecimal, %d represents
+ the sequence number */
+ psprintf(_("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)"),
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u does not match the expected sequence number %u"),
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a sequence number */
+ psprintf(_("toast chunk sequence number %u exceeds the end chunk sequence number %u"),
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk size */
+ psprintf(_("toast chunk size %u differs from the expected size %u"),
+ chunksize, expected_size));
+ return;
+ }
+}
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u starts at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and
+ fourth %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number, second %u
+ represents an enumeration value */
+ psprintf(_("toasted attribute %u has unexpected TOAST tag %u"),
+ ctx->attnum,
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents an attribute number; second and fourth
+ %u represent a length; third %u represents an offset */
+ psprintf(_("attribute %u with length %u ends at offset %u beyond total tuple length %u"),
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but tuple header flag HEAP_HASEXTERNAL not set"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("attribute %u is external but relation has no toast relation"),
+ ctx->attnum));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ /*------
+ translator: both %u represent a chunk number */
+ psprintf(_("final toast chunk number %u differs from expected value %u"),
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ /*------
+ translator: %u represents the attribute number */
+ psprintf(_("toasted value for attribute %u missing from toast table"),
+ ctx->attnum));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ /*------
+ translator: first %u represents a length, second %u represents a size
+ */
+ psprintf(_("line pointer length %u is less than the minimum tuple header size %u"),
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ switch (get_xid_status(xmin, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u equals or exceeds next valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes oldest valid transaction ID %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier, second
+ %u is an epoch */
+ psprintf(_("xmin %u precedes relation freeze threshold %u:%u"),
+ xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ break;
+ }
+
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ /* xmax is a multixact, so it should be within valid MXID range */
+ switch (check_mxid_valid_in_rel(xmax, ctx))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup(_("multitransaction ID is invalid")));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes relation minimum multitransaction ID threshold %u"),
+ xmax, ctx->relminmxid));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
 translator: both %u are multitransaction IDs */
+ psprintf(_("multitransaction ID %u precedes oldest valid multitransaction ID threshold %u"),
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: %u is a multitransaction ID */
+ psprintf(_("multitransaction ID %u equals or exceeds next valid multitransaction ID %u"),
+ xmax,
+ ctx->next_mxact));
+ fatal = true;
+ break;
+ case XID_BOUNDS_OK:
+ break;
+ }
+ }
+ else
+ {
+ /*
+ * xmax is not a multixact and is normal, so it should be within the
+ * valid XID range.
+ */
+ switch (get_xid_status(xmax, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u equals or exceeds next valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes oldest valid transaction ID %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ /*------
+ translator: first and third %u are a transaction identifier,
+ second %u is an epoch */
+ psprintf(_("xmax %u precedes relation freeze threshold %u:%u"),
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ /*------
+ translator: both %u are a number */
+ psprintf(_("number of attributes %u exceeds maximum expected for table %u"),
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Checks whether a multitransaction ID is in the cached valid range, returning
+ * the nature of the range violation, if any.
+ */
+static XidBoundsViolation
+check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (!MultiXactIdIsValid(mxid))
+ return XID_INVALID;
+ if (MultiXactIdPrecedes(mxid, ctx->relminmxid))
+ return XID_PRECEDES_RELMIN;
+ if (MultiXactIdPrecedes(mxid, ctx->oldest_mxact))
+ return XID_PRECEDES_CLUSTERMIN;
+ if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, mxid))
+ return XID_IN_FUTURE;
+ return XID_BOUNDS_OK;
+}
+
+/*
+ * Checks whether the given mxid is valid to appear in the heap being checked,
+ * returning the nature of the range violation, if any.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static XidBoundsViolation
+check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ XidBoundsViolation result;
+
+ result = check_mxid_in_range(mxid, ctx);
+ if (result == XID_BOUNDS_OK)
+ return XID_BOUNDS_OK;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ return check_mxid_in_range(mxid, ctx);
+}
+
+/*
+ * Checks whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked, or whether it is too old or too new to appear in
+ * the relation, returning information about the nature of the bounds violation.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the table
+ * contains corrupt all-frozen bits, a concurrent vacuum might skip the page(s)
+ * containing the xid and then truncate clog and advance the relfrozenxid
+ * beyond xid.) Reporting the xid as valid under such conditions seems
+ * acceptable, since if we had checked it earlier in our scan it would have
+ * truly been valid at that time.
+ *
+ * If the status argument is not NULL, and if and only if the transaction ID
+ * appears to be valid in this relation, clog will be consulted and the commit
+ * status argument will be set with the status of the transaction ID.
+ */
+static XidBoundsViolation
+get_xid_status(TransactionId xid, HeapCheckContext *ctx,
+ XidCommitStatus *status)
+{
+ XidBoundsViolation result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ /* Quick check for special xids */
+ if (!TransactionIdIsValid(xid))
+ result = XID_INVALID;
+ else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
+ result = XID_BOUNDS_OK;
+ else
+ {
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ }
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ result = XID_IN_FUTURE;
+ else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ result = XID_PRECEDES_CLUSTERMIN;
+ else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ result = XID_PRECEDES_RELMIN;
+ else
+ result = XID_BOUNDS_OK;
+ }
+
+ /*
+ * Early return if the caller does not request clog checking, or if the
+ * xid is already known to be out of bounds. We dare not check clog for
+ * out of bounds transaction IDs.
+ */
+ if (status == NULL || result != XID_BOUNDS_OK)
+ return result;
+
+ /* Early return if we just checked this xid in a prior call */
+ if (xid == ctx->cached_xid)
+ {
+ *status = ctx->cached_status;
+ return result;
+ }
+
+ *status = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon =
+ FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid,
+ ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ *status = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ *status = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ *status = XID_ABORTED;
+ else
+ *status = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = *status;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..25e4bb2bfe 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@
<para>
The <filename>amcheck</filename> module provides functions that allow you to
- verify the logical consistency of the structure of relations. If the
- structure appears to be valid, no error is raised.
+ verify the logical consistency of the structure of relations.
</para>
<para>
- The functions verify various <emphasis>invariants</emphasis> in the
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
structure of the representation of particular relations. The
correctness of the access method functions behind index scans and
other important operations relies on these invariants always
@@ -24,7 +23,7 @@
collated lexical order). If that particular invariant somehow fails
to hold, we can expect binary searches on the affected page to
incorrectly guide index scans, resulting in wrong answers to SQL
- queries.
+ queries. If the structure appears to be valid, no error is raised.
</para>
<para>
Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
functions.
</para>
<para>
- <filename>amcheck</filename> functions may only be used by superusers.
+ Unlike the B-Tree checking functions which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. Despite this, if facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
</para>
<sect2>
@@ -187,12 +201,221 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+ 17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+ 960 | 4 | | data begins at offset 152 beyond the tuple length 58
+ 960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+ 960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+ 960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+ 960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+ 1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
+ 1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+ 1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
+ 1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+ 1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
+ 1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
+ 1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
+ 1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages that are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption; consequently, this kind of corruption cannot be
+ detected with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to <link
+ linkend="catalogs-overview">catalog tables</link> are performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ The <literal>check_toast</literal> option attempts to reconcile the target
+ table against entries in its corresponding toast table. This option is
+ disabled by default and is known to be slow.
+ If the target relation's corresponding toast table or toast index is
+ corrupt, reconciling the target table against toast values could
+ conceivably crash the server, although in many cases this would
+ just produce an error.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruptions are found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If true, toasted values are checked against the corresponding
+ TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
<para>
- When the <parameter>heapallindexed</parameter> argument to
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
verification functions is <literal>true</literal>, an additional
phase of verification is performed against the table associated with
the target index relation. This consists of a <quote>dummy</quote>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 6ccdc5b58c..43653fe572 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 58c42ffe1f..9a30380901 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,6 +109,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c52f20d4ba..ff853634bc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1020,6 +1020,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2290,6 +2291,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
@@ -2791,6 +2793,8 @@ XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
+XidBoundsViolation
+XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.24.3 (Apple Git-128)
On 2020-Oct-21, Robert Haas wrote:
On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
This next version, attached, has the acl checking and associated documentation changes split out into patch 0005, making it easier to review in isolation from the rest of the patch series.
Independently of acl considerations, this version also has some verbiage changes in 0004, in response to Andrey's review upthread.
I was about to commit 0001, after making some cosmetic changes, when I
discovered that it won't link for me. I think there must be something
wrong with the NLS stuff. My version of 0001 is attached. The error I
got is:
Hmm ... I don't think we have translation support in contrib, do we? I
think you could solve that by adding a "#undef _, #define _(...) (...)"
or similar at the top of the offending C files, assuming you don't want
to rip out all use of _() there.
TBH the usage of "translation:" comments in this patch seems
over-enthusiastic to me.
On Oct 21, 2020, at 1:13 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2020-Oct-21, Robert Haas wrote:
On Wed, Oct 7, 2020 at 9:01 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
This next version, attached, has the acl checking and associated documentation changes split out into patch 0005, making it easier to review in isolation from the rest of the patch series.
Independently of acl considerations, this version also has some verbiage changes in 0004, in response to Andrey's review upthread.
I was about to commit 0001, after making some cosmetic changes, when I
discovered that it won't link for me. I think there must be something
wrong with the NLS stuff. My version of 0001 is attached. The error I
got is:
Hmm ... I don't think we have translation support in contrib, do we? I
think you could solve that by adding a "#undef _, #define _(...) (...)"
or similar at the top of the offending C files, assuming you don't want
to rip out all use of _() there.
There is still something screwy here, though, as this compiles, links and runs fine for me on mac and linux, but not for Robert.
On mac, I'm using the toolchain from XCode, whereas Robert is using MacPorts.
Mine reports:
Apple clang version 11.0.0 (clang-1100.0.33.17)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Robert's reports:
clang version 5.0.2 (tags/RELEASE_502/final)
Target: x86_64-apple-darwin19.4.0
Thread model: posix
InstalledDir: /opt/local/libexec/llvm-5.0/bin
On linux, I'm using gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)
Searching around on the web, there are various reports of MacPorts' clang not linking libintl correctly, though I don't know if that is a real problem with MacPorts or just a few cases of user error. Has anybody else following this thread had issues with MacPorts' version of clang vis-à-vis linking libintl's gettext?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
I was about to commit 0001, after making some cosmetic changes, when I
discovered that it won't link for me. I think there must be something
wrong with the NLS stuff. My version of 0001 is attached. The error I
got is:
Well, the short answer would be "you need to add
SHLIB_LINK += $(filter -lintl, $(LIBS))
to the Makefile". However, I would vote against that, because in point
of fact amcheck has no translation support, just like all our other
contrib modules. What should likely happen instead is to rip out
whatever code is overoptimistically expecting it needs to support
translation.
regards, tom lane
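For concreteness, the one-line fix Tom quotes (the approach he then argues against) would be a Makefile addition along these lines; this is a sketch, not what was committed:

```makefile
# contrib/amcheck/Makefile -- hypothetical addition, ultimately not taken.
# $(filter -lintl, $(LIBS)) picks -lintl out of the configured $(LIBS),
# so the shared library links against libintl and gettext symbols resolve
# on strict platforms such as macOS.
SHLIB_LINK += $(filter -lintl, $(LIBS))
```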
Mark Dilger <mark.dilger@enterprisedb.com> writes:
There is still something screwy here, though, as this compiles, links and runs fine for me on mac and linux, but not for Robert.
Are you using --enable-nls at all on your Mac build? Because for sure it
should not work there, given the failure to include -lintl in amcheck's
link step. Some platforms are forgiving of that, but not Mac.
regards, tom lane
On Oct 21, 2020, at 1:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
There is still something screwy here, though, as this compiles, links and runs fine for me on mac and linux, but not for Robert.
Are you using --enable-nls at all on your Mac build? Because for sure it
should not work there, given the failure to include -lintl in amcheck's
link step. Some platforms are forgiving of that, but not Mac.
Thanks, Tom!
No, that's the answer. I had a typo/thinko in my configure options, --with-nls instead of --enable-nls, and the warning about it being an invalid flag went by so fast I didn't see it. I had it spelled correctly on linux, but I guess that's one of the platforms that is more forgiving.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 21, 2020, at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
I was about to commit 0001, after making some cosmetic changes, when I
discovered that it won't link for me. I think there must be something
wrong with the NLS stuff. My version of 0001 is attached. The error I
got is:
Well, the short answer would be "you need to add
SHLIB_LINK += $(filter -lintl, $(LIBS))
to the Makefile". However, I would vote against that, because in point
of fact amcheck has no translation support, just like all our other
contrib modules. What should likely happen instead is to rip out
whatever code is overoptimistically expecting it needs to support
translation.
Done that way in the attached, which also include Robert's changes from v19 he posted earlier today.
Attachments:
v20-0002-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From eb4122ed62a7775ee47baa9f7cc8c5531bcf1aa7 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:25:21 -0700
Subject: [PATCH v20 2/5] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 2 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1281 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 489 ++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 228 ++++
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 2 +
15 files changed, 2393 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..f8eecf70bf
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,2 @@
+/pg_amcheck
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..6d20ff3d78
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1281 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+AmCheckSettings
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should to
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single character values for each relkind, such as 'm' for
+ * materialized views, which we'd rather not embed here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user-settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf("Password for user %s: ",
+ realusername);
+ else
+ password_prompt = pg_strdup("Password: ");
+ PQfinish(settings.db);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples. We want all the details in order to report any corruption found.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, "Try \"%s --help\" for more information.\n",
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * print out help for the command line arguments
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
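+
+/*
+ * For example (illustrative values), given lval "c.oid" and an oid list
+ * containing 16384 and 16385, include_filter appends
+ *
+ *     AND c.oid OPERATOR(pg_catalog.=) ANY(array[16384, 16385]::OID[])
+ *
+ * to the query, and exclude_filter appends the corresponding
+ * OPERATOR(pg_catalog.!=) ALL form.
+ */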
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching
+ * the given list of patterns, excluding any relations in the given list of
+ * excluded Oids or belonging to one of the given excluded namespaces. The
+ * relations are further filtered by the given relkind_quals, allowing the
+ * caller to restrict them to just indexes or tables. The missing_errtext
+ * should be a message fragment for use in error messages if no matching
+ * relations are found and strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+ /*
+ * This might sometimes result in duplicate entries in the Oid list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query.
+ *
+ * On error, a copy of the error message from the database connection is
+ * stored into *error for the caller to report to the user; the caller is
+ * responsible for pfree'ing that string. This keeps error reporting in the
+ * caller's hands, so that failures are not interleaved messily with
+ * corruption reports.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+ sysseek($fh, 32, 0);
+ syswrite($fh, "\x77\x77\x77\x77" x 125);
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes (with -x)');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..1cc36b25b7
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,489 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8-byte inline value.  We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
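As a cross-check, the hard-coded length can be derived from the pack code itself. This standalone sketch (illustrative only, not part of the patch) packs an all-zero tuple and measures the result:

```perl
use strict;
use warnings;

# The template has 28 fields: 3 x L (4 bytes), 5 x S (2 bytes),
# 2 x C (1 byte), 1 x q (8 bytes), 7 x c (1 byte), 9 x S (2 bytes),
# for 58 bytes total -- the value of HEAPTUPLE_PACK_LENGTH above.
my $pack_code = 'LLLSSSSSCCqCcccccccSSSSSSSSS';
my $packed    = pack($pack_code, (0) x 28);
print length($packed), "\n";    # 58
```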
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
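The two helpers are symmetric: unpack() followed by pack() with the same template reproduces the original bytes, which is what lets write_tuple() re-emit a tuple returned by read_tuple() with only selected fields modified. A standalone sketch of that round trip (arbitrary example values, not part of the patch):

```perl
use strict;
use warnings;

# Pack a few fields using a prefix of the tuple template, unpack
# them again, and confirm the values survive the round trip.
my $code = 'LLLSS';
my @in   = (100, 200, 300, 7, 42);
my @out  = unpack($code, pack($code, @in));
print "@out\n";    # 100 200 300 7 42
```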
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-x', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
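The corruption loop below manipulates these flags with the usual bitwise idioms; this standalone sketch (hypothetical starting mask, not part of the patch) shows the set and clear patterns it uses:

```perl
use strict;
use warnings;

use constant HEAP_XMIN_COMMITTED => 0x0100;
use constant HEAP_XMIN_INVALID   => 0x0200;
use constant HEAP_XMAX_IS_MULTI  => 0x1000;

# Start from a hypothetical t_infomask with both xmin hint bits set.
my $t_infomask = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID;  # 0x0300
$t_infomask &= ~HEAP_XMIN_COMMITTED;    # clear one hint bit: 0x0200
$t_infomask |= HEAP_XMAX_IS_MULTI;      # set another flag:   0x1200
printf "0x%04X\n", $t_infomask;         # prints 0x1200
```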
+
+# Helper functions
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ qr/\(relname=test,blkno=$blkno,offnum=$offnum,attnum=$attnum\)\s+/ms;
+}
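header() returns a compiled qr// pattern; interpolating it into another qr// composes the two regexes, so a per-tuple prefix can be prepended to each expected corruption message. A standalone sketch (sample message text invented for illustration):

```perl
use strict;
use warnings;

# A compiled regex interpolates into a larger pattern intact.
my $header = qr/\(relname=test,blkno=0,offnum=1,attnum=\)\s+/ms;
my $msg = '(relname=test,blkno=0,offnum=1,attnum=) '
        . 'xmin 3 precedes relation freeze threshold 0:100';
print "matched\n" if $msg =~ qr/${header}xmin \d+ precedes/;
```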
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, '');
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 4e833d79ef..1efca8adc4 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..3e059e7753
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,228 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+ General options:
+ -V, --version output version information, then exit
+ -?, --help show this help, then exit
+ -s, --strict-names require include patterns to match at least one entity each
+ -o, --on-error-stop stop checking at end of first corrupt page
+
+ Schema checking options:
+ -n, --schema=PATTERN check relations in the specified schema(s) only
+ -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+ Table checking options:
+ -t, --table=PATTERN check the specified table(s) only
+ -T, --exclude-table=PATTERN do NOT check the specified table(s)
+ -b, --startblock begin checking table(s) at the given starting block number
+ -e, --endblock check table(s) only up to the given ending block number
+ -f, --skip-all-frozen do NOT check blocks marked as all-frozen
+ -v, --skip-all-visible do NOT check blocks marked as all-visible
+
+ TOAST table checking options:
+ -z, --check-toast check associated toast tables and toast indexes
+ -Z, --skip-toast do NOT check associated toast tables and toast indexes
+ -B, --toast-startblock begin checking toast table(s) at the given starting block
+ -E, --toast-endblock check toast table(s) only up to the given ending block
+
+ Index checking options:
+ -x, --check-indexes check btree indexes associated with tables being checked
+ -X, --skip-indexes do NOT check any btree indexes
+ -i, --index=PATTERN check the specified index(es) only
+ -I, --exclude-index=PATTERN do NOT check the specified index(es)
+ -c, --check-corrupt check indexes even if their associated table is corrupt
+ -C, --skip-corrupt do NOT check indexes if their associated table is corrupt
+ -a, --heapallindexed check index tuples against the table tuples
+ -A, --no-heapallindexed do NOT check index tuples against the table tuples
+ -r, --rootdescend search from the root page for each index tuple
+ -R, --no-rootdescend do NOT search from the root page for each index tuple
+
+ Connection options:
+ -d, --dbname=DBNAME database name to connect to
+ -h, --host=HOSTNAME database server host or socket directory
+ -p, --port=PORT database server port
+ -U, --username=USERNAME database user name
+ -w, --no-password never prompt for password
+ -W, --password force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-s</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked in the visibility map as all-frozen or all-visible,
+ respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines:
+ the first gives the location, and the second a message describing the
+ problem.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=17,offnum=12,attnum=)
+xmin 4294967295 precedes relation freeze threshold 17:1134217582
+(relname=mytable,blkno=960,offnum=4,attnum=)
+data begins at offset 152 beyond the tuple length 58
+(relname=mytable,blkno=960,offnum=4,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=5,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=6,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=7,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+(relname=mytable,blkno=1147,offnum=2,attnum=)
+number of attributes 2047 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=10,attnum=)
+tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+(relname=mytable,blkno=1147,offnum=15,attnum=)
+number of attributes 67 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=16,attnum=1)
+attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+(relname=mytable,blkno=1147,offnum=18,attnum=2)
+final toast chunk number 0 differs from expected value 6
+(relname=mytable,blkno=1147,offnum=19,attnum=2)
+toasted value for attribute 2 missing from toast table
+(relname=mytable,blkno=1147,offnum=21,attnum=)
+tuple is marked as only locked, but also claims key columns were updated
+(relname=mytable,blkno=1147,offnum=22,attnum=)
+multitransaction ID 1775655 is from before relation cutoff 2355572
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form and may span a varying
+ number of lines per corruption detected.
+ </para>
+
+ <para>
+ Checking a table with one corrupt index, "corrupt_index", where the
+ corruption is in the index's page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..ec87fb85b3 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff853634bc..2408bb2bf6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -403,6 +404,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
Attachment: v20-0001-Adding-function-verify_heapam-to-amcheck-module.patch (application/octet-stream)
From 38ed9a314c30753ca043b042130e189cec0f8709 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:24:18 -0700
Subject: [PATCH v20 1/5] Adding function verify_heapam to amcheck module
Adding new function verify_heapam for checking a heap relation and
optionally its associated toast relation, if any.
---
contrib/amcheck/Makefile | 7 +-
contrib/amcheck/amcheck--1.2--1.3.sql | 30 +
contrib/amcheck/amcheck.control | 2 +-
contrib/amcheck/expected/check_heap.out | 194 +++
contrib/amcheck/sql/check_heap.sql | 116 ++
contrib/amcheck/t/001_verify_heapam.pl | 242 ++++
contrib/amcheck/verify_heapam.c | 1447 +++++++++++++++++++++++
doc/src/sgml/amcheck.sgml | 235 +++-
src/backend/access/heap/hio.c | 11 +
src/backend/access/transam/multixact.c | 19 +
src/include/access/multixact.h | 1 +
src/tools/pgindent/typedefs.list | 4 +
12 files changed, 2299 insertions(+), 9 deletions(-)
create mode 100644 contrib/amcheck/amcheck--1.2--1.3.sql
create mode 100644 contrib/amcheck/expected/check_heap.out
create mode 100644 contrib/amcheck/sql/check_heap.sql
create mode 100644 contrib/amcheck/t/001_verify_heapam.pl
create mode 100644 contrib/amcheck/verify_heapam.c
diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index a2b1b1036b..b82f221e50 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -3,13 +3,16 @@
MODULE_big = amcheck
OBJS = \
$(WIN32RES) \
+ verify_heapam.o \
verify_nbtree.o
EXTENSION = amcheck
-DATA = amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
+DATA = amcheck--1.2--1.3.sql amcheck--1.1--1.2.sql amcheck--1.0--1.1.sql amcheck--1.0.sql
PGFILEDESC = "amcheck - function for verifying relation integrity"
-REGRESS = check check_btree
+REGRESS = check check_btree check_heap
+
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/amcheck/amcheck--1.2--1.3.sql b/contrib/amcheck/amcheck--1.2--1.3.sql
new file mode 100644
index 0000000000..7237ab738c
--- /dev/null
+++ b/contrib/amcheck/amcheck--1.2--1.3.sql
@@ -0,0 +1,30 @@
+/* contrib/amcheck/amcheck--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.3'" to load this file. \quit
+
+--
+-- verify_heapam()
+--
+CREATE FUNCTION verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'verify_heapam'
+LANGUAGE C;
+
+-- Don't want this to be available to public
+REVOKE ALL ON FUNCTION verify_heapam(regclass,
+ boolean,
+ boolean,
+ text,
+ bigint,
+ bigint)
+FROM PUBLIC;
diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control
index c6e310046d..ab50931f75 100644
--- a/contrib/amcheck/amcheck.control
+++ b/contrib/amcheck/amcheck.control
@@ -1,5 +1,5 @@
# amcheck extension
comment = 'functions for verifying relation integrity'
-default_version = '1.2'
+default_version = '1.3'
module_pathname = '$libdir/amcheck'
relocatable = true
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
new file mode 100644
index 0000000000..882f853d56
--- /dev/null
+++ b/contrib/amcheck/expected/check_heap.out
@@ -0,0 +1,194 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+ERROR: invalid skip option
+HINT: Valid skip options are "all-visible", "all-frozen", and "none".
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+CREATE ROLE regress_heaptest_role;
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for function verify_heapam
+RESET ROLE;
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+RESET ROLE;
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+ERROR: ending block number must be between 0 and 0
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+ERROR: starting block number must be between 0 and 0
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_partitioned" is not a table, materialized view, or TOAST table
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+ blkno | offnum | attnum | msg
+-------+--------+--------+-----
+(0 rows)
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_index" is not a table, materialized view, or TOAST table
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_view" is not a table, materialized view, or TOAST table
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_sequence" is not a table, materialized view, or TOAST table
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+ERROR: "test_foreign_table" is not a table, materialized view, or TOAST table
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
new file mode 100644
index 0000000000..c10a25f21c
--- /dev/null
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -0,0 +1,116 @@
+CREATE TABLE heaptest (a integer, b text);
+REVOKE ALL ON heaptest FROM PUBLIC;
+
+-- Check that invalid skip option is rejected
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'rope');
+
+-- Check specifying invalid block ranges when verifying an empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 5, endblock := 8);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty table, and that skip enum-like parameter is case-insensitive
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'None');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'All-Visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'NONE');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-FROZEN');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'ALL-VISIBLE');
+
+-- Add some data so subsequent tests are not entirely trivial
+INSERT INTO heaptest (a, b)
+ (SELECT gs, repeat('x', gs)
+ FROM generate_series(1,50) gs);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+CREATE ROLE regress_heaptest_role;
+
+-- verify permissions are checked (error due to function not callable)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+
+-- verify permissions are now sufficient
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+-- Check specifying invalid block ranges when verifying a non-empty table.
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 10000);
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 10000, endblock := 11000);
+
+-- Vacuum freeze to change the xids encountered in subsequent tests
+VACUUM FREEZE heaptest;
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty frozen table
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'none');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-frozen');
+SELECT * FROM verify_heapam(relation := 'heaptest', skip := 'all-visible');
+SELECT * FROM verify_heapam(relation := 'heaptest', startblock := 0, endblock := 0);
+
+-- Check that partitioned tables (the parent ones) which don't have visibility
+-- maps are rejected
+CREATE TABLE test_partitioned (a int, b text default repeat('x', 5000))
+ PARTITION BY list (a);
+SELECT * FROM verify_heapam('test_partitioned',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for an empty partition table (the child one)
+CREATE TABLE test_partition partition OF test_partitioned FOR VALUES IN (1);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that valid options are not rejected nor corruption reported
+-- for a non-empty partition table (the child one)
+INSERT INTO test_partitioned (a) (SELECT 1 FROM generate_series(1,1000) gs);
+SELECT * FROM verify_heapam('test_partition',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that indexes are rejected
+CREATE INDEX test_index ON test_partition (a);
+SELECT * FROM verify_heapam('test_index',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that views are rejected
+CREATE VIEW test_view AS SELECT 1;
+SELECT * FROM verify_heapam('test_view',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that sequences are rejected
+CREATE SEQUENCE test_sequence;
+SELECT * FROM verify_heapam('test_sequence',
+ startblock := NULL,
+ endblock := NULL);
+
+-- Check that foreign tables are rejected
+CREATE FOREIGN DATA WRAPPER dummy;
+CREATE SERVER dummy_server FOREIGN DATA WRAPPER dummy;
+CREATE FOREIGN TABLE test_foreign_table () SERVER dummy_server;
+SELECT * FROM verify_heapam('test_foreign_table',
+ startblock := NULL,
+ endblock := NULL);
+
+-- cleanup
+DROP TABLE heaptest;
+DROP TABLE test_partition;
+DROP TABLE test_partitioned;
+DROP OWNED BY regress_heaptest_role; -- permissions
+DROP ROLE regress_heaptest_role;
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
new file mode 100644
index 0000000000..e7526c17b8
--- /dev/null
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -0,0 +1,242 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 65;
+
+my ($node, $result);
+
+#
+# Test set-up
+#
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#
+# Check a table with data loaded but no corruption, freezing, etc.
+#
+fresh_test_table('test');
+check_all_options_uncorrupted('test', 'plain');
+
+#
+# Check a corrupt table
+#
+fresh_test_table('test');
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "plain corrupted table");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-visible')",
+ "plain corrupted table skipping all-visible");
+detects_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "plain corrupted table skipping all-frozen");
+detects_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "plain corrupted table skipping toast");
+detects_corruption(
+ "verify_heapam('test', startblock := 0, endblock := 0)",
+ "plain corrupted table checking only block zero");
+
+#
+# Check a corrupt table with all-frozen data
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+corrupt_first_page('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "all-frozen corrupted table");
+detects_no_corruption(
+ "verify_heapam('test', skip := 'all-frozen')",
+ "all-frozen corrupted table skipping all-frozen");
+
+#
+# Check a corrupt table with corrupt page header
+#
+fresh_test_table('test');
+corrupt_first_page_and_header('test');
+detects_corruption(
+ "verify_heapam('test')",
+ "corrupted test table with bad page header");
+
+#
+# Check an uncorrupted table with corrupt toast page header
+#
+fresh_test_table('test');
+my $toast = get_toast_for('test');
+corrupt_first_page_and_header($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast page header checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast page header skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast page header");
+
+#
+# Check an uncorrupted table with corrupt toast
+#
+fresh_test_table('test');
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table");
+
+#
+# Check an uncorrupted all-frozen table with corrupt toast
+#
+fresh_test_table('test');
+$node->safe_psql('postgres', q(VACUUM FREEZE test));
+$toast = get_toast_for('test');
+corrupt_first_page($toast);
+detects_corruption(
+ "verify_heapam('test', check_toast := true)",
+ "all-frozen table with corrupted toast checking toast");
+detects_no_corruption(
+ "verify_heapam('test', check_toast := false)",
+ "all-frozen table with corrupted toast skipping toast");
+detects_corruption(
+ "verify_heapam('$toast')",
+ "corrupted toast table of all-frozen table");
+
+# Returns the filesystem path for the named relation.
+sub relation_filepath
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the fully qualified name of the toast table for the named relation
+sub get_toast_for
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ SELECT 'pg_toast.' || t.relname
+ FROM pg_catalog.pg_class c, pg_catalog.pg_class t
+ WHERE c.relname = '$relname'
+ AND c.reltoastrelid = t.oid));
+}
+
+# (Re)create and populate a test table of the given name.
+sub fresh_test_table
+{
+ my ($relname) = @_;
+ $node->safe_psql('postgres', qq(
+ DROP TABLE IF EXISTS $relname CASCADE;
+ CREATE TABLE $relname (a integer, b text);
+ ALTER TABLE $relname SET (autovacuum_enabled=false);
+ ALTER TABLE $relname ALTER b SET STORAGE external;
+ INSERT INTO $relname (a, b)
+ (SELECT gs, repeat('b',gs*10) FROM generate_series(1,1000) gs);
+ ));
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+sub corrupt_first_page_internal
+{
+ my ($relname, $corrupt_header) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "open $relpath: $!";
+ binmode $fh;
+
+ # If we corrupt the header, postgres won't allow the page into the buffer.
+ # Must use double quotes so \xFF is a byte, not four literal characters.
+ syswrite($fh, "\xFF" x 8) if ($corrupt_header);
+
+ # Corrupt at least the line pointers. Exactly what this corrupts will
+ # depend on the page, as it may run past the line pointers into the user
+ # data. We stop short of writing 2048 bytes (2k), the smallest supported
+ # page size, as we don't want to corrupt the next page.
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
+
+sub corrupt_first_page
+{
+ corrupt_first_page_internal($_[0], undef);
+}
+
+sub corrupt_first_page_and_header
+{
+ corrupt_first_page_internal($_[0], 1);
+}
+
+sub detects_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) > 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+sub detects_no_corruption
+{
+ my ($function, $testname) = @_;
+
+ my $result = $node->safe_psql('postgres',
+ qq(SELECT COUNT(*) = 0 FROM $function));
+ is($result, 't', $testname);
+}
+
+# Check various options are stable (don't abort) and do not report corruption
+# when running verify_heapam on an uncorrupted test table.
+#
+# The relname *must* be an uncorrupted table, or this will fail.
+#
+# The prefix is used to identify the test, along with the options,
+# and should be unique.
+sub check_all_options_uncorrupted
+{
+ my ($relname, $prefix) = @_;
+ for my $stop (qw(true false))
+ {
+ for my $check_toast (qw(true false))
+ {
+ for my $skip ("'none'", "'all-frozen'", "'all-visible'")
+ {
+ for my $startblock (qw(NULL 0))
+ {
+ for my $endblock (qw(NULL 0))
+ {
+ my $opts = "on_error_stop := $stop, " .
+ "check_toast := $check_toast, " .
+ "skip := $skip, " .
+ "startblock := $startblock, " .
+ "endblock := $endblock";
+
+ detects_no_corruption(
+ "verify_heapam('$relname', $opts)",
+ "$prefix: $opts");
+ }
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
new file mode 100644
index 0000000000..0156c1e74a
--- /dev/null
+++ b/contrib/amcheck/verify_heapam.c
@@ -0,0 +1,1447 @@
+/*-------------------------------------------------------------------------
+ *
+ * verify_heapam.c
+ * Functions to check postgresql heap relations for corruption
+ *
+ * Copyright (c) 2016-2020, PostgreSQL Global Development Group
+ *
+ * contrib/amcheck/verify_heapam.c
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/detoast.h"
+#include "access/genam.h"
+#include "access/heapam.h"
+#include "access/heaptoast.h"
+#include "access/multixact.h"
+#include "access/toast_internals.h"
+#include "access/visibilitymap.h"
+#include "catalog/pg_am.h"
+#include "funcapi.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "storage/procarray.h"
+#include "utils/builtins.h"
+#include "utils/fmgroids.h"
+
+PG_FUNCTION_INFO_V1(verify_heapam);
+
+/* The number of columns in tuples returned by verify_heapam */
+#define HEAPCHECK_RELATION_COLS 4
+
+/*
+ * Despite the name, we use this for reporting problems with both XIDs and
+ * MXIDs.
+ */
+typedef enum XidBoundsViolation
+{
+ XID_INVALID,
+ XID_IN_FUTURE,
+ XID_PRECEDES_CLUSTERMIN,
+ XID_PRECEDES_RELMIN,
+ XID_BOUNDS_OK
+} XidBoundsViolation;
+
+typedef enum XidCommitStatus
+{
+ XID_COMMITTED,
+ XID_IN_PROGRESS,
+ XID_ABORTED
+} XidCommitStatus;
+
+typedef enum SkipPages
+{
+ SKIP_PAGES_ALL_FROZEN,
+ SKIP_PAGES_ALL_VISIBLE,
+ SKIP_PAGES_NONE
+} SkipPages;
+
+/*
+ * Struct holding the running context information during the lifetime of a
+ * verify_heapam execution.
+ */
+typedef struct HeapCheckContext
+{
+ /*
+ * Cached copies of values from ShmemVariableCache and computed values
+ * from them.
+ */
+ FullTransactionId next_fxid; /* ShmemVariableCache->nextXid */
+ TransactionId next_xid; /* 32-bit version of next_fxid */
+ TransactionId oldest_xid; /* ShmemVariableCache->oldestXid */
+ FullTransactionId oldest_fxid; /* 64-bit version of oldest_xid, computed
+ * relative to next_fxid */
+
+ /*
+ * Cached copy of value from MultiXactState
+ */
+ MultiXactId next_mxact; /* MultiXactState->nextMXact */
+ MultiXactId oldest_mxact; /* MultiXactState->oldestMultiXactId */
+
+ /*
+ * Cached copies of the most recently checked xid and its status.
+ */
+ TransactionId cached_xid;
+ XidCommitStatus cached_status;
+
+ /* Values concerning the heap relation being checked */
+ Relation rel;
+ TransactionId relfrozenxid;
+ FullTransactionId relfrozenfxid;
+ TransactionId relminmxid;
+ Relation toast_rel;
+ Relation *toast_indexes;
+ Relation valid_toast_index;
+ int num_toast_indexes;
+
+ /* Values for iterating over pages in the relation */
+ BlockNumber blkno;
+ BufferAccessStrategy bstrategy;
+ Buffer buffer;
+ Page page;
+
+ /* Values for iterating over tuples within a page */
+ OffsetNumber offnum;
+ ItemId itemid;
+ uint16 lp_len;
+ HeapTupleHeader tuphdr;
+ int natts;
+
+ /* Values for iterating over attributes within the tuple */
+ uint32 offset; /* offset in tuple data */
+ AttrNumber attnum;
+
+ /* Values for iterating over toast for the attribute */
+ int32 chunkno;
+ int32 attrsize;
+ int32 endchunk;
+ int32 totalchunks;
+
+ /* Whether verify_heapam has yet encountered any corrupt tuples */
+ bool is_corrupt;
+
+ /* The descriptor and tuplestore for verify_heapam's result tuples */
+ TupleDesc tupdesc;
+ Tuplestorestate *tupstore;
+} HeapCheckContext;
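For review purposes, the relationship between the cached `next_fxid`/`next_xid` fields above and the 64-bit widening of on-disk 32-bit xids can be sketched standalone. This is a minimal sketch (not part of the patch, and `widen_xid` is a hypothetical name): any xid found in the table is assumed to be no newer than `next_fxid`, so if its low 32 bits exceed those of `next_fxid`, it must belong to the previous epoch, which is what `FullTransactionIdFromXidAndCtx` needs to compute.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: widen a 32-bit xid to 64 bits relative to the cached
 * next full transaction id.  If the xid's low 32 bits exceed those of
 * next_fxid, the xid is assumed to come from the previous epoch.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_fxid)
{
	uint32_t	next_xid = (uint32_t) next_fxid;
	uint64_t	epoch = next_fxid >> 32;

	if (xid > next_xid)
		epoch--;				/* xid belongs to the previous epoch */
	return (epoch << 32) | xid;
}
```

The sketch ignores special xids (invalid, bootstrap, frozen), which the patch's helper must handle before applying this arithmetic.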
+
+/* Internal implementation */
+static void sanity_check_relation(Relation rel);
+static void check_tuple(HeapCheckContext *ctx);
+static void check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx);
+
+static bool check_tuple_attribute(HeapCheckContext *ctx);
+static bool check_tuple_header_and_visibilty(HeapTupleHeader tuphdr,
+ HeapCheckContext *ctx);
+
+static void report_corruption(HeapCheckContext *ctx, char *msg);
+static TupleDesc verify_heapam_tupdesc(void);
+static FullTransactionId FullTransactionIdFromXidAndCtx(TransactionId xid,
+ const HeapCheckContext *ctx);
+static void update_cached_xid_range(HeapCheckContext *ctx);
+static void update_cached_mxid_range(HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_in_range(MultiXactId mxid,
+ HeapCheckContext *ctx);
+static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid,
+ HeapCheckContext *ctx);
+static XidBoundsViolation get_xid_status(TransactionId xid,
+ HeapCheckContext *ctx,
+ XidCommitStatus *status);
+
+/*
+ * Scan and report corruption in heap pages, optionally reconciling toasted
+ * attributes with entries in the associated toast table. Intended to be
+ * called from SQL with the following parameters:
+ *
+ * relation:
+ * The Oid of the heap relation to be checked.
+ *
+ * on_error_stop:
+ * Whether to stop at the end of the first page for which errors are
+ * detected. Note that multiple rows may be returned.
+ *
+ * check_toast:
+ * Whether to check each toasted attribute against the toast table to
+ * verify that it can be found there.
+ *
+ * skip:
+ * What kinds of pages in the heap relation should be skipped. Valid
+ * options are "all-visible", "all-frozen", and "none".
+ *
+ * Returns to the SQL caller a set of tuples, each containing the location
+ * and a description of a corruption found in the heap.
+ *
+ * This code goes to some trouble to avoid crashing the server even if the
+ * table pages are badly corrupted, but it's probably not perfect. If
+ * check_toast is true, we'll use regular index lookups to try to fetch TOAST
+ * tuples, which can certainly cause crashes if the right kind of corruption
+ * exists in the toast table or index. No matter what parameters you pass,
+ * we can't protect against crashes that might occur trying to look up the
+ * commit status of transaction IDs (though we avoid trying to do such lookups
+ * for transaction IDs that can't legally appear in the table).
+ */
+Datum
+verify_heapam(PG_FUNCTION_ARGS)
+{
+ ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ MemoryContext old_context;
+ bool random_access;
+ HeapCheckContext ctx;
+ Buffer vmbuffer = InvalidBuffer;
+ Oid relid;
+ bool on_error_stop;
+ bool check_toast;
+ SkipPages skip_option = SKIP_PAGES_NONE;
+ BlockNumber first_block;
+ BlockNumber last_block;
+ BlockNumber nblocks;
+ const char *skip;
+
+ /* Check to see if caller supports us returning a tuplestore */
+ if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("set-valued function called in context that cannot accept a set")));
+ if (!(rsinfo->allowedModes & SFRM_Materialize))
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("materialize mode required, but it is not allowed in this context")));
+
+ /* Check supplied arguments */
+ if (PG_ARGISNULL(0))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("relation cannot be null")));
+ relid = PG_GETARG_OID(0);
+
+ if (PG_ARGISNULL(1))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("on_error_stop cannot be null")));
+ on_error_stop = PG_GETARG_BOOL(1);
+
+ if (PG_ARGISNULL(2))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("check_toast cannot be null")));
+ check_toast = PG_GETARG_BOOL(2);
+
+ if (PG_ARGISNULL(3))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skip cannot be null")));
+ skip = text_to_cstring(PG_GETARG_TEXT_PP(3));
+ if (pg_strcasecmp(skip, "all-visible") == 0)
+ skip_option = SKIP_PAGES_ALL_VISIBLE;
+ else if (pg_strcasecmp(skip, "all-frozen") == 0)
+ skip_option = SKIP_PAGES_ALL_FROZEN;
+ else if (pg_strcasecmp(skip, "none") == 0)
+ skip_option = SKIP_PAGES_NONE;
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid skip option"),
+ errhint("Valid skip options are \"all-visible\", \"all-frozen\", and \"none\".")));
+
+ memset(&ctx, 0, sizeof(HeapCheckContext));
+ ctx.cached_xid = InvalidTransactionId;
+
+ /* The tupdesc and tuplestore must be created in ecxt_per_query_memory */
+ old_context = MemoryContextSwitchTo(rsinfo->econtext->ecxt_per_query_memory);
+ random_access = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;
+ ctx.tupdesc = verify_heapam_tupdesc();
+ ctx.tupstore = tuplestore_begin_heap(random_access, false, work_mem);
+ rsinfo->returnMode = SFRM_Materialize;
+ rsinfo->setResult = ctx.tupstore;
+ rsinfo->setDesc = ctx.tupdesc;
+ MemoryContextSwitchTo(old_context);
+
+ /* Open relation, check relkind and access method, and check privileges */
+ ctx.rel = relation_open(relid, AccessShareLock);
+ sanity_check_relation(ctx.rel);
+
+ /* Early exit if the relation is empty */
+ nblocks = RelationGetNumberOfBlocks(ctx.rel);
+ if (!nblocks)
+ {
+ relation_close(ctx.rel, AccessShareLock);
+ PG_RETURN_NULL();
+ }
+
+ ctx.bstrategy = GetAccessStrategy(BAS_BULKREAD);
+ ctx.buffer = InvalidBuffer;
+ ctx.page = NULL;
+
+ /* Validate block numbers, or handle nulls. */
+ if (PG_ARGISNULL(4))
+ first_block = 0;
+ else
+ {
+ int64 fb = PG_GETARG_INT64(4);
+
+ if (fb < 0 || fb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("starting block number must be between 0 and %u",
+ nblocks - 1)));
+ first_block = (BlockNumber) fb;
+ }
+ if (PG_ARGISNULL(5))
+ last_block = nblocks - 1;
+ else
+ {
+ int64 lb = PG_GETARG_INT64(5);
+
+ if (lb < 0 || lb >= nblocks)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("ending block number must be between 0 and %u",
+ nblocks - 1)));
+ last_block = (BlockNumber) lb;
+ }
+
+ /* Optionally open the toast relation, if any. */
+ if (ctx.rel->rd_rel->reltoastrelid && check_toast)
+ {
+ int offset;
+
+ /* Main relation has associated toast relation */
+ ctx.toast_rel = table_open(ctx.rel->rd_rel->reltoastrelid,
+ AccessShareLock);
+ offset = toast_open_indexes(ctx.toast_rel,
+ AccessShareLock,
+ &(ctx.toast_indexes),
+ &(ctx.num_toast_indexes));
+ ctx.valid_toast_index = ctx.toast_indexes[offset];
+ }
+ else
+ {
+ /*
+ * Main relation has no associated toast relation, or we're
+ * intentionally skipping it.
+ */
+ ctx.toast_rel = NULL;
+ ctx.toast_indexes = NULL;
+ ctx.num_toast_indexes = 0;
+ }
+
+ update_cached_xid_range(&ctx);
+ update_cached_mxid_range(&ctx);
+ ctx.relfrozenxid = ctx.rel->rd_rel->relfrozenxid;
+ ctx.relfrozenfxid = FullTransactionIdFromXidAndCtx(ctx.relfrozenxid, &ctx);
+ ctx.relminmxid = ctx.rel->rd_rel->relminmxid;
+
+ if (TransactionIdIsNormal(ctx.relfrozenxid))
+ ctx.oldest_xid = ctx.relfrozenxid;
+
+ for (ctx.blkno = first_block; ctx.blkno <= last_block; ctx.blkno++)
+ {
+ OffsetNumber maxoff;
+
+ /* Optionally skip over all-frozen or all-visible blocks */
+ if (skip_option != SKIP_PAGES_NONE)
+ {
+ int32 mapbits;
+
+ mapbits = (int32) visibilitymap_get_status(ctx.rel, ctx.blkno,
+ &vmbuffer);
+ if (skip_option == SKIP_PAGES_ALL_FROZEN)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_FROZEN) != 0)
+ continue;
+ }
+
+ if (skip_option == SKIP_PAGES_ALL_VISIBLE)
+ {
+ if ((mapbits & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ continue;
+ }
+ }
+
+ /* Read and lock the next page. */
+ ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
+ RBM_NORMAL, ctx.bstrategy);
+ LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
+ ctx.page = BufferGetPage(ctx.buffer);
+
+ /* Perform tuple checks */
+ maxoff = PageGetMaxOffsetNumber(ctx.page);
+ for (ctx.offnum = FirstOffsetNumber; ctx.offnum <= maxoff;
+ ctx.offnum = OffsetNumberNext(ctx.offnum))
+ {
+ ctx.itemid = PageGetItemId(ctx.page, ctx.offnum);
+
+ /* Skip over unused/dead line pointers */
+ if (!ItemIdIsUsed(ctx.itemid) || ItemIdIsDead(ctx.itemid))
+ continue;
+
+ /*
+ * If this line pointer has been redirected, check that it
+ * redirects to a valid offset within the line pointer array.
+ */
+ if (ItemIdIsRedirected(ctx.itemid))
+ {
+ OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
+ ItemId rditem;
+
+ if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ {
+ report_corruption(&ctx,
+ psprintf("line pointer redirection to item at offset %u exceeds maximum offset %u",
+ (unsigned) rdoffnum,
+ (unsigned) maxoff));
+ continue;
+ }
+ rditem = PageGetItemId(ctx.page, rdoffnum);
+ if (!ItemIdIsUsed(rditem))
+ report_corruption(&ctx,
+ psprintf("line pointer redirection to unused item at offset %u",
+ (unsigned) rdoffnum));
+ continue;
+ }
+
+ /* Set up context information about this next tuple */
+ ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
+ ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
+
+ /* Ok, ready to check this next tuple */
+ check_tuple(&ctx);
+ }
+
+ /* clean up */
+ UnlockReleaseBuffer(ctx.buffer);
+
+ if (on_error_stop && ctx.is_corrupt)
+ break;
+ }
+
+ if (vmbuffer != InvalidBuffer)
+ ReleaseBuffer(vmbuffer);
+
+ /* Close the associated toast table and indexes, if any. */
+ if (ctx.toast_indexes)
+ toast_close_indexes(ctx.toast_indexes, ctx.num_toast_indexes,
+ AccessShareLock);
+ if (ctx.toast_rel)
+ table_close(ctx.toast_rel, AccessShareLock);
+
+ /* Close the main relation */
+ relation_close(ctx.rel, AccessShareLock);
+
+ PG_RETURN_NULL();
+}
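The startblock/endblock handling in the function above (NULL defaults to the whole relation, out-of-range values are rejected) can be sketched standalone for review. This is a minimal sketch, not part of the patch; `validate_block_range` is a hypothetical name, and where the patch raises `ereport(ERROR)` the sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch only: resolve optional first/last block arguments against the
 * relation size.  NULL (represented by the *_null flags) selects the
 * full range; out-of-range values are rejected.
 */
static bool
validate_block_range(bool fb_null, int64_t fb,
					 bool lb_null, int64_t lb,
					 uint32_t nblocks,
					 uint32_t *first_block, uint32_t *last_block)
{
	if (fb_null)
		*first_block = 0;
	else if (fb < 0 || fb >= (int64_t) nblocks)
		return false;			/* patch would ereport(ERROR) here */
	else
		*first_block = (uint32_t) fb;

	if (lb_null)
		*last_block = nblocks - 1;
	else if (lb < 0 || lb >= (int64_t) nblocks)
		return false;
	else
		*last_block = (uint32_t) lb;

	return true;
}
```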
+
+/*
+ * Check that a relation's relkind and access method are both supported,
+ * and that the caller has select privilege on the relation.
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+}
+
+/*
+ * Record a single corruption found in the table. The values in ctx should
+ * reflect the location of the corruption, and the msg argument should contain
+ * a human readable description of the corruption.
+ *
+ * The msg argument is pfree'd by this function.
+ */
+static void
+report_corruption(HeapCheckContext *ctx, char *msg)
+{
+ Datum values[HEAPCHECK_RELATION_COLS];
+ bool nulls[HEAPCHECK_RELATION_COLS];
+ HeapTuple tuple;
+
+ MemSet(values, 0, sizeof(values));
+ MemSet(nulls, 0, sizeof(nulls));
+ values[0] = Int64GetDatum(ctx->blkno);
+ values[1] = Int32GetDatum(ctx->offnum);
+ values[2] = Int32GetDatum(ctx->attnum);
+ nulls[2] = (ctx->attnum < 0);
+ values[3] = CStringGetTextDatum(msg);
+
+ /*
+ * In principle, there is nothing to prevent a scan over a large, highly
+ * corrupted table from using work_mem worth of memory building up the
+ * tuplestore. That's ok, but if we also leak the msg argument memory
+ * until the end of the query, we could exceed work_mem by more than a
+ * trivial amount. Therefore, free the msg argument each time we are
+ * called rather than waiting for our current memory context to be freed.
+ */
+ pfree(msg);
+
+ tuple = heap_form_tuple(ctx->tupdesc, values, nulls);
+ tuplestore_puttuple(ctx->tupstore, tuple);
+ ctx->is_corrupt = true;
+}
+
+/*
+ * Construct the TupleDesc used to report messages about corruptions found
+ * while scanning the heap.
+ */
+static TupleDesc
+verify_heapam_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ AttrNumber a = 0;
+
+ tupdesc = CreateTemplateTupleDesc(HEAPCHECK_RELATION_COLS);
+ TupleDescInitEntry(tupdesc, ++a, "blkno", INT8OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "offnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "attnum", INT4OID, -1, 0);
+ TupleDescInitEntry(tupdesc, ++a, "msg", TEXTOID, -1, 0);
+ Assert(a == HEAPCHECK_RELATION_COLS);
+
+ return BlessTupleDesc(tupdesc);
+}
+
+/*
+ * Check for tuple header corruption and tuple visibility.
+ *
+ * Since we do not hold a snapshot, tuple visibility is not a question of
+ * whether we should be able to see the tuple relative to any particular
+ * snapshot, but rather a question of whether it is safe and reasonable
+ * to check the tuple attributes.
+ *
+ * Some kinds of corruption make it unsafe to check the tuple attributes, for
+ * example when the line pointer refers to a range of bytes outside the page.
+ * In such cases, we return false (not visible) after recording appropriate
+ * corruption messages.
+ *
+ * Some other kinds of tuple header corruption confuse the question of where
+ * the tuple attributes begin, or how long the nulls bitmap is, etc., making it
+ * unreasonable to attempt to check attributes, even if all candidate answers
+ * to those questions would not result in reading past the end of the line
+ * pointer or page. In such cases, like above, we record corruption messages
+ * about the header and then return false.
+ *
+ * Other kinds of tuple header corruption do not bear on the question of
+ * whether the tuple attributes can be checked, so we record corruption
+ * messages for them but do not base our visibility determination on them. (In
+ * other words, we do not return false merely because we detected them.)
+ *
+ * For visibility determination not specifically related to corruption, what we
+ * want to know is if a tuple is potentially visible to any running
+ * transaction. If you are tempted to replace this function's visibility logic
+ * with a call to another visibility checking function, keep in mind that this
+ * function does not update hint bits, as it seems imprudent to write hint bits
+ * (or anything at all) to a table during a corruption check. Nor does this
+ * function bother classifying tuple visibility beyond a boolean visible vs.
+ * not visible.
+ *
+ * The caller should already have checked that xmin and xmax are not out of
+ * bounds for the relation.
+ *
+ * Returns whether the tuple is both visible and sufficiently sensible to
+ * undergo attribute checks.
+ */
+static bool
+check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
+{
+ uint16 infomask = tuphdr->t_infomask;
+ bool header_garbled = false;
+	unsigned	expected_hoff;
+
+ if (ctx->tuphdr->t_hoff > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ psprintf("data begins at offset %u beyond the tuple length %u",
+ ctx->tuphdr->t_hoff, ctx->lp_len));
+ header_garbled = true;
+ }
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (ctx->tuphdr->t_infomask2 & HEAP_KEYS_UPDATED))
+ {
+ report_corruption(ctx,
+ pstrdup("tuple is marked as only locked, but also claims key columns were updated"));
+ header_garbled = true;
+ }
+
+ if ((ctx->tuphdr->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (ctx->tuphdr->t_infomask & HEAP_XMAX_IS_MULTI))
+ {
+ report_corruption(ctx,
+ pstrdup("multixact should not be marked committed"));
+
+ /*
+ * This condition is clearly wrong, but we do not consider the header
+ * garbled, because we don't rely on this property for determining if
+ * the tuple is visible or for interpreting other relevant header
+ * fields.
+ */
+ }
+
+ if (infomask & HEAP_HASNULL)
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader + BITMAPLEN(ctx->natts));
+ else
+ expected_hoff = MAXALIGN(SizeofHeapTupleHeader);
+ if (ctx->tuphdr->t_hoff != expected_hoff)
+ {
+ if ((infomask & HEAP_HASNULL) && ctx->natts == 1)
+ report_corruption(ctx,
+ psprintf("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, has nulls)",
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else if ((infomask & HEAP_HASNULL))
+ report_corruption(ctx,
+ psprintf("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, has nulls)",
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ else if (ctx->natts == 1)
+ report_corruption(ctx,
+ psprintf("tuple data should begin at byte %u, but actually begins at byte %u (1 attribute, no nulls)",
+ expected_hoff, ctx->tuphdr->t_hoff));
+ else
+ report_corruption(ctx,
+ psprintf("tuple data should begin at byte %u, but actually begins at byte %u (%u attributes, no nulls)",
+ expected_hoff, ctx->tuphdr->t_hoff, ctx->natts));
+ header_garbled = true;
+ }
+
+ if (header_garbled)
+ return false; /* checking of this tuple should not continue */
+
+ /*
+ * Ok, we can examine the header for tuple visibility purposes, though we
+ * still need to be careful about a few remaining types of header
+ * corruption. This logic roughly follows that of
+ * HeapTupleSatisfiesVacuum. Where possible the comments indicate which
+ * HTSV_Result we think that function might return for this tuple.
+ */
+ if (!HeapTupleHeaderXminCommitted(tuphdr))
+ {
+ TransactionId raw_xmin = HeapTupleHeaderGetRawXmin(tuphdr);
+
+ if (HeapTupleHeaderXminInvalid(tuphdr))
+ return false; /* HEAPTUPLE_DEAD */
+ /* Used by pre-9.0 binary upgrades */
+		else if ((infomask & HEAP_MOVED_OFF) ||
+				 (infomask & HEAP_MOVED_IN))
+ {
+ XidCommitStatus status;
+ TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
+
+ switch (get_xid_status(xvac, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup("old-style VACUUM FULL transaction ID is invalid"));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u equals or exceeds next valid transaction ID %u:%u",
+ xvac,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u precedes relation freeze threshold %u:%u",
+ xvac,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u precedes oldest valid transaction ID %u:%u",
+ xvac,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ else
+ {
+ XidCommitStatus status;
+
+ switch (get_xid_status(raw_xmin, ctx, &status))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup("raw xmin is invalid"));
+ return false;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("raw xmin %u equals or exceeds next valid transaction ID %u:%u",
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("raw xmin %u precedes relation freeze threshold %u:%u",
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("raw xmin %u precedes oldest valid transaction ID %u:%u",
+ raw_xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_COMMITTED:
+ break;
+ case XID_IN_PROGRESS:
+ return true; /* insert or delete in progress */
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_DEAD */
+ }
+ }
+ }
+ }
+
+ if (!(infomask & HEAP_XMAX_INVALID) && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
+ {
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ XidCommitStatus status;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
+
+ switch (get_xid_status(xmax, ctx, &status))
+ {
+ /* not LOCKED_ONLY, so it has to have an xmax */
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup("xmax is invalid"));
+ return false; /* corrupt */
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("xmax %u equals or exceeds next valid transaction ID %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("xmax %u precedes relation freeze threshold %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ return false; /* corrupt */
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("xmax %u precedes oldest valid transaction ID %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ return false; /* corrupt */
+ case XID_BOUNDS_OK:
+ switch (status)
+ {
+ case XID_IN_PROGRESS:
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
+ case XID_COMMITTED:
+ case XID_ABORTED:
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or
+ * HEAPTUPLE_DEAD */
+ }
+ }
+
+ /* Ok, the tuple is live */
+ }
+ else if (!(infomask & HEAP_XMAX_COMMITTED))
+ return true; /* HEAPTUPLE_DELETE_IN_PROGRESS or
+ * HEAPTUPLE_LIVE */
+ else
+ return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
+ }
+ return true; /* not dead */
+}
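The `expected_hoff` computation in the function above is simple enough to check standalone. This is a minimal sketch (not part of the patch): the constants mirror their PostgreSQL counterparts on a typical 8-byte-MAXALIGN build, where `SizeofHeapTupleHeader` is 23 bytes and the nulls bitmap holds one bit per attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch constants mirroring PostgreSQL on a typical 8-byte-aligned build */
#define SKETCH_MAXALIGN(x)	(((uintptr_t) (x) + 7) & ~((uintptr_t) 7))
#define SKETCH_HDRSIZE		23	/* SizeofHeapTupleHeader */
#define SKETCH_BITMAPLEN(n) (((n) + 7) / 8) /* bytes for one bit per attribute */

/*
 * Sketch only: where tuple data should begin, i.e. the expected t_hoff.
 * With nulls present, the header is followed by the nulls bitmap before
 * alignment padding; without nulls, only the fixed header is aligned.
 */
static unsigned
expected_hoff(bool has_nulls, int natts)
{
	if (has_nulls)
		return (unsigned) SKETCH_MAXALIGN(SKETCH_HDRSIZE + SKETCH_BITMAPLEN(natts));
	return (unsigned) SKETCH_MAXALIGN(SKETCH_HDRSIZE);
}
```

Note that up to 8 attributes, the one-byte nulls bitmap fits in the alignment padding, so `t_hoff` is the same with or without nulls; a ninth attribute pushes the bitmap to two bytes and `t_hoff` to the next alignment boundary.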
+
+/*
+ * Check the current toast tuple against the state tracked in ctx, recording
+ * any corruption found in ctx->tupstore.
+ *
+ * This is not equivalent to running verify_heapam on the toast table itself,
+ * and is not hardened against corruption of the toast table. Rather, when
+ * validating a toasted attribute in the main table, the sequence of toast
+ * tuples that store the toasted value are retrieved and checked in order, with
+ * each toast tuple being checked against where we are in the sequence, as well
+ * as each toast tuple having its varlena structure sanity checked.
+ */
+static void
+check_toast_tuple(HeapTuple toasttup, HeapCheckContext *ctx)
+{
+ int32 curchunk;
+ Pointer chunk;
+ bool isnull;
+ int32 chunksize;
+ int32 expected_size;
+
+ /*
+ * Have a chunk, extract the sequence number and the data
+ */
+ curchunk = DatumGetInt32(fastgetattr(toasttup, 2,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup("toast chunk sequence number is null"));
+ return;
+ }
+ chunk = DatumGetPointer(fastgetattr(toasttup, 3,
+ ctx->toast_rel->rd_att, &isnull));
+ if (isnull)
+ {
+ report_corruption(ctx,
+ pstrdup("toast chunk data is null"));
+ return;
+ }
+ if (!VARATT_IS_EXTENDED(chunk))
+ chunksize = VARSIZE(chunk) - VARHDRSZ;
+ else if (VARATT_IS_SHORT(chunk))
+ {
+ /*
+ * could happen due to heap_form_tuple doing its thing
+ */
+ chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
+ }
+ else
+ {
+ /* should never happen */
+ uint32 header = ((varattrib_4b *) chunk)->va_4byte.va_header;
+
+ report_corruption(ctx,
+ psprintf("corrupt extended toast chunk has invalid varlena header: %0x (sequence number %d)",
+ header, curchunk));
+ return;
+ }
+
+ /*
+ * Some checks on the data we've found
+ */
+ if (curchunk != ctx->chunkno)
+ {
+ report_corruption(ctx,
+ psprintf("toast chunk sequence number %u does not match the expected sequence number %u",
+ curchunk, ctx->chunkno));
+ return;
+ }
+ if (curchunk > ctx->endchunk)
+ {
+ report_corruption(ctx,
+ psprintf("toast chunk sequence number %u exceeds the end chunk sequence number %u",
+ curchunk, ctx->endchunk));
+ return;
+ }
+
+ expected_size = curchunk < ctx->totalchunks - 1 ? TOAST_MAX_CHUNK_SIZE
+ : ctx->attrsize - ((ctx->totalchunks - 1) * TOAST_MAX_CHUNK_SIZE);
+ if (chunksize != expected_size)
+ {
+ report_corruption(ctx,
+ psprintf("toast chunk size %u differs from the expected size %u",
+ chunksize, expected_size));
+ return;
+ }
+}
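The chunk-size accounting used by `check_toast_tuple` above (every chunk but the last must be full-sized, and the last holds the remainder) can be sketched standalone. This is a minimal sketch, not part of the patch; `SKETCH_CHUNK_SIZE` is a made-up stand-in, since the real `TOAST_MAX_CHUNK_SIZE` depends on the block size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CHUNK_SIZE 2000	/* stand-in for TOAST_MAX_CHUNK_SIZE */

/*
 * Sketch only: the expected size of chunk number curchunk (0-based) of
 * a toasted value of attrsize bytes.  All chunks before the last are
 * full-sized; the last chunk holds whatever remains.
 */
static int32_t
chunk_expected_size(int32_t attrsize, int32_t curchunk)
{
	int32_t		totalchunks = (attrsize - 1) / SKETCH_CHUNK_SIZE + 1;

	if (curchunk < totalchunks - 1)
		return SKETCH_CHUNK_SIZE;
	return attrsize - (totalchunks - 1) * SKETCH_CHUNK_SIZE;
}
```

For example, a 5000-byte value splits into chunks of 2000, 2000, and 1000 bytes under this stand-in chunk size, so a middle chunk of any other length indicates corruption.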
+
+/*
+ * Check the current attribute as tracked in ctx, recording any corruption
+ * found in ctx->tupstore.
+ *
+ * This function follows the logic performed by heap_deform_tuple(), and in the
+ * case of a toasted value, optionally continues along the logic of
+ * detoast_external_attr(), checking for any conditions that would result in
+ * either of those functions Asserting or crashing the backend. The checks
+ * performed by Asserts present in those two functions are also performed here.
+ * In cases where those two functions are a bit cavalier in their assumptions
+ * about data being correct, we perform additional checks not present in either
+ * of those two functions. Where some condition is checked in both of those
+ * functions, we perform it here twice, as we parallel the logical flow of
+ * those two functions. The presence of duplicate checks seems a reasonable
+ * price to pay for keeping this code tightly coupled with the code it
+ * protects.
+ *
+ * Returns true if the tuple attribute is sane enough for processing to
+ * continue on to the next attribute, false otherwise.
+ */
+static bool
+check_tuple_attribute(HeapCheckContext *ctx)
+{
+ struct varatt_external toast_pointer;
+ ScanKeyData toastkey;
+ SysScanDesc toastscan;
+ SnapshotData SnapshotToast;
+ HeapTuple toasttup;
+ bool found_toasttup;
+ Datum attdatum;
+ struct varlena *attr;
+ char *tp; /* pointer to the tuple data */
+ uint16 infomask;
+ Form_pg_attribute thisatt;
+
+ infomask = ctx->tuphdr->t_infomask;
+ thisatt = TupleDescAttr(RelationGetDescr(ctx->rel), ctx->attnum);
+
+ tp = (char *) ctx->tuphdr + ctx->tuphdr->t_hoff;
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ psprintf("attribute %u with length %u starts at offset %u beyond total tuple length %u",
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+
+ /* Skip null values */
+ if (infomask & HEAP_HASNULL && att_isnull(ctx->attnum, ctx->tuphdr->t_bits))
+ return true;
+
+ /* Skip non-varlena values, but update offset first */
+ if (thisatt->attlen != -1)
+ {
+ ctx->offset = att_align_nominal(ctx->offset, thisatt->attalign);
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ psprintf("attribute %u with length %u ends at offset %u beyond total tuple length %u",
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+ return false;
+ }
+ return true;
+ }
+
+ /* Ok, we're looking at a varlena attribute. */
+ ctx->offset = att_align_pointer(ctx->offset, thisatt->attalign, -1,
+ tp + ctx->offset);
+
+ /* Get the (possibly corrupt) varlena datum */
+ attdatum = fetchatt(thisatt, tp + ctx->offset);
+
+ /*
+ * We have the datum, but we cannot decode it carelessly, as it may still
+ * be corrupt.
+ */
+
+ /*
+ * Check that VARTAG_SIZE won't hit a TrapMacro on a corrupt va_tag before
+ * risking a call into att_addlength_pointer
+ */
+ if (VARATT_IS_EXTERNAL(tp + ctx->offset))
+ {
+ uint8 va_tag = VARTAG_EXTERNAL(tp + ctx->offset);
+
+ if (va_tag != VARTAG_ONDISK)
+ {
+ report_corruption(ctx,
+ psprintf("toasted attribute %u has unexpected TOAST tag %u",
+ ctx->attnum,
+ va_tag));
+ /* We can't know where the next attribute begins */
+ return false;
+ }
+ }
+
+ /* Ok, should be safe now */
+ ctx->offset = att_addlength_pointer(ctx->offset, thisatt->attlen,
+ tp + ctx->offset);
+
+ if (ctx->tuphdr->t_hoff + ctx->offset > ctx->lp_len)
+ {
+ report_corruption(ctx,
+ psprintf("attribute %u with length %u ends at offset %u beyond total tuple length %u",
+ ctx->attnum,
+ thisatt->attlen,
+ ctx->tuphdr->t_hoff + ctx->offset,
+ ctx->lp_len));
+
+ return false;
+ }
+
+ /*
+ * heap_deform_tuple would be done with this attribute at this point,
+ * having stored it in values[], and would continue to the next attribute.
+ * We go further, because we need to check if the toast datum is corrupt.
+ */
+
+ attr = (struct varlena *) DatumGetPointer(attdatum);
+
+ /*
+ * Now we follow the logic of detoast_external_attr(), with the same
+ * caveats about being paranoid about corruption.
+ */
+
+ /* Skip values that are not external */
+ if (!VARATT_IS_EXTERNAL(attr))
+ return true;
+
+ /* It is external, and we're looking at a page on disk */
+
+ /* The tuple header better claim to contain toasted values */
+ if (!(infomask & HEAP_HASEXTERNAL))
+ {
+ report_corruption(ctx,
+ psprintf("attribute %u is external but tuple header flag HEAP_HASEXTERNAL not set",
+ ctx->attnum));
+ return true;
+ }
+
+ /* The relation better have a toast table */
+ if (!ctx->rel->rd_rel->reltoastrelid)
+ {
+ report_corruption(ctx,
+ psprintf("attribute %u is external but relation has no toast relation",
+ ctx->attnum));
+ return true;
+ }
+
+ /* If we were told to skip toast checking, then we're done. */
+ if (ctx->toast_rel == NULL)
+ return true;
+
+ /*
+ * Must copy attr into toast_pointer for alignment considerations
+ */
+ VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
+
+ ctx->attrsize = toast_pointer.va_extsize;
+ ctx->endchunk = (ctx->attrsize - 1) / TOAST_MAX_CHUNK_SIZE;
+ ctx->totalchunks = ctx->endchunk + 1;
+
+ /*
+ * Setup a scan key to find chunks in toast table with matching va_valueid
+ */
+ ScanKeyInit(&toastkey,
+ (AttrNumber) 1,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(toast_pointer.va_valueid));
+
+ /*
+ * Check if any chunks for this toasted object exist in the toast table,
+ * accessible via the index.
+ */
+ init_toast_snapshot(&SnapshotToast);
+ toastscan = systable_beginscan_ordered(ctx->toast_rel,
+ ctx->valid_toast_index,
+ &SnapshotToast, 1,
+ &toastkey);
+ ctx->chunkno = 0;
+ found_toasttup = false;
+ while ((toasttup =
+ systable_getnext_ordered(toastscan,
+ ForwardScanDirection)) != NULL)
+ {
+ found_toasttup = true;
+ check_toast_tuple(toasttup, ctx);
+ ctx->chunkno++;
+ }
+ if (ctx->chunkno != (ctx->endchunk + 1))
+ report_corruption(ctx,
+ psprintf("final toast chunk number %u differs from expected value %u",
+ ctx->chunkno, (ctx->endchunk + 1)));
+ if (!found_toasttup)
+ report_corruption(ctx,
+ psprintf("toasted value for attribute %u missing from toast table",
+ ctx->attnum));
+ systable_endscan_ordered(toastscan);
+
+ return true;
+}
+
+/*
+ * Check the current tuple as tracked in ctx, recording any corruption found in
+ * ctx->tupstore.
+ */
+static void
+check_tuple(HeapCheckContext *ctx)
+{
+ TransactionId xmin;
+ TransactionId xmax;
+ bool fatal = false;
+ uint16 infomask = ctx->tuphdr->t_infomask;
+
+ /*
+ * If we report corruption before iterating over individual attributes, we
+ * need attnum to be reported as NULL. Set that up before any corruption
+ * reporting might happen.
+ */
+ ctx->attnum = -1;
+
+ /*
+ * If the line pointer for this tuple does not reserve enough space for a
+ * complete tuple header, we dare not read the tuple header.
+ */
+ if (ctx->lp_len < MAXALIGN(SizeofHeapTupleHeader))
+ {
+ report_corruption(ctx,
+ psprintf("line pointer length %u is less than the minimum tuple header size %u",
+ ctx->lp_len, (uint32) MAXALIGN(SizeofHeapTupleHeader)));
+ return;
+ }
+
+ /* If xmin is normal, it should be within valid range */
+ xmin = HeapTupleHeaderGetXmin(ctx->tuphdr);
+ switch (get_xid_status(xmin, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("xmin %u equals or exceeds next valid transaction ID %u:%u",
+ xmin,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("xmin %u precedes oldest valid transaction ID %u:%u",
+ xmin,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("xmin %u precedes relation freeze threshold %u:%u",
+ xmin,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ break;
+ }
+
+ xmax = HeapTupleHeaderGetRawXmax(ctx->tuphdr);
+
+ if (infomask & HEAP_XMAX_IS_MULTI)
+ {
+ /* xmax is a multixact, so it should be within valid MXID range */
+ switch (check_mxid_valid_in_rel(xmax, ctx))
+ {
+ case XID_INVALID:
+ report_corruption(ctx,
+ pstrdup("multitransaction ID is invalid"));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("multitransaction ID %u precedes relation minimum multitransaction ID threshold %u",
+ xmax, ctx->relminmxid));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("multitransaction ID %u precedes oldest valid multitransaction ID threshold %u",
+ xmax, ctx->oldest_mxact));
+ fatal = true;
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("multitransaction ID %u equals or exceeds next valid multitransaction ID %u",
+ xmax,
+ ctx->next_mxact));
+ fatal = true;
+ break;
+ case XID_BOUNDS_OK:
+ break;
+ }
+ }
+ else
+ {
+ /*
+ * xmax is not a multixact and is normal, so it should be within the
+ * valid XID range.
+ */
+ switch (get_xid_status(xmax, ctx, NULL))
+ {
+ case XID_INVALID:
+ case XID_BOUNDS_OK:
+ break;
+ case XID_IN_FUTURE:
+ report_corruption(ctx,
+ psprintf("xmax %u equals or exceeds next valid transaction ID %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->next_fxid),
+ XidFromFullTransactionId(ctx->next_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_CLUSTERMIN:
+ report_corruption(ctx,
+ psprintf("xmax %u precedes oldest valid transaction ID %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->oldest_fxid),
+ XidFromFullTransactionId(ctx->oldest_fxid)));
+ fatal = true;
+ break;
+ case XID_PRECEDES_RELMIN:
+ report_corruption(ctx,
+ psprintf("xmax %u precedes relation freeze threshold %u:%u",
+ xmax,
+ EpochFromFullTransactionId(ctx->relfrozenfxid),
+ XidFromFullTransactionId(ctx->relfrozenfxid)));
+ fatal = true;
+ }
+ }
+
+ /*
+ * Cannot process tuple data if tuple header was corrupt, as the offsets
+ * within the page cannot be trusted, leaving too much risk of reading
+ * garbage if we continue.
+ *
+ * We also cannot process the tuple if the xmin or xmax were invalid
+ * relative to relfrozenxid or relminmxid, as clog entries for the xids
+ * may already be gone.
+ */
+ if (fatal)
+ return;
+
+ /*
+ * Check various forms of tuple header corruption. If the header is too
+ * corrupt to continue checking, or if the tuple is not visible to anyone,
+ * we cannot continue with other checks.
+ */
+ if (!check_tuple_header_and_visibilty(ctx->tuphdr, ctx))
+ return;
+
+ /*
+ * The tuple is visible, so it must be compatible with the current version
+ * of the relation descriptor. It might have fewer columns than are
+ * present in the relation descriptor, but it cannot have more.
+ */
+ if (RelationGetDescr(ctx->rel)->natts < ctx->natts)
+ {
+ report_corruption(ctx,
+ psprintf("number of attributes %u exceeds maximum expected for table %u",
+ ctx->natts,
+ RelationGetDescr(ctx->rel)->natts));
+ return;
+ }
+
+ /*
+ * Check each attribute unless we hit corruption that confuses what to do
+ * next, at which point we abort further attribute checks for this tuple.
+ * Note that we don't abort for all types of corruption, only for those
+ * types where we don't know how to continue.
+ */
+ ctx->offset = 0;
+ for (ctx->attnum = 0; ctx->attnum < ctx->natts; ctx->attnum++)
+ if (!check_tuple_attribute(ctx))
+ break; /* cannot continue */
+}
+
+/*
+ * Convert a TransactionId into a FullTransactionId using our cached values of
+ * the valid transaction ID range. It is the caller's responsibility to have
+ * already updated the cached values, if necessary.
+ */
+static FullTransactionId
+FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
+{
+ uint32 epoch;
+
+ if (!TransactionIdIsNormal(xid))
+ return FullTransactionIdFromEpochAndXid(0, xid);
+ epoch = EpochFromFullTransactionId(ctx->next_fxid);
+ if (xid > ctx->next_xid)
+ epoch--;
+ return FullTransactionIdFromEpochAndXid(epoch, xid);
+}
+
+/*
+ * Update our cached range of valid transaction IDs.
+ */
+static void
+update_cached_xid_range(HeapCheckContext *ctx)
+{
+ /* Make cached copies */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ ctx->next_fxid = ShmemVariableCache->nextXid;
+ ctx->oldest_xid = ShmemVariableCache->oldestXid;
+ LWLockRelease(XidGenLock);
+
+ /* And compute alternate versions of the same */
+ ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
+ ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
+}
+
+/*
+ * Update our cached range of valid multitransaction IDs.
+ */
+static void
+update_cached_mxid_range(HeapCheckContext *ctx)
+{
+ ReadMultiXactIdRange(&ctx->oldest_mxact, &ctx->next_mxact);
+}
+
+/*
+ * Return whether the given FullTransactionId is within our cached valid
+ * transaction ID range.
+ */
+static inline bool
+fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
+{
+ return (FullTransactionIdPrecedesOrEquals(ctx->oldest_fxid, fxid) &&
+ FullTransactionIdPrecedes(fxid, ctx->next_fxid));
+}
+
+/*
+ * Checks whether a multitransaction ID is in the cached valid range, returning
+ * the nature of the range violation, if any.
+ */
+static XidBoundsViolation
+check_mxid_in_range(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ if (!TransactionIdIsValid(mxid))
+ return XID_INVALID;
+ if (MultiXactIdPrecedes(mxid, ctx->relminmxid))
+ return XID_PRECEDES_RELMIN;
+ if (MultiXactIdPrecedes(mxid, ctx->oldest_mxact))
+ return XID_PRECEDES_CLUSTERMIN;
+ if (MultiXactIdPrecedesOrEquals(ctx->next_mxact, mxid))
+ return XID_IN_FUTURE;
+ return XID_BOUNDS_OK;
+}
+
+/*
+ * Checks whether the given mxid is valid to appear in the heap being checked,
+ * returning the nature of the range violation, if any.
+ *
+ * This function attempts to return quickly by caching the known valid mxid
+ * range in ctx. Callers should already have performed the initial setup of
+ * the cache prior to the first call to this function.
+ */
+static XidBoundsViolation
+check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
+{
+ XidBoundsViolation result;
+
+ result = check_mxid_in_range(mxid, ctx);
+ if (result == XID_BOUNDS_OK)
+ return XID_BOUNDS_OK;
+
+ /* The range may have advanced. Recheck. */
+ update_cached_mxid_range(ctx);
+ return check_mxid_in_range(mxid, ctx);
+}
+
+/*
+ * Checks whether the given transaction ID is (or was recently) valid to appear
+ * in the heap being checked, or whether it is too old or too new to appear in
+ * the relation, returning information about the nature of the bounds violation.
+ *
+ * We cache the range of valid transaction IDs. If xid is in that range, we
+ * conclude that it is valid, even though concurrent changes to the table might
+ * invalidate it under certain corrupt conditions. (For example, if the table
+ * contains corrupt all-frozen bits, a concurrent vacuum might skip the page(s)
+ * containing the xid and then truncate clog and advance the relfrozenxid
+ * beyond xid.) Reporting the xid as valid under such conditions seems
+ * acceptable, since if we had checked it earlier in our scan it would have
+ * truly been valid at that time.
+ *
+ * If the status argument is not NULL, and if and only if the transaction ID
+ * appears to be valid in this relation, clog will be consulted and the commit
+ * status argument will be set with the status of the transaction ID.
+ */
+static XidBoundsViolation
+get_xid_status(TransactionId xid, HeapCheckContext *ctx,
+ XidCommitStatus *status)
+{
+ XidBoundsViolation result;
+ FullTransactionId fxid;
+ FullTransactionId clog_horizon;
+
+ /* Quick check for special xids */
+ if (!TransactionIdIsValid(xid))
+ result = XID_INVALID;
+ else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
+ result = XID_BOUNDS_OK;
+ else
+ {
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ }
+
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ result = XID_IN_FUTURE;
+ else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ result = XID_PRECEDES_CLUSTERMIN;
+ else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ result = XID_PRECEDES_RELMIN;
+ else
+ result = XID_BOUNDS_OK;
+ }
+
+ /*
+ * Early return if the caller does not request clog checking, or if the
+ * xid is already known to be out of bounds. We dare not check clog for
+ * out of bounds transaction IDs.
+ */
+ if (status == NULL || result != XID_BOUNDS_OK)
+ return result;
+
+ /* Early return if we just checked this xid in a prior call */
+ if (xid == ctx->cached_xid)
+ {
+ *status = ctx->cached_status;
+ return result;
+ }
+
+ *status = XID_COMMITTED;
+ LWLockAcquire(XactTruncationLock, LW_SHARED);
+ clog_horizon =
+ FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid,
+ ctx);
+ if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
+ {
+ if (TransactionIdIsCurrentTransactionId(xid))
+ *status = XID_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xid))
+ *status = XID_COMMITTED;
+ else if (TransactionIdDidAbort(xid))
+ *status = XID_ABORTED;
+ else
+ *status = XID_IN_PROGRESS;
+ }
+ LWLockRelease(XactTruncationLock);
+ ctx->cached_xid = xid;
+ ctx->cached_status = *status;
+ return result;
+}
diff --git a/doc/src/sgml/amcheck.sgml b/doc/src/sgml/amcheck.sgml
index a9df2c1a9d..25e4bb2bfe 100644
--- a/doc/src/sgml/amcheck.sgml
+++ b/doc/src/sgml/amcheck.sgml
@@ -9,12 +9,11 @@
<para>
The <filename>amcheck</filename> module provides functions that allow you to
- verify the logical consistency of the structure of relations. If the
- structure appears to be valid, no error is raised.
+ verify the logical consistency of the structure of relations.
</para>
<para>
- The functions verify various <emphasis>invariants</emphasis> in the
+ The B-Tree checking functions verify various <emphasis>invariants</emphasis> in the
structure of the representation of particular relations. The
correctness of the access method functions behind index scans and
other important operations relies on these invariants always
@@ -24,7 +23,7 @@
collated lexical order). If that particular invariant somehow fails
to hold, we can expect binary searches on the affected page to
incorrectly guide index scans, resulting in wrong answers to SQL
- queries.
+ queries. If the structure appears to be valid, no error is raised.
</para>
<para>
Verification is performed using the same procedures as those used by
@@ -35,7 +34,22 @@
functions.
</para>
<para>
- <filename>amcheck</filename> functions may only be used by superusers.
+ Unlike the B-Tree checking functions, which report corruption by raising
+ errors, the heap checking function <function>verify_heapam</function> checks
+ a table and attempts to return a set of rows, one row per corruption
+ detected. Despite this, if facilities that
+ <function>verify_heapam</function> relies upon are themselves corrupted, the
+ function may be unable to continue and may instead raise an error.
+ </para>
+ <para>
+ Permission to execute <filename>amcheck</filename> functions may be granted
+ to non-superusers, but before granting such permissions careful consideration
+ should be given to data security and privacy concerns. Although the
+ corruption reports generated by these functions do not focus on the contents
+ of the corrupted data so much as on the structure of that data and the nature
+ of the corruptions found, an attacker who gains permission to execute these
+ functions, particularly if the attacker can also induce corruption, might be
+ able to infer something of the data itself from such messages.
</para>
<sect2>
@@ -187,12 +201,221 @@ SET client_min_messages = DEBUG1;
</para>
</tip>
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>
+ verify_heapam(relation regclass,
+ on_error_stop boolean,
+ check_toast boolean,
+ skip cstring,
+ startblock bigint,
+ endblock bigint,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ returns record
+ </function>
+ </term>
+ <listitem>
+ <para>
+ Checks a table for structural corruption, where pages in the relation
+ contain data that is invalidly formatted, and for logical corruption,
+ where pages are structurally valid but inconsistent with the rest of the
+ database cluster. Example usage:
+<screen>
+test=# select * from verify_heapam('mytable', check_toast := true);
+ blkno | offnum | attnum | msg
+-------+--------+--------+--------------------------------------------------------------------------------------------------
+ 17 | 12 | | xmin 4294967295 precedes relation freeze threshold 17:1134217582
+ 960 | 4 | | data begins at offset 152 beyond the tuple length 58
+ 960 | 4 | | tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+ 960 | 5 | | tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+ 960 | 6 | | tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+ 960 | 7 | | tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+ 1147 | 2 | | number of attributes 2047 exceeds maximum expected for table 3
+ 1147 | 10 | | tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+ 1147 | 15 | | number of attributes 67 exceeds maximum expected for table 3
+ 1147 | 16 | 1 | attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+ 1147 | 18 | 2 | final toast chunk number 0 differs from expected value 6
+ 1147 | 19 | 2 | toasted value for attribute 2 missing from toast table
+ 1147 | 21 | | tuple is marked as only locked, but also claims key columns were updated
+ 1147 | 22 | | multitransaction ID 1775655 is from before relation cutoff 2355572
+(14 rows)
+</screen>
+ As this example shows, the Tuple ID (TID) of the corrupt tuple is given
+ in the (<literal>blkno</literal>, <literal>offnum</literal>) columns, and
+ for corruptions specific to a particular attribute in the tuple, the
+ <literal>attnum</literal> field shows which one.
+ </para>
+ <para>
+ Structural corruption can happen due to faulty storage hardware, or
+ relation files being overwritten or modified by unrelated software.
+ This kind of corruption can also be detected with
+ <link linkend="app-initdb-data-checksums"><application>data page
+ checksums</application></link>.
+ </para>
+ <para>
+ Relation pages which are correctly formatted, internally consistent, and
+ correct relative to their own internal checksums may still contain
+ logical corruption. This kind of corruption therefore cannot be detected
+ with <application>checksums</application>. Examples include toasted
+ values in the main table which lack a corresponding entry in the toast
+ table, and tuples in the main table with a Transaction ID that is older
+ than the oldest valid Transaction ID in the database or cluster.
+ </para>
+ <para>
+ Multiple causes of logical corruption have been observed in production
+ systems, including bugs in the <productname>PostgreSQL</productname>
+ server software, faulty and ill-conceived backup and restore tools, and
+ user error.
+ </para>
+ <para>
+ Corrupt relations are most concerning in live production environments,
+ precisely the same environments where high risk activities are least
+ welcome. For this reason, <function>verify_heapam</function> has been
+ designed to diagnose corruption without undue risk. It cannot guard
+ against all causes of backend crashes, as even executing the calling
+ query could be unsafe on a badly corrupted system. Access to <link
+ linkend="catalogs-overview">catalog tables</link> are performed and could
+ be problematic if the catalogs themselves are corrupted.
+ </para>
+ <para>
+ The design principle adhered to in <function>verify_heapam</function> is
+ that, if the rest of the system and server hardware are correct, under
+ default options, <function>verify_heapam</function> will not crash the
+ server due merely to structural or logical corruption in the target
+ table.
+ </para>
+ <para>
+ The <literal>check_toast</literal> option attempts to reconcile the target
+ table against entries in its corresponding toast table. This option is
+ disabled by default and is known to be slow.
+ If the target relation's corresponding toast table or toast index is
+ corrupt, reconciling the target table against toast values could
+ conceivably crash the server, although in many cases this would
+ just produce an error.
+ </para>
+ <para>
+ The following optional arguments are recognized:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>on_error_stop</term>
+ <listitem>
+ <para>
+ If true, corruption checking stops at the end of the first block on
+ which any corruption is found.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>check_toast</term>
+ <listitem>
+ <para>
+ If true, toasted values are checked against the corresponding
+ TOAST table.
+ </para>
+ <para>
+ Defaults to false.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>skip</term>
+ <listitem>
+ <para>
+ If not <literal>none</literal>, corruption checking skips blocks that
+ are marked as all-visible or all-frozen, as given.
+ Valid options are <literal>all-visible</literal>,
+ <literal>all-frozen</literal> and <literal>none</literal>.
+ </para>
+ <para>
+ Defaults to <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>startblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking begins at the specified block,
+ skipping all previous blocks. It is an error to specify a
+ <literal>startblock</literal> outside the range of blocks in the
+ target table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>endblock</term>
+ <listitem>
+ <para>
+ If specified, corruption checking ends at the specified block,
+ skipping all remaining blocks. It is an error to specify an
+ <literal>endblock</literal> outside the range of blocks in the target
+ table.
+ </para>
+ <para>
+ By default, does not skip any blocks.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ <para>
+ For each corruption detected, <function>verify_heapam</function> returns
+ a row with the following columns:
+ </para>
+ <variablelist>
+ <varlistentry>
+ <term>blkno</term>
+ <listitem>
+ <para>
+ The number of the block containing the corrupt page.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>offnum</term>
+ <listitem>
+ <para>
+ The OffsetNumber of the corrupt tuple.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>attnum</term>
+ <listitem>
+ <para>
+ The attribute number of the corrupt column in the tuple, if the
+ corruption is specific to a column and not the tuple as a whole.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>msg</term>
+ <listitem>
+ <para>
+ A human-readable message describing the corruption in the page.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</sect2>
<sect2>
<title>Optional <parameter>heapallindexed</parameter> Verification</title>
<para>
- When the <parameter>heapallindexed</parameter> argument to
+ When the <parameter>heapallindexed</parameter> argument to B-Tree
verification functions is <literal>true</literal>, an additional
phase of verification is performed against the table associated with
the target index relation. This consists of a <quote>dummy</quote>
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index aa3f14c019..ca357410a2 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -47,6 +47,17 @@ RelationPutHeapTuple(Relation relation,
*/
Assert(!token || HeapTupleHeaderIsSpeculative(tuple->t_data));
+ /*
+ * Do not allow tuples with invalid combinations of hint bits to be placed
+ * on a page. These combinations are detected as corruption by the
+ * contrib/amcheck logic, so if you disable one or both of these
+ * assertions, make corresponding changes there.
+ */
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ (tuple->t_data->t_infomask2 & HEAP_KEYS_UPDATED)));
+ Assert(!((tuple->t_data->t_infomask & HEAP_XMAX_COMMITTED) &&
+ (tuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)));
+
/* Add the tuple to the page */
pageHeader = BufferGetPage(buffer);
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 6ccdc5b58c..43653fe572 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -735,6 +735,25 @@ ReadNextMultiXactId(void)
return mxid;
}
+/*
+ * ReadMultiXactIdRange
+ * Get the range of IDs that may still be referenced by a relation.
+ */
+void
+ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next)
+{
+ LWLockAcquire(MultiXactGenLock, LW_SHARED);
+ *oldest = MultiXactState->oldestMultiXactId;
+ *next = MultiXactState->nextMXact;
+ LWLockRelease(MultiXactGenLock);
+
+ if (*oldest < FirstMultiXactId)
+ *oldest = FirstMultiXactId;
+ if (*next < FirstMultiXactId)
+ *next = FirstMultiXactId;
+}
+
+
/*
* MultiXactIdCreateFromMembers
* Make a new MultiXactId from the specified set of members
diff --git a/src/include/access/multixact.h b/src/include/access/multixact.h
index 58c42ffe1f..9a30380901 100644
--- a/src/include/access/multixact.h
+++ b/src/include/access/multixact.h
@@ -109,6 +109,7 @@ extern MultiXactId MultiXactIdCreateFromMembers(int nmembers,
MultiXactMember *members);
extern MultiXactId ReadNextMultiXactId(void);
+extern void ReadMultiXactIdRange(MultiXactId *oldest, MultiXactId *next);
extern bool MultiXactIdIsRunning(MultiXactId multi, bool isLockOnly);
extern void MultiXactIdSetOldestMember(void);
extern int GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **xids,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c52f20d4ba..ff853634bc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1020,6 +1020,7 @@ HbaToken
HeadlineJsonState
HeadlineParsedText
HeadlineWordEntry
+HeapCheckContext
HeapScanDesc
HeapTuple
HeapTupleData
@@ -2290,6 +2291,7 @@ SimpleStringList
SimpleStringListCell
SingleBoundSortItem
Size
+SkipPages
SlabBlock
SlabChunk
SlabContext
@@ -2791,6 +2793,8 @@ XactCallback
XactCallbackItem
XactEvent
XactLockTableWaitInfo
+XidBoundsViolation
+XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
Attachment: v20-0003-Creating-non-throwing-interface-to-clog-and-slru.patch (application/octet-stream) — Download
From 270ce4cf0dff364ac68ab4f951ca3b2bc4f25068 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:26:07 -0700
Subject: [PATCH v20 3/5] Creating non-throwing interface to clog and slru.
---
src/backend/access/transam/clog.c | 21 +++---
src/backend/access/transam/commit_ts.c | 4 +-
src/backend/access/transam/multixact.c | 16 ++---
src/backend/access/transam/slru.c | 23 +++---
src/backend/access/transam/subtrans.c | 4 +-
src/backend/access/transam/transam.c | 98 +++++++++-----------------
src/backend/commands/async.c | 4 +-
src/backend/storage/lmgr/predicate.c | 4 +-
src/include/access/clog.h | 18 +----
src/include/access/clogdefs.h | 33 +++++++++
src/include/access/slru.h | 6 +-
src/include/access/transam.h | 3 +
12 files changed, 122 insertions(+), 112 deletions(-)
create mode 100644 src/include/access/clogdefs.h
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349aa7b..a2eb3e2983 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -357,7 +357,7 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* write-busy, since we don't care if the update reaches disk sooner than
* we think.
*/
- slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid, true);
/*
* Set the main transaction id, if any.
@@ -631,7 +631,7 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
* for most uses; TransactionLogFetch() in transam.c is the intended caller.
*/
XidStatus
-TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
+TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn, bool throwError)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
@@ -643,13 +643,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid);
- byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
+ slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid, throwError);
+ if (slotno == InvalidSlotNo)
+ status = TRANSACTION_STATUS_UNKNOWN;
+ else
+ {
+ byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
- status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
+ status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
- lsnindex = GetLSNIndex(slotno, xid);
- *lsn = XactCtl->shared->group_lsn[lsnindex];
+ lsnindex = GetLSNIndex(slotno, xid);
+ *lsn = XactCtl->shared->group_lsn[lsnindex];
+ }
LWLockRelease(XactSLRULock);
@@ -796,7 +801,7 @@ TrimCLOG(void)
int slotno;
char *byteptr;
- slotno = SimpleLruReadPage(XactCtl, pageno, false, xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, false, xid, true);
byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
/* Zero so-far-unused positions in the current byte */
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index cb8a968801..98c685405c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -237,7 +237,7 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid, true);
TransactionIdSetCommitTs(xid, ts, nodeid, slotno);
for (i = 0; i < nsubxids; i++)
@@ -342,7 +342,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
}
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid, true);
memcpy(&entry,
CommitTsCtl->shared->page_buffer[slotno] +
SizeOfCommitTimestampEntry * entryno,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 43653fe572..ed902a9d64 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -881,7 +881,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* enough that a MultiXactId is really involved. Perhaps someday we'll
* take the trouble to generalize the slru.c error reporting code.
*/
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -914,7 +914,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -1345,7 +1345,7 @@ retry:
pageno = MultiXactIdToOffsetPage(multi);
entryno = MultiXactIdToOffsetEntry(multi);
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
@@ -1377,7 +1377,7 @@ retry:
entryno = MultiXactIdToOffsetEntry(tmpMXact);
if (pageno != prev_pageno)
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -1418,7 +1418,7 @@ retry:
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -2063,7 +2063,7 @@ TrimMultiXact(void)
int slotno;
MultiXactOffset *offptr;
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -2095,7 +2095,7 @@ TrimMultiXact(void)
int memberoff;
memberoff = MXOffsetToMemberOffset(offset);
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset, true);
xidptr = (TransactionId *)
(MultiXactMemberCtl->shared->page_buffer[slotno] + memberoff);
@@ -2749,7 +2749,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return false;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
+ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 16a7898697..daa145eeff 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -385,14 +385,15 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must be held at entry, and will be held at exit.
*/
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid)
+ TransactionId xid, bool throwError)
{
SlruShared shared = ctl->shared;
@@ -465,7 +466,11 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
/* Now it's okay to ereport if we failed */
if (!ok)
- SlruReportIOError(ctl, pageno, xid);
+ {
+ if (throwError)
+ SlruReportIOError(ctl, pageno, xid);
+ return InvalidSlotNo;
+ }
SlruRecentlyUsed(shared, slotno);
@@ -484,14 +489,16 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must NOT be held at entry, but will be held at exit.
* It is unspecified whether the lock will be shared or exclusive.
*/
int
-SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
+SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid,
+ bool throwError)
{
SlruShared shared = ctl->shared;
int slotno;
@@ -520,7 +527,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
- return SimpleLruReadPage(ctl, pageno, true, xid);
+ return SimpleLruReadPage(ctl, pageno, true, xid, throwError);
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e867c7..353b946731 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -83,7 +83,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
@@ -123,7 +123,7 @@ SubTransGetParent(TransactionId xid)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index a28918657c..88f867e5ef 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -35,7 +35,8 @@ static XidStatus cachedFetchXidStatus;
static XLogRecPtr cachedCommitLSN;
/* Local functions */
-static XidStatus TransactionLogFetch(TransactionId transactionId);
+static XidStatus TransactionLogFetch(TransactionId transactionId,
+ bool throwError);
/* ----------------------------------------------------------------
@@ -49,7 +50,7 @@ static XidStatus TransactionLogFetch(TransactionId transactionId);
* TransactionLogFetch --- fetch commit status of specified transaction id
*/
static XidStatus
-TransactionLogFetch(TransactionId transactionId)
+TransactionLogFetch(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
XLogRecPtr xidlsn;
@@ -76,14 +77,16 @@ TransactionLogFetch(TransactionId transactionId)
/*
* Get the transaction status.
*/
- xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);
+ xidstatus = TransactionIdGetStatus(transactionId, &xidlsn, throwError);
/*
* Cache it, but DO NOT cache status for unfinished or sub-committed
* transactions! We only cache status that is guaranteed not to change.
+ * Likewise, DO NOT cache when the status is unknown.
*/
if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
- xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
+ xidstatus != TRANSACTION_STATUS_SUB_COMMITTED &&
+ xidstatus != TRANSACTION_STATUS_UNKNOWN)
{
cachedFetchXid = transactionId;
cachedFetchXidStatus = xidstatus;
@@ -96,6 +99,7 @@ TransactionLogFetch(TransactionId transactionId)
/* ----------------------------------------------------------------
* Interface functions
*
+ * TransactionIdResolveStatus
* TransactionIdDidCommit
* TransactionIdDidAbort
* ========
@@ -115,24 +119,17 @@ TransactionLogFetch(TransactionId transactionId)
*/
/*
- * TransactionIdDidCommit
- * True iff transaction associated with the identifier did commit.
- *
- * Note:
- * Assumes transaction identifier is valid and exists in clog.
+ * TransactionIdResolveStatus
+ * Returns the status of the transaction associated with the identifier,
+ * recursively resolving sub-committed transaction status by checking
+ * the parent transaction.
*/
-bool /* true if given transaction committed */
-TransactionIdDidCommit(TransactionId transactionId)
+XidStatus
+TransactionIdResolveStatus(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked committed, it's committed.
- */
- if (xidstatus == TRANSACTION_STATUS_COMMITTED)
- return true;
+ xidstatus = TransactionLogFetch(transactionId, throwError);
/*
* If it's marked subcommitted, we have to check the parent recursively.
@@ -153,21 +150,31 @@ TransactionIdDidCommit(TransactionId transactionId)
TransactionId parentXid;
if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return false;
+ return TRANSACTION_STATUS_ABORTED;
parentXid = SubTransGetParent(transactionId);
if (!TransactionIdIsValid(parentXid))
{
elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
transactionId);
- return false;
+ return TRANSACTION_STATUS_ABORTED;
}
- return TransactionIdDidCommit(parentXid);
+ return TransactionIdResolveStatus(parentXid, throwError);
}
+ return xidstatus;
+}
- /*
- * It's not committed.
- */
- return false;
+/*
+ * TransactionIdDidCommit
+ * True iff transaction associated with the identifier did commit.
+ *
+ * Note:
+ * Assumes transaction identifier is valid and exists in clog.
+ */
+bool /* true if given transaction committed */
+TransactionIdDidCommit(TransactionId transactionId)
+{
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_COMMITTED);
}
/*
@@ -180,43 +187,8 @@ TransactionIdDidCommit(TransactionId transactionId)
bool /* true if given transaction aborted */
TransactionIdDidAbort(TransactionId transactionId)
{
- XidStatus xidstatus;
-
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked aborted, it's aborted.
- */
- if (xidstatus == TRANSACTION_STATUS_ABORTED)
- return true;
-
- /*
- * If it's marked subcommitted, we have to check the parent recursively.
- * However, if it's older than TransactionXmin, we can't look at
- * pg_subtrans; instead assume that the parent crashed without cleaning up
- * its children.
- */
- if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED)
- {
- TransactionId parentXid;
-
- if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return true;
- parentXid = SubTransGetParent(transactionId);
- if (!TransactionIdIsValid(parentXid))
- {
- /* see notes in TransactionIdDidCommit */
- elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
- transactionId);
- return true;
- }
- return TransactionIdDidAbort(parentXid);
- }
-
- /*
- * It's not aborted.
- */
- return false;
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_ABORTED);
}
/*
@@ -419,7 +391,7 @@ TransactionIdGetCommitLSN(TransactionId xid)
/*
* Get the transaction status.
*/
- (void) TransactionIdGetStatus(xid, &result);
+ (void) TransactionIdGetStatus(xid, &result, true);
return result;
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace3f9..a49126dba0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1477,7 +1477,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
slotno = SimpleLruZeroPage(NotifyCtl, pageno);
else
slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
- InvalidTransactionId);
+ InvalidTransactionId, true);
/* Note we mark the page dirty before writing in it */
NotifyCtl->shared->page_dirty[slotno] = true;
@@ -2010,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
* part of the page we will actually inspect.
*/
slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
- InvalidTransactionId);
+ InvalidTransactionId, true);
if (curpage == QUEUE_POS_PAGE(head))
{
/* we only want to read as far as head */
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b400c..6cf12e46f6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -904,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
}
else
- slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+ slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid, true);
SerialValue(slotno, xid) = minConflictCommitSeqNo;
SerialSlruCtl->shared->page_dirty[slotno] = true;
@@ -946,7 +946,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
* but will return with that lock held, which must then be released.
*/
slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
- SerialPage(xid), xid);
+ SerialPage(xid), xid, true);
val = SerialValue(slotno, xid);
LWLockRelease(SerialSLRULock);
return val;
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index 6c840cbf29..cf299cd8f6 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -11,24 +11,11 @@
#ifndef CLOG_H
#define CLOG_H
+#include "access/clogdefs.h"
#include "access/xlogreader.h"
#include "storage/sync.h"
#include "lib/stringinfo.h"
-/*
- * Possible transaction statuses --- note that all-zeroes is the initial
- * state.
- *
- * A "subcommitted" transaction is a committed subtransaction whose parent
- * hasn't committed or aborted yet.
- */
-typedef int XidStatus;
-
-#define TRANSACTION_STATUS_IN_PROGRESS 0x00
-#define TRANSACTION_STATUS_COMMITTED 0x01
-#define TRANSACTION_STATUS_ABORTED 0x02
-#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
-
typedef struct xl_clog_truncate
{
int pageno;
@@ -38,7 +25,8 @@ typedef struct xl_clog_truncate
extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
-extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
+extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn,
+ bool throwError);
extern Size CLOGShmemBuffers(void);
extern Size CLOGShmemSize(void);
diff --git a/src/include/access/clogdefs.h b/src/include/access/clogdefs.h
new file mode 100644
index 0000000000..0f9996bb08
--- /dev/null
+++ b/src/include/access/clogdefs.h
@@ -0,0 +1,33 @@
+/*
+ * clogdefs.h
+ *
+ * PostgreSQL transaction-commit-log manager
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/clogdefs.h
+ */
+#ifndef CLOGDEFS_H
+#define CLOGDEFS_H
+
+/*
+ * Possible transaction statuses --- note that all-zeroes is the initial
+ * state.
+ *
+ * A "subcommitted" transaction is a committed subtransaction whose parent
+ * hasn't committed or aborted yet.
+ *
+ * An "unknown" status indicates an error condition, such as when the clog has
+ * been erroneously truncated and the commit status of a transaction cannot be
+ * determined.
+ */
+typedef enum XidStatus {
+ TRANSACTION_STATUS_IN_PROGRESS = 0x00,
+ TRANSACTION_STATUS_COMMITTED = 0x01,
+ TRANSACTION_STATUS_ABORTED = 0x02,
+ TRANSACTION_STATUS_SUB_COMMITTED = 0x03,
+ TRANSACTION_STATUS_UNKNOWN = 0x04 /* error condition */
+} XidStatus;
+
+#endif /* CLOGDEFS_H */
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b43504d..0b6a5669d8 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -133,6 +133,8 @@ typedef struct SlruCtlData
typedef SlruCtlData *SlruCtl;
+#define InvalidSlotNo ((int) -1)
+
extern Size SimpleLruShmemSize(int nslots, int nlsns);
extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
@@ -140,9 +142,9 @@ extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
SyncRequestHandler sync_handler);
extern int SimpleLruZeroPage(SlruCtl ctl, int pageno);
extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 2f1f144db4..7d5e2f614d 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -14,6 +14,7 @@
#ifndef TRANSAM_H
#define TRANSAM_H
+#include "access/clogdefs.h"
#include "access/xlogdefs.h"
@@ -264,6 +265,8 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
/*
* prototypes for functions in transam/transam.c
*/
+extern XidStatus TransactionIdResolveStatus(TransactionId transactionId,
+ bool throwError);
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
--
2.21.1 (Apple Git-122.3)
v20-0004-Using-non-throwing-clog-interface-from-amcheck.patch (application/octet-stream)
From 810b8b881fae4574d654455b757c40c5622d4b6e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:26:43 -0700
Subject: [PATCH v20 4/5] Using non-throwing clog interface from amcheck
Converting the heap checking functions to use the recently introduced
non-throwing interface to clog when checking transaction commit status, and
adding corruption reports about missing clog rather than aborting.
---
contrib/amcheck/verify_heapam.c | 72 ++++++++-----
contrib/pg_amcheck/t/006_clog_truncation.pl | 111 ++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 -
3 files changed, 154 insertions(+), 30 deletions(-)
create mode 100644 contrib/pg_amcheck/t/006_clog_truncation.pl
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 0156c1e74a..a42b74ed46 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -10,6 +10,7 @@
*/
#include "postgres.h"
+#include "access/clogdefs.h"
#include "access/detoast.h"
#include "access/genam.h"
#include "access/heapam.h"
@@ -43,13 +44,6 @@ typedef enum XidBoundsViolation
XID_BOUNDS_OK
} XidBoundsViolation;
-typedef enum XidCommitStatus
-{
- XID_COMMITTED,
- XID_IN_PROGRESS,
- XID_ABORTED
-} XidCommitStatus;
-
typedef enum SkipPages
{
SKIP_PAGES_ALL_FROZEN,
@@ -83,7 +77,7 @@ typedef struct HeapCheckContext
* Cached copies of the most recently checked xid and its status.
*/
TransactionId cached_xid;
- XidCommitStatus cached_status;
+ XidStatus cached_status;
/* Values concerning the heap relation being checked */
Relation rel;
@@ -147,7 +141,7 @@ static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid,
HeapCheckContext *ctx);
static XidBoundsViolation get_xid_status(TransactionId xid,
HeapCheckContext *ctx,
- XidCommitStatus *status);
+ XidStatus *status);
/*
* Scan and report corruption in heap pages, optionally reconciling toasted
@@ -631,7 +625,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
else if (infomask & HEAP_MOVED_OFF ||
infomask & HEAP_MOVED_IN)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
switch (get_xid_status(xvac, ctx, &status))
@@ -666,17 +660,25 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u transaction status is lost",
+ xvac));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
else
{
- XidCommitStatus status;
+ XidStatus status;
switch (get_xid_status(raw_xmin, ctx, &status))
{
@@ -708,12 +710,20 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_COMMITTED:
+ case TRANSACTION_STATUS_COMMITTED:
break;
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* insert or delete in progress */
- case XID_ABORTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("raw xmin %u transaction status is lost",
+ raw_xmin));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
@@ -723,7 +733,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
{
if (infomask & HEAP_XMAX_IS_MULTI)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
switch (get_xid_status(xmax, ctx, &status))
@@ -757,12 +767,20 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_RECENTLY_DEAD or
* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("xmax %u transaction status is lost",
+ xmax));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
@@ -1373,7 +1391,7 @@ check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
*/
static XidBoundsViolation
get_xid_status(TransactionId xid, HeapCheckContext *ctx,
- XidCommitStatus *status)
+ XidStatus *status)
{
XidBoundsViolation result;
FullTransactionId fxid;
@@ -1424,7 +1442,7 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
return result;
}
- *status = XID_COMMITTED;
+ *status = TRANSACTION_STATUS_COMMITTED;
LWLockAcquire(XactTruncationLock, LW_SHARED);
clog_horizon =
FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid,
@@ -1432,13 +1450,9 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
{
if (TransactionIdIsCurrentTransactionId(xid))
- *status = XID_IN_PROGRESS;
- else if (TransactionIdDidCommit(xid))
- *status = XID_COMMITTED;
- else if (TransactionIdDidAbort(xid))
- *status = XID_ABORTED;
+ *status = TRANSACTION_STATUS_IN_PROGRESS;
else
- *status = XID_IN_PROGRESS;
+ *status = TransactionIdResolveStatus(xid, false);
}
LWLockRelease(XactTruncationLock);
ctx->cached_xid = xid;
diff --git a/contrib/pg_amcheck/t/006_clog_truncation.pl b/contrib/pg_amcheck/t/006_clog_truncation.pl
new file mode 100644
index 0000000000..f205ae7ede
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_clog_truncation.pl
@@ -0,0 +1,111 @@
+# This regression test checks the behavior of the heap validation in the
+# presence of clog corruption.
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 3;
+
+my ($node, $pgdata, $clogdir);
+
+sub count_clog_files
+{
+ my $result = 0;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ $result++ if (-f "$clogdir/$fname");
+ }
+ closedir(DIR);
+ return $result;
+}
+
+# Burn through enough xids that at least three clog files exist in pg_xact/
+sub create_three_clog_files
+{
+ print STDERR "Generating clog entries....\n";
+
+ $node->safe_psql('postgres', q(
+ CREATE PROCEDURE burn_xids ()
+ LANGUAGE plpgsql
+ AS $$
+ DECLARE
+ loopcnt BIGINT;
+ BEGIN
+ FOR loopcnt IN 1..32768
+ LOOP
+ PERFORM txid_current();
+ COMMIT;
+ END LOOP;
+ END;
+ $$;
+ ));
+
+ do {
+ $node->safe_psql('postgres', 'INSERT INTO test_0 (i) VALUES (0)');
+ $node->safe_psql('postgres', 'CALL burn_xids()');
+ print STDERR "Burned transaction ids...\n";
+ $node->safe_psql('postgres', 'INSERT INTO test_1 (i) VALUES (1)');
+ } while (count_clog_files() < 3);
+}
+
+# Of the clog files in pg_xact, remove the second one, sorted by name order.
+# This function, used along with create_three_clog_files(), is intended to
+# remove neither the newest nor the oldest clog file. Experimentation shows
+# that removing the newest clog file works ok, but for future-proofing, remove
+# one less likely to be checked at server startup.
+sub unlink_second_clog_file
+{
+ my @paths;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ my $path = "$clogdir/$fname";
+ next unless -f $path;
+ push @paths, $path;
+ }
+ closedir(DIR);
+
+ my @ordered = sort { $a cmp $b } @paths;
+ unlink $ordered[1];
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we corrupt clog, autovacuum workers visiting tables
+# could crash the backend. Disable autovacuum so that won't happen.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$pgdata = $node->data_dir;
+$clogdir = join('/', $pgdata, 'pg_xact');
+$node->start;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE TABLE test_0 (i INTEGER)");
+$node->safe_psql('postgres', "CREATE TABLE test_1 (i INTEGER)");
+$node->safe_psql('postgres', "VACUUM FREEZE");
+
+create_three_clog_files();
+
+# Corruptly delete a clog file
+$node->stop;
+unlink_second_clog_file();
+$node->start;
+
+my $port = $node->port;
+
+# Run pg_amcheck against the corrupt database, looking for clog related
+# corruption messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ qr/transaction status is lost/ ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2408bb2bf6..c1144c1a92 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2796,7 +2796,6 @@ XactCallbackItem
XactEvent
XactLockTableWaitInfo
XidBoundsViolation
-XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
v20-0005-Adding-ACL-checks-for-verify_heapam.patch (application/octet-stream)
From e8116cff947a4a7dedf1fcf9c7d0b4b87b796237 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:27:23 -0700
Subject: [PATCH v20 5/5] Adding ACL checks for verify_heapam
Requiring select privileges on tables scanned by verify_heapam, in
addition to the already required execute privileges on the function.
---
contrib/amcheck/expected/check_heap.out | 6 ++++++
contrib/amcheck/sql/check_heap.sql | 7 +++++++
contrib/amcheck/verify_heapam.c | 8 ++++++++
doc/src/sgml/pgamcheck.sgml | 2 +-
4 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index 882f853d56..41cdc6435c 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -95,6 +95,12 @@ SELECT * FROM verify_heapam(relation := 'heaptest');
ERROR: permission denied for function verify_heapam
RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for table heaptest
+RESET ROLE;
+GRANT SELECT ON heaptest TO regress_heaptest_role;
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index c10a25f21c..c8397a46f0 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -41,6 +41,13 @@ RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index a42b74ed46..2cb735f576 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -23,6 +23,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
@@ -434,6 +435,8 @@ verify_heapam(PG_FUNCTION_ARGS)
static void
sanity_check_relation(Relation rel)
{
+ AclResult aclresult;
+
if (rel->rd_rel->relkind != RELKIND_RELATION &&
rel->rd_rel->relkind != RELKIND_MATVIEW &&
rel->rd_rel->relkind != RELKIND_TOASTVALUE)
@@ -445,6 +448,11 @@ sanity_check_relation(Relation rel)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
+ aclresult = pg_class_aclcheck(rel->rd_id, GetUserId(), ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
}
/*
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
index 3e059e7753..fc36447dda 100644
--- a/doc/src/sgml/pgamcheck.sgml
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -19,7 +19,7 @@
connecting as a user with sufficient privileges to check tables and indexes.
Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
<function>bt_index_parent_check</function> and <function>verify_heapam</function>
- functions.
+ functions, as well as privileges to access the relations being checked.
</para>
<synopsis>
--
2.21.1 (Apple Git-122.3)
On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Done that way in the attached, which also include Robert's changes from v19 he posted earlier today.
Committed. Let's see what the buildfarm thinks.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 22, 2020 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
Committed. Let's see what the buildfarm thinks.
It is mostly happy, but thorntail is not:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11
I thought that the problem might be related to the fact that thorntail
is using force_parallel_mode, but I tried that here and it did not
cause a failure. So my next guess is that it is related to the fact
that this is a sparc64 machine, but it's hard to tell, since none of
the other sparc64 critters have run yet. In any case I don't know why
that would cause a failure. The messages in the log aren't very
illuminating, unfortunately. :-(
Mark, any ideas what might cause specifically that set of tests to fail?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
The messages in the log aren't very
illuminating, unfortunately. :-(
Considering this is a TAP test, why in the world is it designed to hide
all details of any unexpected amcheck messages? Surely being able to
see what amcheck is saying would be helpful here.
IOW, don't have the tests abbreviate the module output with count(*),
but return the full thing, and then use a regex to see if you got what
was expected. If you didn't, the output will show what you did get.
regards, tom lane
On Thu, Oct 22, 2020 at 10:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Considering this is a TAP test, why in the world is it designed to hide
all details of any unexpected amcheck messages? Surely being able to
see what amcheck is saying would be helpful here.
IOW, don't have the tests abbreviate the module output with count(*),
but return the full thing, and then use a regex to see if you got what
was expected. If you didn't, the output will show what you did get.
Yeah, that thought crossed my mind, too. But I'm not sure it would
help in the case of this particular failure, because I think the
problem is that we're expecting to get complaints and instead getting
none.
It might be good to change it anyway, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
lapwing just spit up a possibly relevant issue:
ccache gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -Werror -fPIC -I. -I. -I../../src/include -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/et -c -o verify_heapam.o verify_heapam.c
verify_heapam.c: In function 'get_xid_status':
verify_heapam.c:1432:5: error: 'fxid.value' may be used uninitialized in this function [-Werror=maybe-uninitialized]
cc1: all warnings being treated as errors
On Oct 22, 2020, at 7:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 22, 2020 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
Committed. Let's see what the buildfarm thinks.
It is mostly happy, but thorntail is not:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11
I thought that the problem might be related to the fact that thorntail
is using force_parallel_mode, but I tried that here and it did not
cause a failure. So my next guess is that it is related to the fact
that this is a sparc64 machine, but it's hard to tell, since none of
the other sparc64 critters have run yet. In any case I don't know why
that would cause a failure. The messages in the log aren't very
illuminating, unfortunately. :-(
Mark, any ideas what might cause specifically that set of tests to fail?
The code is correctly handling an uncorrupted table, but then more or less randomly failing some of the time when processing a corrupt table.
Tom identified a problem with an uninitialized variable. I'm putting together a new patch set to address it.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 9:01 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 22, 2020, at 7:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 22, 2020 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
Committed. Let's see what the buildfarm thinks.
It is mostly happy, but thorntail is not:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11
I thought that the problem might be related to the fact that thorntail
is using force_parallel_mode, but I tried that here and it did not
cause a failure. So my next guess is that it is related to the fact
that this is a sparc64 machine, but it's hard to tell, since none of
the other sparc64 critters have run yet. In any case I don't know why
that would cause a failure. The messages in the log aren't very
illuminating, unfortunately. :-(
Mark, any ideas what might cause specifically that set of tests to fail?
The code is correctly handling an uncorrupted table, but then more or less randomly failing some of the time when processing a corrupt table.
Tom identified a problem with an uninitialized variable. I'm putting together a new patch set to address it.
The 0001 attached patch addresses the -Werror=maybe-uninitialized problem.
The 0002 attached patch addresses the test failures:
The failing test is designed to stop the server, inflict blunt force trauma on the heap and toast files by overwriting them with garbage bytes, restart the server, and verify that the corruption is detected by amcheck's verify_heapam(). The exact trauma is intended to be the same on all platforms, in terms of the number of bytes written and the location in the file where they are written, but owing to differences between platforms, the test by design does not expect any particular corruption message.
The test was overwriting far fewer bytes than I had intended, but since it was still sufficient to create corruption on the platforms where I tested, I failed to notice. It should do a more thorough job now.
Attachments:
v21-0001-Fixing-unitialized-variable-bug.patchapplication/octet-stream; name=v21-0001-Fixing-unitialized-variable-bug.patch; x-unix-mode=0644Download
From 967b1671a460affd54e0a67d5b707b79feca9f53 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 22 Oct 2020 08:53:11 -0700
Subject: [PATCH v21 1/2] Fixing unitialized variable bug.
---
contrib/amcheck/verify_heapam.c | 65 +++++++++++++++------------------
1 file changed, 30 insertions(+), 35 deletions(-)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 0156c1e74a..2b0a0dec1d 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1368,60 +1368,55 @@ check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
* truly been valid at that time.
*
* If the status argument is not NULL, and if and only if the transaction ID
- * appears to be valid in this relation, clog will be consulted and the commit
- * status argument will be set with the status of the transaction ID.
+ * appears to be valid in this relation, the status argument will be set with
+ * the commit status of the transaction ID.
*/
static XidBoundsViolation
get_xid_status(TransactionId xid, HeapCheckContext *ctx,
XidCommitStatus *status)
{
- XidBoundsViolation result;
FullTransactionId fxid;
FullTransactionId clog_horizon;
/* Quick check for special xids */
if (!TransactionIdIsValid(xid))
- result = XID_INVALID;
+ return XID_INVALID;
else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
- result = XID_BOUNDS_OK;
- else
{
- /* Check if the xid is within bounds */
- fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
- if (!fxid_in_cached_range(fxid, ctx))
- {
- /*
- * We may have been checking against stale values. Update the
- * cached range to be sure, and since we relied on the cached
- * range when we performed the full xid conversion, reconvert.
- */
- update_cached_xid_range(ctx);
- fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
- }
+ if (status != NULL)
+ *status = XID_COMMITTED;
+ return XID_BOUNDS_OK;
+ }
- if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
- result = XID_IN_FUTURE;
- else if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
- result = XID_PRECEDES_CLUSTERMIN;
- else if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
- result = XID_PRECEDES_RELMIN;
- else
- result = XID_BOUNDS_OK;
+ /* Check if the xid is within bounds */
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+ if (!fxid_in_cached_range(fxid, ctx))
+ {
+ /*
+ * We may have been checking against stale values. Update the
+ * cached range to be sure, and since we relied on the cached
+ * range when we performed the full xid conversion, reconvert.
+ */
+ update_cached_xid_range(ctx);
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
}
- /*
- * Early return if the caller does not request clog checking, or if the
- * xid is already known to be out of bounds. We dare not check clog for
- * out of bounds transaction IDs.
- */
- if (status == NULL || result != XID_BOUNDS_OK)
- return result;
+ if (FullTransactionIdPrecedesOrEquals(ctx->next_fxid, fxid))
+ return XID_IN_FUTURE;
+ if (FullTransactionIdPrecedes(fxid, ctx->oldest_fxid))
+ return XID_PRECEDES_CLUSTERMIN;
+ if (FullTransactionIdPrecedes(fxid, ctx->relfrozenfxid))
+ return XID_PRECEDES_RELMIN;
+
+ /* Early return if the caller does not request clog checking */
+ if (status == NULL)
+ return XID_BOUNDS_OK;;
/* Early return if we just checked this xid in a prior call */
if (xid == ctx->cached_xid)
{
*status = ctx->cached_status;
- return result;
+ return XID_BOUNDS_OK;
}
*status = XID_COMMITTED;
@@ -1443,5 +1438,5 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
LWLockRelease(XactTruncationLock);
ctx->cached_xid = xid;
ctx->cached_status = *status;
- return result;
+ return XID_BOUNDS_OK;
}
--
2.21.1 (Apple Git-122.3)
v21-0002-Fixing-sloppy-regression-test-coding.patchapplication/octet-stream; name=v21-0002-Fixing-sloppy-regression-test-coding.patch; x-unix-mode=0644Download
From 0413d6063ab04f3599dba1c423565a19dfd8b766 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Thu, 22 Oct 2020 11:29:23 -0700
Subject: [PATCH v21 2/2] Fixing sloppy regression test coding
---
contrib/amcheck/t/001_verify_heapam.pl | 27 +++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index e7526c17b8..9b9190db35 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -152,6 +152,20 @@ sub fresh_test_table
));
}
+sub syswrite_or_bail
+{
+ my ($fh, $contents, $length) = @_;
+
+ my $written = syswrite($fh, $contents, $length);
+
+ BAIL_OUT("syswrite failed: $!")
+ unless defined $written;
+ BAIL_OUT("short syswrite: $written bytes written, $length expected")
+ if ($written < $length);
+ BAIL_OUT("long syswrite: $written bytes written, $length expected")
+ if ($written > $length);
+}
+
# Stops the test node, corrupts the first page of the named relation, and
# restarts the node.
sub corrupt_first_page_internal
@@ -161,19 +175,22 @@ sub corrupt_first_page_internal
$node->stop;
my $fh;
- open($fh, '+<', $relpath);
+ open($fh, '+<', $relpath)
+ or BAIL_OUT("open failed: $!");
binmode $fh;
# If we corrupt the header, postgres won't allow the page into the buffer.
- syswrite($fh, '\xFF\xFF\xFF\xFF', 8) if ($corrupt_header);
+ syswrite_or_bail($fh, '\xFF' x 8, 8) if ($corrupt_header);
# Corrupt at least the line pointers. Exactly what this corrupts will
# depend on the page, as it may run past the line pointers into the user
# data. We stop short of writing 2048 bytes (2k), the smallest supported
# page size, as we don't want to corrupt the next page.
- seek($fh, 32, 0);
- syswrite($fh, '\x77\x77\x77\x77', 500);
- close($fh);
+ seek($fh, 32, 0)
+ or BAIL_OUT("seek failed: $!");
+ syswrite_or_bail($fh, '\x77' x 2000, 2000);
+ close($fh)
+ or BAIL_OUT("close failed: $!");;
$node->start;
}
--
2.21.1 (Apple Git-122.3)
On Thu, Oct 22, 2020 at 3:15 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The 0001 attached patch addresses the -Werror=maybe-uninitialized problem.
I am skeptical. Why so much code churn to fix a compiler warning? And
even in the revised code, *status isn't set in all cases, so I don't
see why this would satisfy the compiler. Even if it satisfies this
particular compiler for some other reason, some other compiler is
bound to be unhappy sometime. It's better to just arrange to set
*status always, and use a dummy value in cases where it doesn't
matter. Also, "return XID_BOUNDS_OK;;" has exceeded its recommended
allowance of semicolons.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 1:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 22, 2020 at 3:15 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The 0001 attached patch addresses the -Werror=maybe-uninitialized problem.
I am skeptical. Why so much code churn to fix a compiler warning? And
even in the revised code, *status isn't set in all cases, so I don't
see why this would satisfy the compiler. Even if it satisfies this
particular compiler for some other reason, some other compiler is
bound to be unhappy sometime. It's better to just arrange to set
*status always, and use a dummy value in cases where it doesn't
matter. Also, "return XID_BOUNDS_OK;;" has exceeded its recommended
allowance of semicolons.
I think the compiler warning was about fxid not being set. The callers pass NULL for status if they don't want status checked, so writing *status unconditionally would be an error. Also, if the xid being checked is out of bounds, we can't check the status of the xid in clog.
As for the code churn, I probably refactored it a bit more than I needed to fix the compiler warning about fxid, but that was because the old arrangement seemed to make it harder to reason about when and where fxid got set. I think that is more clear now.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
ooh, looks like prairiedog sees the problem too. That means I should be
able to reproduce it under a debugger, if you're not certain yet where
the problem lies.
regards, tom lane
... btw, having now looked more closely at get_xid_status(), I wonder
how come there aren't more compilers bitching about it, because it
is very very obviously broken. In particular, the case of
requesting status for an xid that is BootstrapTransactionId or
FrozenTransactionId *will* fall through to perform
FullTransactionIdPrecedesOrEquals with an uninitialized fxid.
The fact that most compilers seem to fail to notice that is quite scary.
I suppose it has something to do with FullTransactionId being a struct,
which makes me wonder if that choice was quite as wise as we thought.
Meanwhile, so far as this code goes, I wonder why you don't just change it
to always set that value, ie
XidBoundsViolation result;
FullTransactionId fxid;
FullTransactionId clog_horizon;
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+
/* Quick check for special xids */
if (!TransactionIdIsValid(xid))
result = XID_INVALID;
else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
result = XID_BOUNDS_OK;
else
{
/* Check if the xid is within bounds */
- fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
if (!fxid_in_cached_range(fxid, ctx))
{
regards, tom lane
On Oct 22, 2020, at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ooh, looks like prairiedog sees the problem too. That means I should be
able to reproduce it under a debugger, if you're not certain yet where
the problem lies.
Thanks, Tom, but I question whether the regression test failures are from a problem in the verify_heapam.c code. I think they are a busted perl test. The test was supposed to corrupt the heap by overwriting a heap file with a large chunk of garbage, but in fact only wrote a small amount of garbage. The idea was to write about 2000 bytes starting at offset 32 in the page, in order to corrupt the line pointers, but owing to my incorrect use of syswrite in the perl test, that didn't happen.
I think the uninitialized variable warning is warning about a real problem in the c-code, but I have no reason to think that particular problem is causing this particular regression test failure.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 1:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... btw, having now looked more closely at get_xid_status(), I wonder
how come there aren't more compilers bitching about it, because it
is very very obviously broken. In particular, the case of
requesting status for an xid that is BootstrapTransactionId or
FrozenTransactionId *will* fall through to perform
FullTransactionIdPrecedesOrEquals with an uninitialized fxid.
The fact that most compilers seem to fail to notice that is quite scary.
I suppose it has something to do with FullTransactionId being a struct,
which makes me wonder if that choice was quite as wise as we thought.
Meanwhile, so far as this code goes, I wonder why you don't just change it
to always set that value, ie
XidBoundsViolation result;
FullTransactionId fxid;
FullTransactionId clog_horizon;
+ fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
+
/* Quick check for special xids */
if (!TransactionIdIsValid(xid))
result = XID_INVALID;
else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
result = XID_BOUNDS_OK;
else
{
/* Check if the xid is within bounds */
- fxid = FullTransactionIdFromXidAndCtx(xid, ctx);
if (!fxid_in_cached_range(fxid, ctx))
{
Yeah, I reached the same conclusion before submitting the fix upthread. I structured it a bit differently, but I believe fxid will now always get set before being used, though sometimes the function returns before doing either.
I had the same thought about compilers not catching that, too.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Oct 22, 2020, at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ooh, looks like prairiedog sees the problem too. That means I should be
able to reproduce it under a debugger, if you're not certain yet where
the problem lies.
Thanks, Tom, but I question whether the regression test failures are from a problem in the verify_heapam.c code. I think they are a busted perl test. The test was supposed to corrupt the heap by overwriting a heap file with a large chunk of garbage, but in fact only wrote a small amount of garbage. The idea was to write about 2000 bytes starting at offset 32 in the page, in order to corrupt the line pointers, but owing to my incorrect use of syswrite in the perl test, that didn't happen.
Hm, but why are we seeing the failure only on specific machine
architectures? sparc64 and ppc32 is a weird pairing, too.
regards, tom lane
On Oct 22, 2020, at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Oct 22, 2020, at 1:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ooh, looks like prairiedog sees the problem too. That means I should be
able to reproduce it under a debugger, if you're not certain yet where
the problem lies.
Thanks, Tom, but I question whether the regression test failures are from a problem in the verify_heapam.c code. I think they are a busted perl test. The test was supposed to corrupt the heap by overwriting a heap file with a large chunk of garbage, but in fact only wrote a small amount of garbage. The idea was to write about 2000 bytes starting at offset 32 in the page, in order to corrupt the line pointers, but owing to my incorrect use of syswrite in the perl test, that didn't happen.
Hm, but why are we seeing the failure only on specific machine
architectures? sparc64 and ppc32 is a weird pairing, too.
It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is little-endian, and ppc32 and sparc64 are both big-endian, right?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Oct 22, 2020, at 1:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hm, but why are we seeing the failure only on specific machine
architectures? sparc64 and ppc32 is a weird pairing, too.
It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is
little-endian, and ppc32 and sparc64 are both big-endian, right?
They are, but that should not meaningfully affect the results of
that corruption step. You zapped only one line pointer not
several, but it would look the same regardless of endianness.
I find it more plausible that we might see the bad effects of
the uninitialized variable only on those arches --- but that
theory is still pretty shaky, since you'd think compiler
choices about register or stack-location assignment would
be the controlling factor, and those should be all over the
map.
regards, tom lane
I wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is
little-endian, and ppc32 and sparc64 are both big-endian, right?
They are, but that should not meaningfully affect the results of
that corruption step. You zapped only one line pointer not
several, but it would look the same regardless of endianness.
Oh, wait a second. ItemIdData has the flag bits in the middle:
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of line pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
meaning that for that particular bit pattern, one endianness
is going to see the flags as 01 (LP_NORMAL) and the other as 10
(LP_REDIRECT). The offset/len are corrupt either way, but
I'd certainly expect that amcheck would produce different
complaints about those two cases. So it's unsurprising if
this test case's output is endian-dependent.
regards, tom lane
On Oct 22, 2020, at 2:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is
little-endian, and ppc32 and sparc64 are both big-endian, right?
They are, but that should not meaningfully affect the results of
that corruption step. You zapped only one line pointer not
several, but it would look the same regardless of endianness.
Oh, wait a second. ItemIdData has the flag bits in the middle:
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of line pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
meaning that for that particular bit pattern, one endianness
is going to see the flags as 01 (LP_NORMAL) and the other as 10
(LP_REDIRECT). The offset/len are corrupt either way, but
I'd certainly expect that amcheck would produce different
complaints about those two cases. So it's unsurprising if
this test case's output is endian-dependent.
Yeah, I'm already looking at that. The logic in verify_heapam skips over line pointers that are unused or dead, and the test is reporting zero corruption (and complaining about that), so it's probably not going to help to overwrite all the line pointers with this particular bit pattern any more than to just overwrite the first one, as it would just skip them all.
I think the test should overwrite the line pointers with a variety of different bit patterns, or one calculated to work on all platforms. I'll have to write that up.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 2:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
It is seeking to position 32 and writing '\x77\x77\x77\x77'. x86_64 is
little-endian, and ppc32 and sparc64 are both big-endian, right?
They are, but that should not meaningfully affect the results of
that corruption step. You zapped only one line pointer not
several, but it would look the same regardless of endianness.
Oh, wait a second. ItemIdData has the flag bits in the middle:
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of line pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
meaning that for that particular bit pattern, one endianness
is going to see the flags as 01 (LP_NORMAL) and the other as 10
(LP_REDIRECT). The offset/len are corrupt either way, but
I'd certainly expect that amcheck would produce different
complaints about those two cases. So it's unsurprising if
this test case's output is endian-dependent.
Well, the issue is that on big-endian machines it is not reporting any corruption at all. Are you sure the difference will be LP_NORMAL vs LP_REDIRECT? I was thinking it was LP_DEAD vs LP_REDIRECT, as the little endian platforms are seeing corruption messages about bad redirect line pointers, and the big-endian are apparently skipping over the line pointer entirely, which makes sense if it is LP_DEAD but not if it is LP_NORMAL. It would also skip over LP_UNUSED, but I don't see how that could be stored in lp_flags, because 0x77 is going to either be 01110111 or 11101110, and in neither case do you get two zeros adjacent, but you could get two ones adjacent. (LP_UNUSED = binary 00 and LP_DEAD = binary 11)
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
Yeah, I'm already looking at that. The logic in verify_heapam skips over line pointers that are unused or dead, and the test is reporting zero corruption (and complaining about that), so it's probably not going to help to overwrite all the line pointers with this particular bit pattern any more than to just overwrite the first one, as it would just skip them all.
I think the test should overwrite the line pointers with a variety of different bit patterns, or one calculated to work on all platforms. I'll have to write that up.
What we need here is to produce the same test results on either
endianness. So probably the thing to do is apply the equivalent
of ntohl() to produce a string that looks right for either host
endianness. As a separate matter, you'd want to test corruption
producing any of the four flag bitpatterns, probably.
It says here you can use Perl's pack/unpack functions to get
the equivalent of ntohl(), but I've not troubled to work out how.
regards, tom lane
On Thu, Oct 22, 2020 at 5:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
Committed. Let's see what the buildfarm thinks.
This is great work. Thanks Mark and Robert.
--
Peter Geoghegan
On Oct 22, 2020, at 2:26 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Oct 22, 2020 at 5:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
Committed. Let's see what the buildfarm thinks.
This is great work. Thanks Mark and Robert.
That's the first time I've laughed today. Having just turned the buildfarm red, I find this feedback quite ironic! Thanks all the same for the sentiment.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 22, 2020 at 2:39 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
This is great work. Thanks Mark and Robert.
That's the first time I've laughed today. Having turned the build-farm red, this is quite ironic feedback! Thanks all the same for the sentiment.
Breaking the buildfarm is not a capital offense. Especially when it
happens with patches that are in some sense low level and/or novel,
and therefore inherently more likely to cause trouble.
--
Peter Geoghegan
On Thu, Oct 22, 2020 at 4:04 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I think the compiler warning was about fxid not being set. The callers pass NULL for status if they don't want status checked, so writing *status unconditionally would be an error. Also, if the xid being checked is out of bounds, we can't check the status of the xid in clog.
Sorry, you're (partly) right. The new logic is a lot clearer that
we never use that variable uninitialized.
I'll remove the extra semi-colon and commit this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Oct 22, 2020, at 2:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oh, wait a second. ItemIdData has the flag bits in the middle:
meaning that for that particular bit pattern, one endianness
is going to see the flags as 01 (LP_NORMAL) and the other as 10
(LP_REDIRECT).
Well, the issue is that on big-endian machines it is not reporting any
corruption at all. Are you sure the difference will be LP_NORMAL vs
LP_REDIRECT?
[ thinks a bit harder... ] Probably not. The byte/bit string looks
the same either way, given that it's four repetitions of the same
byte value. But which field is which will differ: we have either
oooooooooooooooFFlllllllllllllll
01110111011101110111011101110111
or
lllllllllllllllFFooooooooooooooo
01110111011101110111011101110111
So now I think this is a REDIRECT on either architecture, but the
offset and length fields have different values, causing the redirect
pointer to point to different places. Maybe it happens to point
at a DEAD tuple in the big-endian case.
regards, tom lane
I wrote:
So now I think this is a REDIRECT on either architecture, but the
offset and length fields have different values, causing the redirect
pointer to point to different places. Maybe it happens to point
at a DEAD tuple in the big-endian case.
Just to make sure, I tried this test program:
#include <stdio.h>
#include <string.h>
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of line pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
int main()
{
ItemIdData lp;
memset(&lp, 0x77, sizeof(lp));
printf("off = %x, flags = %x, len = %x\n",
lp.lp_off, lp.lp_flags, lp.lp_len);
return 0;
}
I get
off = 7777, flags = 2, len = 3bbb
on a little-endian machine, and
off = 3bbb, flags = 2, len = 7777
on big-endian. It'd be less symmetric if the bytes weren't
all the same ...
regards, tom lane
I wrote:
I get
off = 7777, flags = 2, len = 3bbb
on a little-endian machine, and
off = 3bbb, flags = 2, len = 7777
on big-endian. It'd be less symmetric if the bytes weren't
all the same ...
... but given that this is the test value we are using, why
don't both endiannesses whine about a non-maxalign'd offset?
The code really shouldn't even be trying to follow these
redirects, because we risk SIGBUS on picky architectures.
regards, tom lane
On Oct 22, 2020, at 6:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
So now I think this is a REDIRECT on either architecture, but the
offset and length fields have different values, causing the redirect
pointer to point to different places. Maybe it happens to point
at a DEAD tuple in the big-endian case.
Just to make sure, I tried this test program:
#include <stdio.h>
#include <string.h>
typedef struct ItemIdData
{
unsigned lp_off:15, /* offset to tuple (from start of page) */
lp_flags:2, /* state of line pointer, see below */
lp_len:15; /* byte length of tuple */
} ItemIdData;
int main()
{
ItemIdData lp;
memset(&lp, 0x77, sizeof(lp));
printf("off = %x, flags = %x, len = %x\n",
lp.lp_off, lp.lp_flags, lp.lp_len);
return 0;
}
I get
off = 7777, flags = 2, len = 3bbb
on a little-endian machine, and
off = 3bbb, flags = 2, len = 7777
on big-endian. It'd be less symmetric if the bytes weren't
all the same ...
I think we're going in the wrong direction here. The idea behind this test was to have as little knowledge about the layout of pages as possible and still verify that damaging the pages would result in corruption reports. Of course, not all damage will result in corruption reports, because some damage looks legit. I think it was just luck (good or bad depending on your perspective) that the damage in the test as committed works on little-endian but not big-endian.
I can embed this knowledge that you have researched into the test if you want me to, but my instinct is to go the other direction and have even less knowledge about pages in the test. That would work if instead of expecting corruption for every time the test writes the file, instead to have it just make sure that it gets corruption reports at least some of the times that it does so. That seems more maintainable long term.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
I get
off = 7777, flags = 2, len = 3bbb
on a little-endian machine, and
off = 3bbb, flags = 2, len = 7777
on big-endian. It'd be less symmetric if the bytes weren't
all the same ...
... but given that this is the test value we are using, why
don't both endiannesses whine about a non-maxalign'd offset?
The code really shouldn't even be trying to follow these
redirects, because we risk SIGBUS on picky architectures.
Ahh, crud. It's because
syswrite($fh, '\x77\x77\x77\x77', 500)
is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 22, 2020, at 6:50 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 22, 2020, at 6:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
I get
off = 7777, flags = 2, len = 3bbb
on a little-endian machine, and
off = 3bbb, flags = 2, len = 7777
on big-endian. It'd be less symmetric if the bytes weren't
all the same ...
... but given that this is the test value we are using, why
don't both endiannesses whine about a non-maxalign'd offset?
The code really shouldn't even be trying to follow these
redirects, because we risk SIGBUS on picky architectures.
Ahh, crud. It's because
syswrite($fh, '\x77\x77\x77\x77', 500)
is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
The reason this never came up in testing is what I was talking about elsewhere -- this test isn't designed to create *specific* corruptions. It's just supposed to corrupt the table in some random way. For whatever reasons I'm not too curious about, that string corrupts on little endian machines but not big endian machines. If we want to have a test that tailors very specific corruptions, I don't think the way to get there is by debugging this test.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Mark Dilger <mark.dilger@enterprisedb.com> writes:
Ahh, crud. It's because
syswrite($fh, '\x77\x77\x77\x77', 500)
is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
Argh. So we really have, using same test except
memcpy(&lp, "\\x77", sizeof(lp));
little endian: off = 785c, flags = 2, len = 1b9b
big endian: off = 2e3c, flags = 0, len = 3737
which explains the apparent LP_DEAD result.
I'm not particularly on board with your suggestion of "well, if it works
sometimes then it's okay". Then we have no idea of what we really tested.
regards, tom lane
On Oct 22, 2020, at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
Ahh, crud. It's because
syswrite($fh, '\x77\x77\x77\x77', 500)
is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
Argh. So we really have, using same test except
memcpy(&lp, "\\x77", sizeof(lp));
little endian: off = 785c, flags = 2, len = 1b9b
big endian: off = 2e3c, flags = 0, len = 3737
which explains the apparent LP_DEAD result.
I'm not particularly on board with your suggestion of "well, if it works
sometimes then it's okay". Then we have no idea of what we really tested.
regards, tom lane
Ok, I've pruned it down to something you may like better. Instead of just checking that *some* corruption occurs, it checks the returned corruption against an expected regex, and if it fails to match, you should see in the logs what you got vs. what you expected.
It only corrupts the first two line pointers, the first one with 0x77777777 and the second one with 0xAAAAAAAA, which are consciously chosen to be bitwise reverses of each other and just strings of alternating bits rather than anything that could have a more complicated interpretation.
On my little-endian mac, the 0x77777777 value creates a line pointer which redirects to an invalid offset 0x7777, which gets reported as decimal 30583 in the corruption report, "line pointer redirection to item at offset 30583 exceeds maximum offset 38". The test is indifferent to whether the corruption it is looking for is reported relative to the first line pointer or the second one, so if endian-ness matters, it may be the 0xAAAAAAAA that results in that corruption message. I don't have a machine handy to test that. It would be nice to determine the minimum amount of paranoia necessary to make this portable and not commit the rest.
Attachments:
regress.patch.WIPapplication/octet-stream; name=regress.patch.WIP; x-unix-mode=0644Download
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index e7526c17b8..9d93ad197c 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -4,7 +4,7 @@ use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 65;
+use Test::More tests => 55;
my ($node, $result);
@@ -28,19 +28,19 @@ check_all_options_uncorrupted('test', 'plain');
#
fresh_test_table('test');
corrupt_first_page('test');
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test')",
"plain corrupted table");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', skip := 'all-visible')",
"plain corrupted table skipping all-visible");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', skip := 'all-frozen')",
"plain corrupted table skipping all-frozen");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', check_toast := false)",
"plain corrupted table skipping toast");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', startblock := 0, endblock := 0)",
"plain corrupted table checking only block zero");
@@ -50,71 +50,13 @@ detects_corruption(
fresh_test_table('test');
$node->safe_psql('postgres', q(VACUUM FREEZE test));
corrupt_first_page('test');
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test')",
"all-frozen corrupted table");
detects_no_corruption(
"verify_heapam('test', skip := 'all-frozen')",
"all-frozen corrupted table skipping all-frozen");
-#
-# Check a corrupt table with corrupt page header
-#
-fresh_test_table('test');
-corrupt_first_page_and_header('test');
-detects_corruption(
- "verify_heapam('test')",
- "corrupted test table with bad page header");
-
-#
-# Check an uncorrupted table with corrupt toast page header
-#
-fresh_test_table('test');
-my $toast = get_toast_for('test');
-corrupt_first_page_and_header($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "table with corrupted toast page header checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "table with corrupted toast page header skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast page header");
-
-#
-# Check an uncorrupted table with corrupt toast
-#
-fresh_test_table('test');
-$toast = get_toast_for('test');
-corrupt_first_page($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "table with corrupted toast checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "table with corrupted toast skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast table");
-
-#
-# Check an uncorrupted all-frozen table with corrupt toast
-#
-fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM FREEZE test));
-$toast = get_toast_for('test');
-corrupt_first_page($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "all-frozen table with corrupted toast checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "all-frozen table with corrupted toast skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast table of all-frozen table");
-
# Returns the filesystem path for the named relation.
sub relation_filepath
{
@@ -154,46 +96,43 @@ sub fresh_test_table
# Stops the test node, corrupts the first page of the named relation, and
# restarts the node.
-sub corrupt_first_page_internal
+sub corrupt_first_page
{
- my ($relname, $corrupt_header) = @_;
+ my ($relname) = @_;
my $relpath = relation_filepath($relname);
$node->stop;
my $fh;
- open($fh, '+<', $relpath);
+ open($fh, '+<', $relpath)
+ or BAIL_OUT("open failed: $!");;
binmode $fh;
- # If we corrupt the header, postgres won't allow the page into the buffer.
- syswrite($fh, '\xFF\xFF\xFF\xFF', 8) if ($corrupt_header);
-
- # Corrupt at least the line pointers. Exactly what this corrupts will
- # depend on the page, as it may run past the line pointers into the user
- # data. We stop short of writing 2048 bytes (2k), the smallest supported
- # page size, as we don't want to corrupt the next page.
- seek($fh, 32, 0);
- syswrite($fh, '\x77\x77\x77\x77', 500);
- close($fh);
+ # Corrupt the first two line pointers. To be stable across platforms,
+ # we use 0x77777777 and 0xAAAAAAAA for the first two, which are bitwise
+ # reverses of each other.
+ seek($fh, 24, 0)
+ or BAIL_OUT("seek failed: $!");;
+ syswrite($fh, pack("L*", 0x77777777, 0xAAAAAAAA))
+ or BAIL_OUT("syswrite failed: $!");
+ close($fh)
+ or BAIL_OUT("close failed: $!");
$node->start;
}
-sub corrupt_first_page
-{
- corrupt_first_page_internal($_[0], undef);
-}
-
-sub corrupt_first_page_and_header
+sub detects_heap_corruption
{
- corrupt_first_page_internal($_[0], 1);
+ my ($function, $testname) = @_;
+ detects_corruption($function, $testname,
+ qr/line pointer redirection to item at offset \d+ exceeds maximum offset \d+/);
}
sub detects_corruption
{
- my ($function, $testname) = @_;
+ my ($function, $testname, $re) = @_;
my $result = $node->safe_psql('postgres',
- qq(SELECT COUNT(*) > 0 FROM $function));
- is($result, 't', $testname);
+ qq(SELECT * FROM $function));
+ like($result, $re, $testname);
}
sub detects_no_corruption
@@ -201,8 +140,8 @@ sub detects_no_corruption
my ($function, $testname) = @_;
my $result = $node->safe_psql('postgres',
- qq(SELECT COUNT(*) = 0 FROM $function));
- is($result, 't', $testname);
+ qq(SELECT * FROM $function));
+ is($result, '', $testname);
}
# Check various options are stable (don't abort) and do not report corruption
On Oct 22, 2020, at 9:21 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 22, 2020, at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
Ahh, crud. It's because
syswrite($fh, '\x77\x77\x77\x77', 500)
is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
Argh. So we really have, using same test except
memcpy(&lp, "\\x77", sizeof(lp));
little endian: off = 785c, flags = 2, len = 1b9b
big endian: off = 2e3c, flags = 0, len = 3737
which explains the apparent LP_DEAD result.
I'm not particularly on board with your suggestion of "well, if it works
sometimes then it's okay". Then we have no idea of what we really tested.
regards, tom lane
Ok, I've pruned it down to something you may like better. Instead of just checking that *some* corruption occurs, it checks the returned corruption against an expected regex, and if it fails to match, you should see in the logs what you got vs. what you expected.
It only corrupts the first two line pointers, the first one with 0x77777777 and the second one with 0xAAAAAAAA, which are consciously chosen to be bitwise reverses of each other and just strings of alternating bits rather than anything that could have a more complicated interpretation.
On my little-endian mac, the 0x77777777 value creates a line pointer which redirects to an invalid offset 0x7777, which gets reported as decimal 30583 in the corruption report, "line pointer redirection to item at offset 30583 exceeds maximum offset 38". The test is indifferent to whether the corruption it is looking for is reported relative to the first line pointer or the second one, so if endian-ness matters, it may be the 0xAAAAAAAA that results in that corruption message. I don't have a machine handy to test that. It would be nice to determine the minimum amount of paranoia necessary to make this portable and not commit the rest.
Obviously, that should have said 0x55555555 and 0xAAAAAAAA. After writing the patch that way, I checked that the old value 0x77777777 also works on my mac, which it does, and checked that writing the line pointers starting at offset 24 rather than 32 works on my mac, which it does, and then went on to write this rather confused email and attached the patch with those changes, which all work (at least on my mac) but are potentially less portable than what I had before testing those changes.
I apologize for any confusion my email from last night may have caused.
The patch I *should* have attached last night this time:
Attachments:
regress.patch.WIP.2application/octet-stream; name=regress.patch.WIP.2; x-unix-mode=0644Download
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index e7526c17b8..19278f4ef2 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -4,7 +4,7 @@ use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 65;
+use Test::More tests => 55;
my ($node, $result);
@@ -28,19 +28,19 @@ check_all_options_uncorrupted('test', 'plain');
#
fresh_test_table('test');
corrupt_first_page('test');
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test')",
"plain corrupted table");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', skip := 'all-visible')",
"plain corrupted table skipping all-visible");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', skip := 'all-frozen')",
"plain corrupted table skipping all-frozen");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', check_toast := false)",
"plain corrupted table skipping toast");
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test', startblock := 0, endblock := 0)",
"plain corrupted table checking only block zero");
@@ -50,71 +50,13 @@ detects_corruption(
fresh_test_table('test');
$node->safe_psql('postgres', q(VACUUM FREEZE test));
corrupt_first_page('test');
-detects_corruption(
+detects_heap_corruption(
"verify_heapam('test')",
"all-frozen corrupted table");
detects_no_corruption(
"verify_heapam('test', skip := 'all-frozen')",
"all-frozen corrupted table skipping all-frozen");
-#
-# Check a corrupt table with corrupt page header
-#
-fresh_test_table('test');
-corrupt_first_page_and_header('test');
-detects_corruption(
- "verify_heapam('test')",
- "corrupted test table with bad page header");
-
-#
-# Check an uncorrupted table with corrupt toast page header
-#
-fresh_test_table('test');
-my $toast = get_toast_for('test');
-corrupt_first_page_and_header($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "table with corrupted toast page header checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "table with corrupted toast page header skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast page header");
-
-#
-# Check an uncorrupted table with corrupt toast
-#
-fresh_test_table('test');
-$toast = get_toast_for('test');
-corrupt_first_page($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "table with corrupted toast checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "table with corrupted toast skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast table");
-
-#
-# Check an uncorrupted all-frozen table with corrupt toast
-#
-fresh_test_table('test');
-$node->safe_psql('postgres', q(VACUUM FREEZE test));
-$toast = get_toast_for('test');
-corrupt_first_page($toast);
-detects_corruption(
- "verify_heapam('test', check_toast := true)",
- "all-frozen table with corrupted toast checking toast");
-detects_no_corruption(
- "verify_heapam('test', check_toast := false)",
- "all-frozen table with corrupted toast skipping toast");
-detects_corruption(
- "verify_heapam('$toast')",
- "corrupted toast table of all-frozen table");
-
# Returns the filesystem path for the named relation.
sub relation_filepath
{
@@ -154,46 +96,43 @@ sub fresh_test_table
# Stops the test node, corrupts the first page of the named relation, and
# restarts the node.
-sub corrupt_first_page_internal
+sub corrupt_first_page
{
- my ($relname, $corrupt_header) = @_;
+ my ($relname) = @_;
my $relpath = relation_filepath($relname);
$node->stop;
my $fh;
- open($fh, '+<', $relpath);
+ open($fh, '+<', $relpath)
+ or BAIL_OUT("open failed: $!");;
binmode $fh;
- # If we corrupt the header, postgres won't allow the page into the buffer.
- syswrite($fh, '\xFF\xFF\xFF\xFF', 8) if ($corrupt_header);
-
- # Corrupt at least the line pointers. Exactly what this corrupts will
- # depend on the page, as it may run past the line pointers into the user
- # data. We stop short of writing 2048 bytes (2k), the smallest supported
- # page size, as we don't want to corrupt the next page.
- seek($fh, 32, 0);
- syswrite($fh, '\x77\x77\x77\x77', 500);
- close($fh);
+ # Corrupt two line pointers. To be stable across platforms, we use
+ # 0x55555555 and 0xAAAAAAAA for the two, which are bitwise reverses of each
+ # other.
+ seek($fh, 32, 0)
+ or BAIL_OUT("seek failed: $!");;
+ syswrite($fh, pack("L*", 0x55555555, 0xAAAAAAAA))
+ or BAIL_OUT("syswrite failed: $!");
+ close($fh)
+ or BAIL_OUT("close failed: $!");
$node->start;
}
-sub corrupt_first_page
-{
- corrupt_first_page_internal($_[0], undef);
-}
-
-sub corrupt_first_page_and_header
+sub detects_heap_corruption
{
- corrupt_first_page_internal($_[0], 1);
+ my ($function, $testname) = @_;
+ detects_corruption($function, $testname,
+ qr/line pointer redirection to item at offset \d+ exceeds maximum offset \d+/);
}
sub detects_corruption
{
- my ($function, $testname) = @_;
+ my ($function, $testname, $re) = @_;
my $result = $node->safe_psql('postgres',
- qq(SELECT COUNT(*) > 0 FROM $function));
- is($result, 't', $testname);
+ qq(SELECT * FROM $function));
+ like($result, $re, $testname);
}
sub detects_no_corruption
@@ -201,8 +140,8 @@ sub detects_no_corruption
my ($function, $testname) = @_;
my $result = $node->safe_psql('postgres',
- qq(SELECT COUNT(*) = 0 FROM $function));
- is($result, 't', $testname);
+ qq(SELECT * FROM $function));
+ is($result, '', $testname);
}
# Check various options are stable (don't abort) and do not report corruption
Mark Dilger <mark.dilger@enterprisedb.com> writes:
The patch I *should* have attached last night this time:
Thanks, I'll do some big-endian testing with this.
regards, tom lane
I wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
The patch I *should* have attached last night this time:
Thanks, I'll do some big-endian testing with this.
Seems to work, so I pushed it (after some compulsive fooling
about with whitespace and perltidy-ing). It appears to me that
the code coverage for verify_heapam.c is not very good though,
only circa 50%. Do we care to expend more effort on that?
regards, tom lane
On Oct 23, 2020, at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
The patch I *should* have attached last night this time:
Thanks, I'll do some big-endian testing with this.
Seems to work, so I pushed it (after some compulsive fooling
about with whitespace and perltidy-ing).
Thanks for all the help!
It appears to me that
the code coverage for verify_heapam.c is not very good though,
only circa 50%. Do we care to expend more effort on that?
Part of the issue here is that I developed the heapcheck code as a sequence of patches, and there is much greater coverage in the tests in the 0002 patch, which hasn't been committed yet. (Nor do I know that it ever will be.) Over time, the patch set became:
0001 -- adds verify_heapam.c to contrib/amcheck, with basic test coverage
0002 -- adds pg_amcheck command line interface to contrib/pg_amcheck, with more extensive test coverage
0003 -- creates a non-throwing interface to clog
0004 -- uses the non-throwing clog interface from within verify_heapam
0005 -- adds some controversial acl checks to verify_heapam
Your question doesn't have much to do with 3,4,5 above, but it definitely matters whether we're going to commit 0002. The test in that patch, in contrib/pg_amcheck/t/004_verify_heapam.pl, does quite a bit of bit twiddling of heap tuples and toast records and checks that the right corruption messages come back. Part of the reason I was trying to keep 0001's t/001_verify_heapam.pl test ignorant of the exact page layout information is that I already had this other test that covers that.
So, should I port that test from (currently non-existent) contrib/pg_amcheck into contrib/amcheck, or should we wait to see if the 0002 patch is going to get committed?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hmm, we're not out of the woods yet: thorntail is even less happy
than before.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11
I do not have 64-bit big-endian hardware to play with unfortunately.
But what I suspect is happening here is less about endianness and
more about alignment pickiness; or maybe we were unlucky enough to
index off the end of the shmem segment. I see that verify_heapam
does this for non-redirect tuples:
/* Set up context information about this next tuple */
ctx.lp_len = ItemIdGetLength(ctx.itemid);
ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
with absolutely no thought for the possibility that lp_off is out of
range or not maxaligned. The checks for a sane lp_len seem to have
gone missing as well.
regards, tom lane
On Fri, Oct 23, 2020 at 11:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
/* Set up context information about this next tuple */
ctx.lp_len = ItemIdGetLength(ctx.itemid);
ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
with absolutely no thought for the possibility that lp_off is out of
range or not maxaligned. The checks for a sane lp_len seem to have
gone missing as well.
That is surprising. verify_nbtree.c has PageGetItemIdCareful() for
this exact reason.
--
Peter Geoghegan
On Oct 23, 2020, at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hmm, we're not out of the woods yet: thorntail is even less happy
than before.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-23%2018%3A08%3A11
I do not have 64-bit big-endian hardware to play with unfortunately.
But what I suspect is happening here is less about endianness and
more about alignment pickiness; or maybe we were unlucky enough to
index off the end of the shmem segment. I see that verify_heapam
does this for non-redirect tuples:/* Set up context information about this next tuple */
ctx.lp_len = ItemIdGetLength(ctx.itemid);
ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
with absolutely no thought for the possibility that lp_off is out of
range or not maxaligned. The checks for a sane lp_len seem to have
gone missing as well.
You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached.
Attachments:
v23-0001-Sanity-checking-line-pointers.patchapplication/octet-stream; name=v23-0001-Sanity-checking-line-pointers.patch; x-unix-mode=0644Download
From da61aac5f94adcc9d1db81f04d36c902328fac74 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Fri, 23 Oct 2020 13:44:49 -0700
Subject: [PATCH v23] Sanity checking line pointers
Sanity checking the offset and length of line pointers before
fetching data based on them, per report from Tom Lane about the
unsafety of what the code was doing.
---
contrib/amcheck/t/001_verify_heapam.pl | 22 +++++++++-----
contrib/amcheck/verify_heapam.c | 42 ++++++++++++++++++++++++--
2 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/contrib/amcheck/t/001_verify_heapam.pl b/contrib/amcheck/t/001_verify_heapam.pl
index f8ee5384f3..7af91068e3 100644
--- a/contrib/amcheck/t/001_verify_heapam.pl
+++ b/contrib/amcheck/t/001_verify_heapam.pl
@@ -4,7 +4,7 @@ use warnings;
use PostgresNode;
use TestLib;
-use Test::More tests => 55;
+use Test::More tests => 79;
my ($node, $result);
@@ -109,12 +109,14 @@ sub corrupt_first_page
or BAIL_OUT("open failed: $!");
binmode $fh;
- # Corrupt two line pointers. To be stable across platforms, we use
- # 0x55555555 and 0xAAAAAAAA for the two, which are bitwise reverses of each
- # other.
+ # Corrupt pairs of line pointers. To be stable across platforms, we
+ # use pairs of values that are bitwise reverses of each other.
seek($fh, 32, 0)
or BAIL_OUT("seek failed: $!");
- syswrite($fh, pack("L*", 0x55555555, 0xAAAAAAAA))
+ syswrite($fh, pack("L*",
+ 0x55555555, 0xAAAAAAAA,
+ 0x00055000, 0x000AA000,
+ 0x001F0000, 0x0000F800))
or BAIL_OUT("syswrite failed: $!");
close($fh)
or BAIL_OUT("close failed: $!");
@@ -127,16 +129,20 @@ sub detects_heap_corruption
my ($function, $testname) = @_;
detects_corruption($function, $testname,
- qr/line pointer redirection to item at offset \d+ exceeds maximum offset \d+/
+ qr/line pointer redirection to item at offset \d+ exceeds maximum offset \d+/,
+ qr/line pointer to page offset \d+ exceeds maximum page offset \d+/,
+ qr/line pointer to page offset \d+ with length \d+ ends beyond maximum page offset \d+/,
+ qr/line pointer redirection to item at offset \d+ precedes minimum offset \d+/,
+ qr/line pointer to page offset \d+ is not maximally aligned/,
);
}
sub detects_corruption
{
- my ($function, $testname, $re) = @_;
+ my ($function, $testname, @re) = @_;
my $result = $node->safe_psql('postgres', qq(SELECT * FROM $function));
- like($result, $re, $testname);
+ like($result, $_, $testname) for (@re);
}
sub detects_no_corruption
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 37b40a0404..5c01854f50 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -105,6 +105,7 @@ typedef struct HeapCheckContext
OffsetNumber offnum;
ItemId itemid;
uint16 lp_len;
+ uint16 lp_off;
HeapTupleHeader tuphdr;
int natts;
@@ -378,14 +379,22 @@ verify_heapam(PG_FUNCTION_ARGS)
/*
* If this line pointer has been redirected, check that it
- * redirects to a valid offset within the line pointer array.
+ * redirects to a valid offset within the line pointer array
*/
if (ItemIdIsRedirected(ctx.itemid))
{
OffsetNumber rdoffnum = ItemIdGetRedirect(ctx.itemid);
ItemId rditem;
- if (rdoffnum < FirstOffsetNumber || rdoffnum > maxoff)
+ if (rdoffnum < FirstOffsetNumber)
+ {
+ report_corruption(&ctx,
+ psprintf("line pointer redirection to item at offset %u precedes minimum offset %u",
+ (unsigned) rdoffnum,
+ (unsigned) FirstOffsetNumber));
+ continue;
+ }
+ if (rdoffnum > maxoff)
{
report_corruption(&ctx,
psprintf("line pointer redirection to item at offset %u exceeds maximum offset %u",
@@ -401,8 +410,35 @@ verify_heapam(PG_FUNCTION_ARGS)
continue;
}
- /* Set up context information about this next tuple */
+ /* Check the line pointer points within page bounds */
ctx.lp_len = ItemIdGetLength(ctx.itemid);
+ ctx.lp_off = ItemIdGetOffset(ctx.itemid);
+ if (ctx.lp_off != MAXALIGN(ctx.lp_off))
+ {
+ report_corruption(&ctx,
+ psprintf("line pointer to page offset %u is not maximally aligned",
+ (unsigned) ctx.lp_off));
+ continue;
+ }
+ if (ctx.lp_off > BLCKSZ)
+ {
+ report_corruption(&ctx,
+ psprintf("line pointer to page offset %u exceeds maximum page offset %u",
+ (unsigned) ctx.lp_off,
+ (unsigned) BLCKSZ));
+ continue;
+ }
+ if (ctx.lp_off + ctx.lp_len > BLCKSZ)
+ {
+ report_corruption(&ctx,
+ psprintf("line pointer to page offset %u with length %u ends beyond maximum page offset %u",
+ (unsigned) ctx.lp_off,
+ (unsigned) ctx.lp_len,
+ (unsigned) BLCKSZ));
+ continue;
+ }
+
+ /* It should be safe to fetch the page item */
ctx.tuphdr = (HeapTupleHeader) PageGetItem(ctx.page, ctx.itemid);
ctx.natts = HeapTupleHeaderGetNatts(ctx.tuphdr);
--
2.21.1 (Apple Git-122.3)
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Oct 23, 2020, at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I do not have 64-bit big-endian hardware to play with unfortunately.
But what I suspect is happening here is less about endianness and
more about alignment pickiness; or maybe we were unlucky enough to
index off the end of the shmem segment.
You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached.
Meanwhile, I've replicated the SIGBUS problem on gaur's host, so
that's definitely what's happening.
(Although PPC is also alignment-picky on the hardware level, I believe
that both macOS and Linux try to mask that by having kernel trap handlers
execute unaligned accesses, leaving only a nasty performance loss behind.
That's why I failed to see this effect when checking your previous patch
on an old Apple box. We likely won't see it in the buildfarm either,
unless maybe on Noah's AIX menagerie.)
I'll check this patch on gaur and push it if it's clean.
regards, tom lane
Mark Dilger <mark.dilger@enterprisedb.com> writes:
You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached.
Pushed with some more fooling about. The "bit reversal" idea is not
a sufficient guide to picking values that will hit all the code checks.
For instance, I was seeing out-of-range warnings on one endianness and
not the other because on the other one the maxalign check rejected the
value first. I ended up manually tweaking the corruption patterns
until they hit all the tests on both endiannesses. Pretty much the
opposite of black-box testing, but it's not like our notions of line
pointer layout are going to change anytime soon.
I made some logic rearrangements in the C code, too.
regards, tom lane
On Oct 23, 2020, at 4:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Mark Dilger <mark.dilger@enterprisedb.com> writes:
You certainly appear to be right about that. I've added the extra checks, and extended the regression test to include them. Patch attached.
Pushed with some more fooling about. The "bit reversal" idea is not
a sufficient guide to picking values that will hit all the code checks.
For instance, I was seeing out-of-range warnings on one endianness and
not the other because on the other one the maxalign check rejected the
value first. I ended up manually tweaking the corruption patterns
until they hit all the tests on both endiannesses. Pretty much the
opposite of black-box testing, but it's not like our notions of line
pointer layout are going to change anytime soon.
I made some logic rearrangements in the C code, too.
Thanks Tom! And Peter, your comment earlier saved me some time. Thanks to you, also!
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 23, 2020 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seems to work, so I pushed it (after some compulsive fooling
about with whitespace and perltidy-ing). It appears to me that
the code coverage for verify_heapam.c is not very good though,
only circa 50%. Do we care to expend more effort on that?
There are two competing goods here. On the one hand, more test
coverage is better than less. On the other hand, finicky tests that
have platform-dependent results or fail for strange reasons not
indicative of actual problems with the code are often judged not to be
worth the trouble. An early version of this patch set had a very
extensive chunk of Perl code in it that actually understood the page
layout and, if we adopt something like that, it would probably be
easier to test a whole bunch of scenarios. The downside is that it was
a lot of code that basically duplicated a lot of backend logic in
Perl, and I was (and am) afraid that people will complain about the
amount of code and/or the difficulty of maintaining it. On the other
hand, having all that code might allow better testing not only of this
particular patch but also other scenarios involving corrupted pages,
so maybe it's wrong to view all that code as a burden that we have to
carry specifically to test this; or, alternatively, maybe it's worth
carrying even if we only use it for this. On the third hand, as Mark
points out, if we get 0002 committed, that will help somewhat with
test coverage even if we do nothing else.
Thanks for committing (and adjusting) the patches for the existing
buildfarm failures. If I understand the buildfarm results correctly,
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 26, 2020, at 6:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 23, 2020 at 2:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seems to work, so I pushed it (after some compulsive fooling
about with whitespace and perltidy-ing). It appears to me that
the code coverage for verify_heapam.c is not very good though,
only circa 50%. Do we care to expend more effort on that?
There are two competing goods here. On the one hand, more test
coverage is better than less. On the other hand, finicky tests that
have platform-dependent results or fail for strange reasons not
indicative of actual problems with the code are often judged not to be
worth the trouble. An early version of this patch set had a very
extensive chunk of Perl code in it that actually understood the page
layout and, if we adopt something like that, it would probably be
easier to test a whole bunch of scenarios. The downside is that it was
a lot of code that basically duplicated a lot of backend logic in
Perl, and I was (and am) afraid that people will complain about the
amount of code and/or the difficulty of maintaining it. On the other
hand, having all that code might allow better testing not only of this
particular patch but also other scenarios involving corrupted pages,
so maybe it's wrong to view all that code as a burden that we have to
carry specifically to test this; or, alternatively, maybe it's worth
carrying even if we only use it for this. On the third hand, as Mark
points out, if we get 0002 committed, that will help somewhat with
test coverage even if we do nothing else.
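For context, the page-layout-aware harness discussed above works by planting corruption directly in relation files at known byte offsets. A rough sketch of the core idea (in Python rather than the patch's Perl, assuming the default 8 kB block size, little-endian layout, and offsets taken from PostgreSQL's PageHeaderData, ItemIdData, and HeapTupleHeaderData definitions):

```python
import struct

BLCKSZ = 8192            # default PostgreSQL block size
PAGE_HEADER_SIZE = 24    # sizeof(PageHeaderData)
ITEMID_SIZE = 4          # sizeof(ItemIdData)

def corrupt_xmin(page: bytes, lineno: int, new_xmin: int) -> bytes:
    """Return a copy of the page with tuple `lineno`'s xmin clobbered.

    A line pointer is a packed 32-bit word: lp_off in the low 15 bits,
    lp_flags in the next 2, lp_len in the high 15.  t_xmin is the first
    field of HeapTupleHeaderData, so it sits right at the tuple's
    start offset within the page.
    """
    lp_pos = PAGE_HEADER_SIZE + lineno * ITEMID_SIZE
    (lp_word,) = struct.unpack_from("<I", page, lp_pos)
    lp_off = lp_word & 0x7FFF          # tuple start within the page
    buf = bytearray(page)
    struct.pack_into("<I", buf, lp_off, new_xmin)
    return bytes(buf)
```

A harness built on this can stop the server, rewrite one block of the relation file in place, restart, and assert that verify_heapam reports the planted corruption, which is what makes it possible to exercise many corruption scenarios deterministically.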
Much of the test in 0002 could be ported to work without committing the rest of 0002, if the pg_amcheck command line utility is not wanted.
Thanks for committing (and adjusting) the patches for the existing
buildfarm failures. If I understand the buildfarm results correctly,
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?
That appears to be a failed test for pg_surgery rather than for amcheck. Or am I reading the log wrong?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Much of the test in 0002 could be ported to work without committing the rest of 0002, if the pg_amcheck command line utility is not wanted.
How much consensus do we think we have around 0002 at this point? I
think I remember a vote in favor and no votes against, but I haven't
been paying a whole lot of attention.
Thanks for committing (and adjusting) the patches for the existing
buildfarm failures. If I understand the buildfarm results correctly,
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?
That appears to be a failed test for pg_surgery rather than for amcheck. Or am I reading the log wrong?
Oh, yeah, you're right. I don't know why it just failed now, though:
there are a bunch of successful runs preceding it. But I guess it's
unrelated to this thread.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Oct 26, 2020, at 7:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Much of the test in 0002 could be ported to work without committing the rest of 0002, if the pg_amcheck command line utility is not wanted.
How much consensus do we think we have around 0002 at this point? I
think I remember a vote in favor and no votes against, but I haven't
been paying a whole lot of attention.
My sense over the course of the thread is that people were very much in favor of having heap checking functionality, but quite vague on whether they wanted the command line interface. I think the interface is useful, but I'd rather hear from others on this list whether it is useful enough to justify maintaining it. As the author of it, I'm biased. Hopefully others with a more objective view of the matter will read this and vote?
I don't recall patches 0003 through 0005 getting any votes. 0003 and 0004, which create and use a non-throwing interface to clog, were written in response to Andrey's request, so I'm guessing that's kind of a vote in favor. 0005 was factored out of 0001 in response to a lack of agreement about whether verify_heapam should have ACL checks. You seemed in favor, and Peter against, but I don't think we heard other opinions.
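For readers following along: the "non-throwing interface to clog" in 0003/0004 follows a common pattern, replacing a lookup that errors out on bad input with one that hands back a status code, so a corruption checker can report a bad xid as one more finding and keep scanning. A purely hypothetical, self-contained sketch of that shape (none of these names come from the actual patches):

```python
from enum import Enum

class XidLookupStatus(Enum):
    OK = 0
    OUT_OF_RANGE = 1   # xid precedes the oldest clog data we retain

# Stand-in for the clog: commit bits for xids >= OLDEST_XID.
OLDEST_XID = 100
COMMIT_BITS = [True, False, True, True]

def xid_did_commit(xid):
    """Non-throwing lookup: returns (status, committed-or-None).

    A throwing variant would raise for an out-of-range xid; returning
    a status instead lets the caller log the problem and continue.
    """
    if not (OLDEST_XID <= xid < OLDEST_XID + len(COMMIT_BITS)):
        return XidLookupStatus.OUT_OF_RANGE, None
    return XidLookupStatus.OK, COMMIT_BITS[xid - OLDEST_XID]
```

The design point is only that the error channel moves from an exception (ereport, in the backend's case) to the return value.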
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?
That appears to be a failed test for pg_surgery rather than for amcheck. Or am I reading the log wrong?
Oh, yeah, you're right. I don't know why it just failed now, though:
there are a bunch of successful runs preceding it. But I guess it's
unrelated to this thread.
pg_surgery's been unstable since it went in. I believe Andres is
working on a fix.
regards, tom lane
Hi,
On October 26, 2020 7:13:15 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
hornet is still unhappy even after
321633e17b07968e68ca5341429e2c8bbf15c331?
That appears to be a failed test for pg_surgery rather than for
amcheck. Or am I reading the log wrong?
Oh, yeah, you're right. I don't know why it just failed now, though:
there are a bunch of successful runs preceding it. But I guess it's
unrelated to this thread.
pg_surgery's been unstable since it went in. I believe Andres is
working on a fix.
I posted one a while ago - was planning to push a cleaned up version soon if nobody comments in the near future.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Oct 26, 2020, at 7:08 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 26, 2020, at 7:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 26, 2020 at 9:56 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Much of the test in 0002 could be ported to work without committing the rest of 0002, if the pg_amcheck command line utility is not wanted.
How much consensus do we think we have around 0002 at this point? I
think I remember a vote in favor and no votes against, but I haven't
been paying a whole lot of attention.
My sense over the course of the thread is that people were very much in favor of having heap checking functionality, but quite vague on whether they wanted the command line interface. I think the interface is useful, but I'd rather hear from others on this list whether it is useful enough to justify maintaining it. As the author of it, I'm biased. Hopefully others with a more objective view of the matter will read this and vote?
I don't recall patches 0003 through 0005 getting any votes. 0003 and 0004, which create and use a non-throwing interface to clog, were written in response to Andrey's request, so I'm guessing that's kind of a vote in favor. 0005 was factored out of 0001 in response to a lack of agreement about whether verify_heapam should have ACL checks. You seemed in favor, and Peter against, but I don't think we heard other opinions.
The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a rebase. (0001 was already committed last week.)
Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with the previous naming. There are no substantial changes.
Attachments:
v24-0002-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v24-0002-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From a4a0f6c30f083859f25d28b1df5111cbc7ecd823 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:25:21 -0700
Subject: [PATCH v24 1/4] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 2 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 1281 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 +
contrib/pg_amcheck/t/003_check.pl | 231 ++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 489 ++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 228 ++++
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 2 +
15 files changed, 2393 insertions(+), 3 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..f8eecf70bf
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,2 @@
+/pg_amcheck
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..6d20ff3d78
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1281 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/print.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",
+ "Usage:",
+ " pg_amcheck [OPTION]... [DBNAME [USERNAME]]",
+ "",
+ "General options:",
+ " -V, --version output version information, then exit",
+ " -?, --help show this help, then exit",
+ " -s, --strict-names require include patterns to match at least one entity each",
+ " -o, --on-error-stop stop checking at end of first corrupt page",
+ "",
+ "Schema checking options:",
+ " -n, --schema=PATTERN check relations in the specified schema(s) only",
+ " -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)",
+ "",
+ "Table checking options:",
+ " -t, --table=PATTERN check the specified table(s) only",
+ " -T, --exclude-table=PATTERN do NOT check the specified table(s)",
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
+ "",
+ "TOAST table checking options:",
+ " -z, --check-toast check associated toast tables and toast indexes",
+ " -Z, --skip-toast do NOT check associated toast tables and toast indexes",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
+ "",
+ "Index checking options:",
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
+ "",
+ "Connection options:",
+ " -d, --dbname=DBNAME database name to connect to",
+ " -h, --host=HOSTNAME database server host or socket directory",
+ " -p, --port=PORT database server port",
+ " -U, --username=USERNAME database user name",
+ " -w, --no-password never prompt for password",
+ " -W, --password force password prompt (should happen automatically)",
+ "",
+ NULL /* sentinel */
+};
+
+typedef struct
+ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ PGconn *db; /* connection to backend */
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ char *startblock; /* Block number where checking begins */
+ char *endblock; /* Block number where checking ends, inclusive */
+ char *toaststart; /* Block number where toast checking begins */
+ char *toastend; /* Block number where toast checking ends,
+ * inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * List of tables to be checked, compiled from above lists.
+ */
+static SimpleOidList checklist = {NULL, NULL};
+
+/*
+ * Strings to be constructed once upon first use. These could be made
+ * string constants instead, but that would require embedding knowledge
+ * of the single character values for each relkind, such as 'm' for
+ * materialized views, which we'd rather not embed here.
+ */
+static char *table_relkind_quals = NULL;
+static char *index_relkind_quals = NULL;
+
+/*
+ * Functions to get pointers to the two strings, above, after initializing
+ * them upon the first call to the function.
+ */
+static const char *get_table_relkind_quals(void);
+static const char *get_index_relkind_quals(void);
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_toast(Oid tbloid);
+static uint64 check_table(Oid tbloid, const char *startblock,
+ const char *endblock, bool on_error_stop,
+ bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+/*
+ * Functions for converting command line options that include or exclude
+ * schemas, tables, and indexes by pattern into internally useful lists of
+ * Oids for objects that match those patterns.
+ */
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals);
+static void expand_table_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void expand_index_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist);
+static PGresult *ExecuteSqlQuery(const char *query, char **error);
+static PGresult *ExecuteSqlQueryOrDie(const char *query);
+
+static void append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids);
+static void apply_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids, bool include);
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit(1); } while(0)
+
+/* Like fatal(), but with a complaint about a particular query. */
+static void
+die_on_query_failure(const char *query)
+{
+ pg_log_error("query failed: %s",
+ PQerrorMessage(settings.db));
+ fatal("query was: %s", query);
+}
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ settings.db = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ /*
+ * Default behaviors for user settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = false;
+ settings.check_corrupt = false;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast table is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ settings.db = PQconnectdbParams(keywords, values, true);
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(settings.db) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(settings.db) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(settings.db);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf("Password for user %s: ",
+ realusername);
+ else
+ password_prompt = pg_strdup("Password: ");
+ PQfinish(settings.db);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
+
+ if (PQstatus(settings.db) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to server: %s",
+ PQerrorMessage(settings.db));
+ PQfinish(settings.db);
+ exit(EXIT_BADCONN);
+ }
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(&schema_exclude_patterns, NULL,
+ &schema_exclude_oids, false);
+ expand_table_name_patterns(&table_exclude_patterns, NULL, NULL,
+ &table_exclude_oids, false);
+ expand_index_name_patterns(&index_exclude_patterns, NULL, NULL,
+ &index_exclude_oids, false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(&schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_table_name_patterns(&table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ &table_include_oids,
+ settings.strict_names);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_index_name_patterns(&index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ &index_include_oids,
+ settings.strict_names);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids, &checklist);
+
+ PQsetNoticeProcessor(settings.db, NoticeProcessor, NULL);
+
+ /*
+ * All information about corrupt indexes is returned via ereport, not as
+ * tuples. We want all the details to report if corruption exists.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
+
+ check_tables(&checklist);
+
+ return 0;
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+static inline void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+static inline void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+ bool reconcile_toast;
+
+ /*
+ * If we skip checking the toast table, or if during the check we
+ * detect any toast table corruption, the main table checks below must
+ * not reconcile toasted attributes against the toast table, as such
+ * accesses to the toast table might crash the backend. Instead, skip
+ * such reconciliations for this table.
+ *
+ * This protection contains a race condition; the toast table or index
+ * could become corrupted concurrently with our checks, but prevention
+ * of such concurrent corruption is documented as the caller's
+ * responsibility, so we don't worry about it here.
+ */
+ reconcile_toast = false;
+ if (settings.check_toast)
+ {
+ if (check_toast(cell->val) == 0)
+ reconcile_toast = true;
+ }
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ reconcile_toast);
+
+ if (settings.check_indexes)
+ {
+ bool old_heapallindexed;
+
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+
+ settings.heapallindexed = old_heapallindexed;
+ }
+ }
+}
+
+/*
+ * For a given main table relation, returns the associated toast table,
+ * or InvalidOid if none exists.
+ */
+static Oid
+get_toast_oid(Oid tbloid)
+{
+ PQExpBuffer querybuf = createPQExpBuffer();
+ PGresult *res;
+ char *error = NULL;
+ Oid result = InvalidOid;
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nWHERE c.oid = %u",
+ tbloid);
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ result = atooid(PQgetvalue(res, 0, 0));
+ else if (error)
+ die_on_query_failure(querybuf->data);
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return result;
+}
+
+/*
+ * For the given main table relation, checks the associated toast table and
+ * index, if any. This should be performed *before* checking the main table
+ * relation, as the checks inside verify_heapam assume both the toast table and
+ * toast index are usable.
+ *
+ * Returns the number of corruptions detected.
+ */
+static uint64
+check_toast(Oid tbloid)
+{
+ Oid toastoid;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_toast");
+
+ toastoid = get_toast_oid(tbloid);
+ if (OidIsValid(toastoid))
+ {
+ corruption_cnt = check_table(toastoid, settings.toaststart,
+ settings.toastend, settings.on_error_stop,
+ false);
+
+ /*
+ * If the toast table is corrupt, checking the index is not safe.
+ * There is a race condition here, as the toast table could be
+ * concurrently corrupted, but preventing concurrent corruption is the
+ * caller's responsibility, not ours.
+ */
+ if (corruption_cnt == 0)
+ corruption_cnt += check_indexes(toastoid, NULL, NULL);
+ }
+
+ return corruption_cnt;
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, const char *startblock, const char *endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
+
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, "
+ "startblock := %s, "
+ "endblock := %s) v, "
+ "pg_catalog.pg_class c "
+ "WHERE c.oid = %u",
+ tbloid, stop, skip, toast, startblock, endblock, tbloid);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
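(Aside, not part of the patch: for a concrete sense of what check_table() sends to the server, here is a rough Python model of the query assembly; the function name and argument values are illustrative only.)

```python
def build_check_query(tbloid, stop, skip, toast, startblock, endblock):
    # Mirrors the appendPQExpBuffer() call in check_table(): named-argument
    # call to amcheck's verify_heapam(), joined with pg_class for relname.
    return ("SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
            "FROM verify_heapam("
            f"relation := {tbloid}, "
            f"on_error_stop := {stop}, "
            f"skip := {skip}, "
            f"check_toast := {toast}, "
            f"startblock := {startblock}, "
            f"endblock := {endblock}) v, "
            "pg_catalog.pg_class c "
            f"WHERE c.oid = {tbloid}")

# With skip = 'none' and NULL block bounds, the resulting query scans
# the entire relation, which matches the tool's defaults.
query = build_check_query(16384, 'false', "'none'", 'true', 'NULL', 'NULL')
```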
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *error = NULL;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_indexes");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = ExecuteSqlQuery(querybuf->data, &error);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else if (error)
+ {
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(settings.db, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(settings.db));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-corrupt", no_argument, NULL, 'c'},
+ {"check-indexes", no_argument, NULL, 'x'},
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-heapallindexed", no_argument, NULL, 'A'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"no-rootdescend", no_argument, NULL, 'R'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip-all-frozen", no_argument, NULL, 'f'},
+ {"skip-all-visible", no_argument, NULL, 'v'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "aAb:B:cCd:e:E:fh:i:I:n:N:op:rRst:T:U:vVwWxXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'A':
+ settings.heapallindexed = false;
+ break;
+ case 'b':
+ settings.startblock = pg_strdup(optarg);
+ break;
+ case 'B':
+ settings.toaststart = pg_strdup(optarg);
+ break;
+ case 'c':
+ settings.check_corrupt = true;
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = pg_strdup(optarg);
+ break;
+ case 'E':
+ settings.toastend = pg_strdup(optarg);
+ break;
+ case 'f':
+ settings.skip_frozen = true;
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 'R':
+ settings.rootdescend = false;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'v':
+ settings.skip_visible = true;
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'x':
+ settings.check_indexes = true;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, "Try \"%s --help\" for more information.\n",
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+	/*
+	 * If we still have command-line arguments, use them as the database name
+	 * and username, in that order.
+	 */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+}
+
+/*
+ * usage
+ *
+ * Print the command-line usage help text.
+ */
+static void
+usage(void)
+{
+ int lineno;
+
+ for (lineno = 0; usage_text[lineno]; lineno++)
+ printf("%s\n", usage_text[lineno]);
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+/*
+ * Helper function for apply_filter, below.
+ */
+static void
+append_csv_oids(PQExpBuffer querybuf, const SimpleOidList *oids)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+}
+
+/*
+ * Internal implementation of include_filter and exclude_filter
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ if (!oids || !oids->head)
+ return;
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+ append_csv_oids(querybuf, oids);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
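(Aside, not part of the patch: the qual text that apply_filter() and append_csv_oids() together emit can be modeled in a few lines of Python; this is a sketch for illustration, not the implementation.)

```python
def apply_filter(lval, oids, include):
    """Model of the C helper: append an OID-array membership qual."""
    if not oids:
        return ''
    # include -> equality against ANY; exclude -> inequality against ALL
    op = '=) ANY' if include else '!=) ALL'
    csv = ', '.join(str(oid) for oid in oids)
    return f"\nAND {lval} OPERATOR(pg_catalog.{op}(array[{csv}]::OID[])"

# Example: an include filter over two (hypothetical) table OIDs.
qual = apply_filter('c.oid', [16384, 16390], True)
```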
+
+/*
+ * Find and append to the given Oid list the Oids of all schemas matching the
+ * given list of patterns but not included in the given list of excluded Oids.
+ */
+static void
+expand_schema_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_schema_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+	/*
+	 * The loop below runs multiple SELECTs, which might sometimes result in
+	 * duplicate entries in the Oid list, but we don't care.
+	 */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(querybuf,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(settings.db, querybuf, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"",
+ cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find and append to the given Oid list the Oids of all relations matching the
+ * given list of patterns but not included in the given list of excluded Oids
+ * nor in one of the given excluded namespaces.  The relations are further
+ * filtered by the given relkind_quals, allowing the caller to restrict the
+ * results to just indexes or just tables.  The missing_errtext should be a
+ * message fragment for use in the error raised when no matching relations are
+ * found and strict_names was specified.
+ */
+static void
+expand_relkind_name_patterns(const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names,
+ const char *missing_errtext,
+ const char *relkind_quals)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to expand_relkind_name_patterns");
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ querybuf = createPQExpBuffer();
+
+	/*
+	 * This might sometimes result in duplicate entries in the Oid list, but
+	 * we don't care.
+	 */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) %s\n",
+ relkind_quals);
+ exclude_filter(querybuf, "c.oid", exclude_oids);
+ exclude_filter(querybuf, "n.oid", exclude_nsp_oids);
+ processSQLNamePattern(settings.db, querybuf, cell->val, true,
+ false, "n.nspname", "c.relname", NULL, NULL);
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("%s \"%s\"", missing_errtext, cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(querybuf);
+ }
+
+ destroyPQExpBuffer(querybuf);
+}
+
+/*
+ * Find the Oids of all tables matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_table_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching tables were found for pattern",
+ get_table_relkind_quals());
+}
+
+/*
+ * Find the Oids of all indexes matching the given list of patterns,
+ * and append them to the given Oid list.
+ */
+static void
+expand_index_name_patterns(const SimpleStringList *patterns, const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids, SimpleOidList *oids, bool strict_names)
+{
+ expand_relkind_name_patterns(patterns, exclude_nsp_oids, exclude_oids, oids, strict_names,
+ "no matching indexes were found for pattern",
+ get_index_relkind_quals());
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl,
+ SimpleOidList *checklist)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ if (settings.db == NULL)
+ fatal("no connection on entry to get_table_check_list");
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) %s\n",
+ get_table_relkind_quals());
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQueryOrDie(querybuf->data);
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(checklist, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
+
+static PGresult *
+ExecuteSqlQueryOrDie(const char *query)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ die_on_query_failure(query);
+ return res;
+}
+
+/*
+ * Execute the given SQL query, returning the result.
+ *
+ * On failure, *error is set to a copy of the error message reported by the
+ * database connection; the caller is responsible for printing and pfree'ing
+ * it.  On success, *error is left unchanged.
+ */
+static PGresult *
+ExecuteSqlQuery(const char *query, char **error)
+{
+ PGresult *res;
+
+ res = PQexec(settings.db, query);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ *error = pstrdup(PQerrorMessage(settings.db));
+ return res;
+}
+
+/*
+ * Return the cached relkind quals string for tables, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_table_relkind_quals(void)
+{
+ if (!table_relkind_quals)
+ table_relkind_quals = psprintf("ANY(array['%c', '%c', '%c'])",
+ RELKIND_RELATION, RELKIND_MATVIEW,
+ RELKIND_PARTITIONED_TABLE);
+ return table_relkind_quals;
+}
+
+/*
+ * Return the cached relkind quals string for indexes, computing it first if we
+ * don't have one cached.
+ */
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..68be9c6585
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user' ],
+ qr/\Qpg_amcheck: error: could not connect to server: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..4d8e61d871
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,231 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 39;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+	my $fh;
+	open($fh, '+<', $relpath)
+	  or die "could not open relation file \"$relpath\": $!";
+	binmode $fh;
+	sysseek($fh, 32, 0);
+	syswrite($fh, "\x77" x 500)
+	  or die "could not write to relation file \"$relpath\": $!";
+	close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+	'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-c', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+	'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-x' and '-X' work as expected. Since only index corruption
+# (and not table corruption) exists in s1, '-X' should give no errors, and
+# '-x' should give errors about index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..1cc36b25b7
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,489 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgres lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the table's columns, and the
+# rows we insert, so that the datums have predictable sizes and locations
+# within the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
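(Aside, not part of the patch: the 58-byte total and the pack template can be cross-checked outside Perl. The Python sketch below mirrors the Perl pack codes with struct format characters, H/B/b/q standing in for S/C/c/q and '<' disabling alignment padding; the field values are arbitrary placeholders, not real tuple contents.)

```python
import struct

# Little-endian, no padding: mirrors Perl's
# 'LLLSSSSSCCqCcccccccSSSSSSSSS' pack template.
HEAPTUPLE_FMT = '<LLLHHHHHBBqBbbbbbbbHHHHHHHHH'

# HEAPTUPLE_PACK_LENGTH: 3*4 + 5*2 + 2*1 + 8 + 1 + 7*1 + 9*2 = 58 bytes
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Round-trip a dummy tuple: ten header fields, column 'a' as a quadword,
# a 1-byte varlena header plus seven characters for 'b', and nine shorts
# covering the toast pointer stored in 'c'.
fields = (1, 0, 0, 0, 0, 1, 3, 0x0802, 24, 0,  # heap tuple header
          42,                                   # a (BIGINT)
          17, *(ord(ch) for ch in 'abcdefg'),   # b (inline varlena)
          *([0] * 9))                           # c (toast pointer bytes)
raw = struct.pack(HEAPTUPLE_FMT, *fields)
assert struct.unpack(HEAPTUPLE_FMT, raw) == fields
```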
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
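+# The read-modify-write cycle these two helpers implement can be sketched in
+# Python, whose struct module plays the same role as Perl's pack/unpack. This
+# is a toy field layout for illustration only, not the 58-byte tuple format
+# defined above:

```python
import struct

# Simplified sketch of a read-modify-write cycle over a fixed-width binary
# record, mirroring what read_tuple()/write_tuple() do above. The format
# string is illustrative, not the real heap tuple layout.
FMT = '<IIHH'          # t_xmin, t_xmax, t_infomask, t_infomask2 (toy layout)
SIZE = struct.calcsize(FMT)

def read_rec(buf, offset):
    xmin, xmax, infomask, infomask2 = struct.unpack_from(FMT, buf, offset)
    return {'t_xmin': xmin, 't_xmax': xmax,
            't_infomask': infomask, 't_infomask2': infomask2}

def write_rec(buf, offset, rec):
    struct.pack_into(FMT, buf, offset,
                     rec['t_xmin'], rec['t_xmax'],
                     rec['t_infomask'], rec['t_infomask2'])

page = bytearray(SIZE * 2)
write_rec(page, 0, {'t_xmin': 100, 't_xmax': 0,
                    't_infomask': 0x0B00, 't_infomask2': 3})
rec = read_rec(page, 0)
rec['t_xmin'] = 3            # "corrupt" the stored xmin
write_rec(page, 0, rec)
assert read_rec(page, 0)['t_xmin'] == 3
```

+# As in the Perl helpers, only the fields being corrupted change; everything
+# else round-trips through pack/unpack untouched.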
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-x', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
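+# The corruption loop below flips these flags with plain bitwise operators.
+# A quick Python sanity check of the `&= ~MASK` / `|= MASK` idiom, using a
+# couple of the constants above (starting values chosen arbitrarily):

```python
# Constants as above (from access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
HEAP_NATTS_MASK     = 0x07FF

t_infomask  = 0x0B06  # arbitrary starting flag bits
t_infomask2 = 3       # 3 attributes, no flag bits set

# Clear both xmin hint bits, as the offnum 1 corruption does:
t_infomask &= ~HEAP_XMIN_COMMITTED
t_infomask &= ~HEAP_XMIN_INVALID
assert t_infomask & (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID) == 0

# Force the attribute count to its maximum, as the offnum 9 corruption does:
t_infomask2 |= HEAP_NATTS_MASK
assert t_infomask2 & HEAP_NATTS_MASK == 2047
```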
+# Helper functions
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ qr/\(relname=test,blkno=$blkno,offnum=$offnum,attnum=$attnum\)\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, '');
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
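+# The offnum 3 case above leans on the circularity of 32-bit xid comparison:
+# under the usual signed-difference rule the corrupt xmin 4026531839 would
+# look ancient, but interpreted against epoch 0 it lies in the future, which
+# is why the expected message says it equals or exceeds the next valid
+# transaction ID. A simplified Python sketch (it ignores the special xids
+# below FirstNormalTransactionId; next_xid is a hypothetical value for a
+# young cluster):

```python
def xid32_precedes(a, b):
    # Circular 32-bit comparison: a precedes b when the signed
    # 32-bit difference (a - b) is negative.
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

def full_xid(epoch, xid):
    # 64-bit "full" transaction ID: epoch in the high 32 bits.
    return (epoch << 32) | xid

corrupt_xmin = 4026531839  # the value written into the offnum 3 tuple
next_xid = 1000            # hypothetical next xid in a young cluster

# Under circular 32-bit comparison the corrupt xmin looks far in the past...
assert xid32_precedes(corrupt_xmin, next_xid)

# ...but against epoch 0 it lies beyond the next valid transaction ID,
# matching the expected corruption message.
assert full_xid(0, corrupt_xmin) > full_xid(0, next_xid)
```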
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..fdbb1ea402
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-x', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
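+# The failure mode this test induces (an index built under one comparator and
+# then validated under another) can be modeled in a few lines of Python; this
+# is a toy model of the ordering invariant, not amcheck's actual page-level
+# check:

```python
from functools import cmp_to_key

def asc_cmp(a, b):
    # The well-behaved comparator the index was "built" with.
    return (a > b) - (a < b)

def desc_cmp(a, b):
    # The swapped-in comparator that disagrees with the stored order.
    return (b > a) - (b < a)

def order_violations(items, cmp):
    # Count adjacent pairs that violate the claimed sort order, analogous
    # to amcheck's item-order invariant over btree index pages.
    return sum(1 for x, y in zip(items, items[1:]) if cmp(x, y) > 0)

data = sorted(range(1, 11), key=cmp_to_key(asc_cmp))  # the "index build"

assert order_violations(data, asc_cmp) == 0   # consistent opclass: no corruption
assert order_violations(data, desc_cmp) == 9  # changed opclass: invariant violated
```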
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 4e833d79ef..1efca8adc4 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..3e059e7753
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,228 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck [OPTION]... [DBNAME [USERNAME]]
+  General options:
+    -V, --version                output version information, then exit
+    -?, --help                   show this help, then exit
+    -s, --strict-names           require include patterns to match at least one entity each
+    -o, --on-error-stop          stop checking at end of first corrupt page
+
+  Schema checking options:
+    -n, --schema=PATTERN         check relations in the specified schema(s) only
+    -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)
+
+  Table checking options:
+    -t, --table=PATTERN          check the specified table(s) only
+    -T, --exclude-table=PATTERN  do NOT check the specified table(s)
+    -b, --startblock=BLOCK       begin checking table(s) at the given starting block number
+    -e, --endblock=BLOCK         check table(s) only up to the given ending block number
+    -f, --skip-all-frozen        do NOT check blocks marked as all-frozen
+    -v, --skip-all-visible       do NOT check blocks marked as all-visible
+
+  TOAST table checking options:
+    -z, --check-toast            check associated toast tables and toast indexes
+    -Z, --skip-toast             do NOT check associated toast tables and toast indexes
+    -B, --toast-startblock=BLOCK begin checking toast table(s) at the given starting block
+    -E, --toast-endblock=BLOCK   check toast table(s) only up to the given ending block
+
+  Index checking options:
+    -x, --check-indexes          check btree indexes associated with tables being checked
+    -X, --skip-indexes           do NOT check any btree indexes
+    -i, --index=PATTERN          check the specified index(es) only
+    -I, --exclude-index=PATTERN  do NOT check the specified index(es)
+    -c, --check-corrupt          check indexes even if their associated table is corrupt
+    -C, --skip-corrupt           do NOT check indexes if their associated table is corrupt
+    -a, --heapallindexed         check index tuples against the table tuples
+    -A, --no-heapallindexed      do NOT check index tuples against the table tuples
+    -r, --rootdescend            search from the root page for each index tuple
+    -R, --no-rootdescend         do NOT search from the root page for each index tuple
+
+  Connection options:
+    -d, --dbname=DBNAME          database name to connect to
+    -h, --host=HOSTNAME          database server host or socket directory
+    -p, --port=PORT              database server port
+    -U, --username=USERNAME      database user name
+    -w, --no-password            never prompt for password
+    -W, --password               force password prompt (should happen automatically)
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ To specify which database server <application>pg_amcheck</application> should
+ contact, use the command line options <option>-h</option> or
+ <option>--host</option> and <option>-p</option> or
+ <option>--port</option>. The default host is the local host
+ or whatever your <envar>PGHOST</envar> environment variable specifies.
+ Similarly, the default port is indicated by the <envar>PGPORT</envar>
+ environment variable or, failing that, by the compiled-in default.
+ </para>
+
+ <para>
+ Like any other <productname>PostgreSQL</productname> client application,
+ <application>pg_amcheck</application> will by default connect with the
+ database user name that is equal to the current operating system user name.
+ To override this, either specify the <option>-U</option> option or set the
+ environment variable <envar>PGUSER</envar>. Remember that
+ <application>pg_amcheck</application> connections are subject to the normal
+ client authentication mechanisms (which are described in <xref
+ linkend="client-authentication"/>).
+ </para>
+
+ <para>
+ To restrict checking of tables and indexes to specific schemas, specify the
+ <option>-n</option> or <option>--schema</option> option with a pattern.
+ To exclude checking of tables and indexes within specific schemas, specify
+ the <option>-N</option> or <option>--exclude-schema</option> option with
+ a pattern.
+ </para>
+
+ <para>
+ To specify which tables are checked, specify the
+ <option>-t</option> or <option>--table</option> option with a pattern.
+ To exclude checking of tables, specify the
+ <option>-T</option> or <option>--exclude-table</option> option with a
+ pattern.
+ </para>
+
+ <para>
+ To check indexes associated with checked tables, specify the
+ <option>-x</option> or <option>--check-indexes</option> option. Only
+ indexes on tables which are being checked will themselves be checked. To
+ check all indexes in a database, all tables on which the indexes exist must
+ also be checked. This restriction may be relaxed in the future.
+ </para>
+
+ <para>
+ To restrict the range of blocks within a table that are checked, specify the
+ <option>-b</option> or <option>--startblock</option> and/or
+ <option>-e</option> or <option>--endblock</option> options with numeric
+ values for the starting and ending block numbers. Although these options
+ make the most sense when applied to a single table, if specified along with
+ options that select multiple tables, each table check will be restricted to
+ the specified blocks. If <option>--startblock</option> is omitted, checking
+ begins with the first block. If <option>--endblock</option> is omitted,
+ checking continues to the end of the relation.
+ </para>
+
+ <para>
+ Some users may wish to periodically check tables without incurring the cost
+ of rechecking older table blocks, presumably because those blocks have
+ already been checked in the past. There is at present no perfect way to do
+ this. Although the <option>--startblock</option> and <option>--endblock</option>
+ options can be used to restrict blocks, the user is not expected to have
+ perfect knowledge of which blocks have already been checked, and in any
+ event, some blocks that were previously checked may have been subject to
+ modification since the last check. As an approximation to the desired
+ functionality, one can specify the
+ <option>-f</option> or <option>--skip-all-frozen</option> option, or
+ alternatively the
+ <option>-v</option> or <option>--skip-all-visible</option> option to skip
+ blocks marked in the visibility map as all-frozen or all-visible,
+ respectively.
+ </para>
+ </sect2>
+
+ <sect2>
+ <title>Example Usage</title>
+
+ <para>
+ For table corruption, each detected corruption is reported on two lines:
+ the first gives the location, and the second a message describing the
+ problem.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt table, "mytable",
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --skip-indexes mydb
+(relname=mytable,blkno=17,offnum=12,attnum=)
+xmin 4294967295 precedes relation freeze threshold 17:1134217582
+(relname=mytable,blkno=960,offnum=4,attnum=)
+data begins at offset 152 beyond the tuple length 58
+(relname=mytable,blkno=960,offnum=4,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 152 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=5,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 27 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=6,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 16 (3 attributes, no nulls)
+(relname=mytable,blkno=960,offnum=7,attnum=)
+tuple data should begin at byte 24, but actually begins at byte 21 (3 attributes, no nulls)
+(relname=mytable,blkno=1147,offnum=2,attnum=)
+number of attributes 2047 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=10,attnum=)
+tuple data should begin at byte 280, but actually begins at byte 24 (2047 attributes, has nulls)
+(relname=mytable,blkno=1147,offnum=15,attnum=)
+number of attributes 67 exceeds maximum expected for table 3
+(relname=mytable,blkno=1147,offnum=16,attnum=1)
+attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58
+(relname=mytable,blkno=1147,offnum=18,attnum=2)
+final toast chunk number 0 differs from expected value 6
+(relname=mytable,blkno=1147,offnum=19,attnum=2)
+toasted value for attribute 2 missing from toast table
+(relname=mytable,blkno=1147,offnum=21,attnum=)
+tuple is marked as only locked, but also claims key columns were updated
+(relname=mytable,blkno=1147,offnum=22,attnum=)
+multitransaction ID 1775655 precedes relation minimum multitransaction ID threshold 2355572
+</screen>
+
+ <para>
+ For index corruption, the output is more free-form and may span a varying
+ number of lines for each corruption detected.
+ </para>
+
+ <para>
+ Checking an entire database which contains one corrupt index,
+ "corrupt_index", with corruption in the page header, along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index "corrupt_index" is not a btree
+LOCATION: _bt_getmeta, nbtpage.c:152
+</screen>
+
+ <para>
+ Checking again after rebuilding the index but corrupting the contents,
+ along with the output:
+ </para>
+
+<screen>
+% pg_amcheck --check-toast --check-indexes --schema=public --table=table_with_corrupt_index mydb
+index check failed for index corrupt_index of table table_with_corrupt_index:
+ERROR: XX002: index tuple size does not equal lp_len in index "corrupt_index"
+DETAIL: Index tid=(39,49) tuple size=3373 lp_len=24 page lsn=0/2B548C0.
+HINT: This could be a torn page problem.
+LOCATION: bt_target_page_check, verify_nbtree.c:1125
+</screen>
+
+ </sect2>
+</sect1>
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90594bd41b..ec87fb85b3 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ff853634bc..2408bb2bf6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -102,6 +102,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -403,6 +404,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnectOptions
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
Attachment: v24-0003-Creating-non-throwing-interface-to-clog-and-slru.patch (application/octet-stream)
From 73c97ae7cd2e39aee80712132654f4745a35bdae Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:26:07 -0700
Subject: [PATCH v24 2/4] Creating non-throwing interface to clog and slru.
---
src/backend/access/transam/clog.c | 21 +++---
src/backend/access/transam/commit_ts.c | 4 +-
src/backend/access/transam/multixact.c | 16 ++---
src/backend/access/transam/slru.c | 23 +++---
src/backend/access/transam/subtrans.c | 4 +-
src/backend/access/transam/transam.c | 98 +++++++++-----------------
src/backend/commands/async.c | 4 +-
src/backend/storage/lmgr/predicate.c | 4 +-
src/include/access/clog.h | 18 +----
src/include/access/clogdefs.h | 33 +++++++++
src/include/access/slru.h | 6 +-
src/include/access/transam.h | 3 +
12 files changed, 122 insertions(+), 112 deletions(-)
create mode 100644 src/include/access/clogdefs.h
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 034349aa7b..a2eb3e2983 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -357,7 +357,7 @@ TransactionIdSetPageStatusInternal(TransactionId xid, int nsubxids,
* write-busy, since we don't care if the update reaches disk sooner than
* we think.
*/
- slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, XLogRecPtrIsInvalid(lsn), xid, true);
/*
* Set the main transaction id, if any.
@@ -631,7 +631,7 @@ TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, i
* for most uses; TransactionLogFetch() in transam.c is the intended caller.
*/
XidStatus
-TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
+TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn, bool throwError)
{
int pageno = TransactionIdToPage(xid);
int byteno = TransactionIdToByte(xid);
@@ -643,13 +643,18 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid);
- byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
+ slotno = SimpleLruReadPage_ReadOnly(XactCtl, pageno, xid, throwError);
+ if (slotno == InvalidSlotNo)
+ status = TRANSACTION_STATUS_UNKNOWN;
+ else
+ {
+ byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
- status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
+ status = (*byteptr >> bshift) & CLOG_XACT_BITMASK;
- lsnindex = GetLSNIndex(slotno, xid);
- *lsn = XactCtl->shared->group_lsn[lsnindex];
+ lsnindex = GetLSNIndex(slotno, xid);
+ *lsn = XactCtl->shared->group_lsn[lsnindex];
+ }
LWLockRelease(XactSLRULock);
@@ -796,7 +801,7 @@ TrimCLOG(void)
int slotno;
char *byteptr;
- slotno = SimpleLruReadPage(XactCtl, pageno, false, xid);
+ slotno = SimpleLruReadPage(XactCtl, pageno, false, xid, true);
byteptr = XactCtl->shared->page_buffer[slotno] + byteno;
/* Zero so-far-unused positions in the current byte */
diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c
index cb8a968801..98c685405c 100644
--- a/src/backend/access/transam/commit_ts.c
+++ b/src/backend/access/transam/commit_ts.c
@@ -237,7 +237,7 @@ SetXidCommitTsInPage(TransactionId xid, int nsubxids,
LWLockAcquire(CommitTsSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(CommitTsCtl, pageno, true, xid, true);
TransactionIdSetCommitTs(xid, ts, nodeid, slotno);
for (i = 0; i < nsubxids; i++)
@@ -342,7 +342,7 @@ TransactionIdGetCommitTsData(TransactionId xid, TimestampTz *ts,
}
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(CommitTsCtl, pageno, xid, true);
memcpy(&entry,
CommitTsCtl->shared->page_buffer[slotno] +
SizeOfCommitTimestampEntry * entryno,
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 43653fe572..ed902a9d64 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -881,7 +881,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
* enough that a MultiXactId is really involved. Perhaps someday we'll
* take the trouble to generalize the slru.c error reporting code.
*/
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -914,7 +914,7 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -1345,7 +1345,7 @@ retry:
pageno = MultiXactIdToOffsetPage(multi);
entryno = MultiXactIdToOffsetEntry(multi);
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
@@ -1377,7 +1377,7 @@ retry:
entryno = MultiXactIdToOffsetEntry(tmpMXact);
if (pageno != prev_pageno)
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -1418,7 +1418,7 @@ retry:
if (pageno != prev_pageno)
{
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, multi, true);
prev_pageno = pageno;
}
@@ -2063,7 +2063,7 @@ TrimMultiXact(void)
int slotno;
MultiXactOffset *offptr;
- slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact);
+ slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, nextMXact, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
@@ -2095,7 +2095,7 @@ TrimMultiXact(void)
int memberoff;
memberoff = MXOffsetToMemberOffset(offset);
- slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset);
+ slotno = SimpleLruReadPage(MultiXactMemberCtl, pageno, true, offset, true);
xidptr = (TransactionId *)
(MultiXactMemberCtl->shared->page_buffer[slotno] + memberoff);
@@ -2749,7 +2749,7 @@ find_multixact_start(MultiXactId multi, MultiXactOffset *result)
return false;
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi);
+ slotno = SimpleLruReadPage_ReadOnly(MultiXactOffsetCtl, pageno, multi, true);
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
offset = *offptr;
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 16a7898697..daa145eeff 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -385,14 +385,15 @@ SimpleLruWaitIO(SlruCtl ctl, int slotno)
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must be held at entry, and will be held at exit.
*/
int
SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid)
+ TransactionId xid, bool throwError)
{
SlruShared shared = ctl->shared;
@@ -465,7 +466,11 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
/* Now it's okay to ereport if we failed */
if (!ok)
- SlruReportIOError(ctl, pageno, xid);
+ {
+ if (throwError)
+ SlruReportIOError(ctl, pageno, xid);
+ return InvalidSlotNo;
+ }
SlruRecentlyUsed(shared, slotno);
@@ -484,14 +489,16 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
* The passed-in xid is used only for error reporting, and may be
* InvalidTransactionId if no specific xid is associated with the action.
*
- * Return value is the shared-buffer slot number now holding the page.
- * The buffer's LRU access info is updated.
+ * On error, when throwError is false, the return value is negative.
+ * Otherwise, return value is the shared-buffer slot number now holding the
+ * page, and the buffer's LRU access info is updated.
*
* Control lock must NOT be held at entry, but will be held at exit.
* It is unspecified whether the lock will be shared or exclusive.
*/
int
-SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
+SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid,
+ bool throwError)
{
SlruShared shared = ctl->shared;
int slotno;
@@ -520,7 +527,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid)
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);
- return SimpleLruReadPage(ctl, pageno, true, xid);
+ return SimpleLruReadPage(ctl, pageno, true, xid, throwError);
}
/*
diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index 0111e867c7..353b946731 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -83,7 +83,7 @@ SubTransSetParent(TransactionId xid, TransactionId parent)
LWLockAcquire(SubtransSLRULock, LW_EXCLUSIVE);
- slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid);
+ slotno = SimpleLruReadPage(SubTransCtl, pageno, true, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
@@ -123,7 +123,7 @@ SubTransGetParent(TransactionId xid)
/* lock is acquired by SimpleLruReadPage_ReadOnly */
- slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid);
+ slotno = SimpleLruReadPage_ReadOnly(SubTransCtl, pageno, xid, true);
ptr = (TransactionId *) SubTransCtl->shared->page_buffer[slotno];
ptr += entryno;
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index a28918657c..88f867e5ef 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -35,7 +35,8 @@ static XidStatus cachedFetchXidStatus;
static XLogRecPtr cachedCommitLSN;
/* Local functions */
-static XidStatus TransactionLogFetch(TransactionId transactionId);
+static XidStatus TransactionLogFetch(TransactionId transactionId,
+ bool throwError);
/* ----------------------------------------------------------------
@@ -49,7 +50,7 @@ static XidStatus TransactionLogFetch(TransactionId transactionId);
* TransactionLogFetch --- fetch commit status of specified transaction id
*/
static XidStatus
-TransactionLogFetch(TransactionId transactionId)
+TransactionLogFetch(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
XLogRecPtr xidlsn;
@@ -76,14 +77,16 @@ TransactionLogFetch(TransactionId transactionId)
/*
* Get the transaction status.
*/
- xidstatus = TransactionIdGetStatus(transactionId, &xidlsn);
+ xidstatus = TransactionIdGetStatus(transactionId, &xidlsn, throwError);
/*
* Cache it, but DO NOT cache status for unfinished or sub-committed
* transactions! We only cache status that is guaranteed not to change.
+ * Likewise, DO NOT cache when the status is unknown.
*/
if (xidstatus != TRANSACTION_STATUS_IN_PROGRESS &&
- xidstatus != TRANSACTION_STATUS_SUB_COMMITTED)
+ xidstatus != TRANSACTION_STATUS_SUB_COMMITTED &&
+ xidstatus != TRANSACTION_STATUS_UNKNOWN)
{
cachedFetchXid = transactionId;
cachedFetchXidStatus = xidstatus;
@@ -96,6 +99,7 @@ TransactionLogFetch(TransactionId transactionId)
/* ----------------------------------------------------------------
* Interface functions
*
+ * TransactionIdResolveStatus
* TransactionIdDidCommit
* TransactionIdDidAbort
* ========
@@ -115,24 +119,17 @@ TransactionLogFetch(TransactionId transactionId)
*/
/*
- * TransactionIdDidCommit
- * True iff transaction associated with the identifier did commit.
- *
- * Note:
- * Assumes transaction identifier is valid and exists in clog.
+ * TransactionIdResolveStatus
+ * Returns the status of the transaction associated with the identifier,
+ * recursively resolving sub-committed transaction status by checking
+ * the parent transaction.
*/
-bool /* true if given transaction committed */
-TransactionIdDidCommit(TransactionId transactionId)
+XidStatus
+TransactionIdResolveStatus(TransactionId transactionId, bool throwError)
{
XidStatus xidstatus;
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked committed, it's committed.
- */
- if (xidstatus == TRANSACTION_STATUS_COMMITTED)
- return true;
+ xidstatus = TransactionLogFetch(transactionId, throwError);
/*
* If it's marked subcommitted, we have to check the parent recursively.
@@ -153,21 +150,31 @@ TransactionIdDidCommit(TransactionId transactionId)
TransactionId parentXid;
if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return false;
+ return TRANSACTION_STATUS_ABORTED;
parentXid = SubTransGetParent(transactionId);
if (!TransactionIdIsValid(parentXid))
{
elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
transactionId);
- return false;
+ return TRANSACTION_STATUS_ABORTED;
}
- return TransactionIdDidCommit(parentXid);
+ return TransactionIdResolveStatus(parentXid, throwError);
}
+ return xidstatus;
+}
- /*
- * It's not committed.
- */
- return false;
+/*
+ * TransactionIdDidCommit
+ * True iff transaction associated with the identifier did commit.
+ *
+ * Note:
+ * Assumes transaction identifier is valid and exists in clog.
+ */
+bool /* true if given transaction committed */
+TransactionIdDidCommit(TransactionId transactionId)
+{
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_COMMITTED);
}
/*
@@ -180,43 +187,8 @@ TransactionIdDidCommit(TransactionId transactionId)
bool /* true if given transaction aborted */
TransactionIdDidAbort(TransactionId transactionId)
{
- XidStatus xidstatus;
-
- xidstatus = TransactionLogFetch(transactionId);
-
- /*
- * If it's marked aborted, it's aborted.
- */
- if (xidstatus == TRANSACTION_STATUS_ABORTED)
- return true;
-
- /*
- * If it's marked subcommitted, we have to check the parent recursively.
- * However, if it's older than TransactionXmin, we can't look at
- * pg_subtrans; instead assume that the parent crashed without cleaning up
- * its children.
- */
- if (xidstatus == TRANSACTION_STATUS_SUB_COMMITTED)
- {
- TransactionId parentXid;
-
- if (TransactionIdPrecedes(transactionId, TransactionXmin))
- return true;
- parentXid = SubTransGetParent(transactionId);
- if (!TransactionIdIsValid(parentXid))
- {
- /* see notes in TransactionIdDidCommit */
- elog(WARNING, "no pg_subtrans entry for subcommitted XID %u",
- transactionId);
- return true;
- }
- return TransactionIdDidAbort(parentXid);
- }
-
- /*
- * It's not aborted.
- */
- return false;
+ return (TransactionIdResolveStatus(transactionId, true) ==
+ TRANSACTION_STATUS_ABORTED);
}
/*
@@ -419,7 +391,7 @@ TransactionIdGetCommitLSN(TransactionId xid)
/*
* Get the transaction status.
*/
- (void) TransactionIdGetStatus(xid, &result);
+ (void) TransactionIdGetStatus(xid, &result, true);
return result;
}
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 8dbcace3f9..a49126dba0 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -1477,7 +1477,7 @@ asyncQueueAddEntries(ListCell *nextNotify)
slotno = SimpleLruZeroPage(NotifyCtl, pageno);
else
slotno = SimpleLruReadPage(NotifyCtl, pageno, true,
- InvalidTransactionId);
+ InvalidTransactionId, true);
/* Note we mark the page dirty before writing in it */
NotifyCtl->shared->page_dirty[slotno] = true;
@@ -2010,7 +2010,7 @@ asyncQueueReadAllNotifications(void)
* part of the page we will actually inspect.
*/
slotno = SimpleLruReadPage_ReadOnly(NotifyCtl, curpage,
- InvalidTransactionId);
+ InvalidTransactionId, true);
if (curpage == QUEUE_POS_PAGE(head))
{
/* we only want to read as far as head */
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 8a365b400c..6cf12e46f6 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -904,7 +904,7 @@ SerialAdd(TransactionId xid, SerCommitSeqNo minConflictCommitSeqNo)
slotno = SimpleLruZeroPage(SerialSlruCtl, targetPage);
}
else
- slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
+ slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid, true);
SerialValue(slotno, xid) = minConflictCommitSeqNo;
SerialSlruCtl->shared->page_dirty[slotno] = true;
@@ -946,7 +946,7 @@ SerialGetMinConflictCommitSeqNo(TransactionId xid)
* but will return with that lock held, which must then be released.
*/
slotno = SimpleLruReadPage_ReadOnly(SerialSlruCtl,
- SerialPage(xid), xid);
+ SerialPage(xid), xid, true);
val = SerialValue(slotno, xid);
LWLockRelease(SerialSLRULock);
return val;
diff --git a/src/include/access/clog.h b/src/include/access/clog.h
index 6c840cbf29..cf299cd8f6 100644
--- a/src/include/access/clog.h
+++ b/src/include/access/clog.h
@@ -11,24 +11,11 @@
#ifndef CLOG_H
#define CLOG_H
+#include "access/clogdefs.h"
#include "access/xlogreader.h"
#include "storage/sync.h"
#include "lib/stringinfo.h"
-/*
- * Possible transaction statuses --- note that all-zeroes is the initial
- * state.
- *
- * A "subcommitted" transaction is a committed subtransaction whose parent
- * hasn't committed or aborted yet.
- */
-typedef int XidStatus;
-
-#define TRANSACTION_STATUS_IN_PROGRESS 0x00
-#define TRANSACTION_STATUS_COMMITTED 0x01
-#define TRANSACTION_STATUS_ABORTED 0x02
-#define TRANSACTION_STATUS_SUB_COMMITTED 0x03
-
typedef struct xl_clog_truncate
{
int pageno;
@@ -38,7 +25,8 @@ typedef struct xl_clog_truncate
extern void TransactionIdSetTreeStatus(TransactionId xid, int nsubxids,
TransactionId *subxids, XidStatus status, XLogRecPtr lsn);
-extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn);
+extern XidStatus TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn,
+ bool throwError);
extern Size CLOGShmemBuffers(void);
extern Size CLOGShmemSize(void);
diff --git a/src/include/access/clogdefs.h b/src/include/access/clogdefs.h
new file mode 100644
index 0000000000..0f9996bb08
--- /dev/null
+++ b/src/include/access/clogdefs.h
@@ -0,0 +1,33 @@
+/*
+ * clogdefs.h
+ *
+ * PostgreSQL transaction-commit-log manager
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/clogdefs.h
+ */
+#ifndef CLOGDEFS_H
+#define CLOGDEFS_H
+
+/*
+ * Possible transaction statuses --- note that all-zeroes is the initial
+ * state.
+ *
+ * A "subcommitted" transaction is a committed subtransaction whose parent
+ * hasn't committed or aborted yet.
+ *
+ * An "unknown" status indicates an error condition, such as when the clog has
+ * been erroneously truncated and the commit status of a transaction cannot be
+ * determined.
+ */
+typedef enum XidStatus {
+ TRANSACTION_STATUS_IN_PROGRESS = 0x00,
+ TRANSACTION_STATUS_COMMITTED = 0x01,
+ TRANSACTION_STATUS_ABORTED = 0x02,
+ TRANSACTION_STATUS_SUB_COMMITTED = 0x03,
+ TRANSACTION_STATUS_UNKNOWN = 0x04 /* error condition */
+} XidStatus;
+
+#endif /* CLOGDEFS_H */
diff --git a/src/include/access/slru.h b/src/include/access/slru.h
index b39b43504d..0b6a5669d8 100644
--- a/src/include/access/slru.h
+++ b/src/include/access/slru.h
@@ -133,6 +133,8 @@ typedef struct SlruCtlData
typedef SlruCtlData *SlruCtl;
+#define InvalidSlotNo ((int) -1)
+
extern Size SimpleLruShmemSize(int nslots, int nlsns);
extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
@@ -140,9 +142,9 @@ extern void SimpleLruInit(SlruCtl ctl, const char *name, int nslots, int nlsns,
SyncRequestHandler sync_handler);
extern int SimpleLruZeroPage(SlruCtl ctl, int pageno);
extern int SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern int SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno,
- TransactionId xid);
+ TransactionId xid, bool throwError);
extern void SimpleLruWritePage(SlruCtl ctl, int slotno);
extern void SimpleLruWriteAll(SlruCtl ctl, bool allow_redirtied);
extern void SimpleLruTruncate(SlruCtl ctl, int cutoffPage);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 2f1f144db4..7d5e2f614d 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -14,6 +14,7 @@
#ifndef TRANSAM_H
#define TRANSAM_H
+#include "access/clogdefs.h"
#include "access/xlogdefs.h"
@@ -264,6 +265,8 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
/*
* prototypes for functions in transam/transam.c
*/
+extern XidStatus TransactionIdResolveStatus(TransactionId transactionId,
+ bool throwError);
extern bool TransactionIdDidCommit(TransactionId transactionId);
extern bool TransactionIdDidAbort(TransactionId transactionId);
extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
--
2.21.1 (Apple Git-122.3)
v24-0004-Using-non-throwing-clog-interface-from-amcheck.patchapplication/octet-stream; name=v24-0004-Using-non-throwing-clog-interface-from-amcheck.patch; x-unix-mode=0644Download
From c8bbb5e1d98c46c671b0e40c6f78b7b0e4932cc0 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 26 Oct 2020 07:34:42 -0700
Subject: [PATCH v24 3/4] Using non-throwing clog interface from amcheck
Converting the heap checking functions to use the recently introduced
non-throwing interface to clog when checking transaction commit status, and
adding corruption reports about missing clog rather than aborting.
---
contrib/amcheck/verify_heapam.c | 74 +++++++------
contrib/pg_amcheck/t/006_clog_truncation.pl | 111 ++++++++++++++++++++
src/tools/pgindent/typedefs.list | 1 -
3 files changed, 155 insertions(+), 31 deletions(-)
create mode 100644 contrib/pg_amcheck/t/006_clog_truncation.pl
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 8bb890438a..263cbb37c1 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -10,6 +10,7 @@
*/
#include "postgres.h"
+#include "access/clogdefs.h"
#include "access/detoast.h"
#include "access/genam.h"
#include "access/heapam.h"
@@ -43,13 +44,6 @@ typedef enum XidBoundsViolation
XID_BOUNDS_OK
} XidBoundsViolation;
-typedef enum XidCommitStatus
-{
- XID_COMMITTED,
- XID_IN_PROGRESS,
- XID_ABORTED
-} XidCommitStatus;
-
typedef enum SkipPages
{
SKIP_PAGES_ALL_FROZEN,
@@ -83,7 +77,7 @@ typedef struct HeapCheckContext
* Cached copies of the most recently checked xid and its status.
*/
TransactionId cached_xid;
- XidCommitStatus cached_status;
+ XidStatus cached_status;
/* Values concerning the heap relation being checked */
Relation rel;
@@ -148,7 +142,7 @@ static XidBoundsViolation check_mxid_valid_in_rel(MultiXactId mxid,
HeapCheckContext *ctx);
static XidBoundsViolation get_xid_status(TransactionId xid,
HeapCheckContext *ctx,
- XidCommitStatus *status);
+ XidStatus *status);
/*
* Scan and report corruption in heap pages, optionally reconciling toasted
@@ -675,7 +669,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
else if (infomask & HEAP_MOVED_OFF ||
infomask & HEAP_MOVED_IN)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xvac = HeapTupleHeaderGetXvac(tuphdr);
switch (get_xid_status(xvac, ctx, &status))
@@ -710,17 +704,25 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("old-style VACUUM FULL transaction ID %u transaction status is lost",
+ xvac));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
else
{
- XidCommitStatus status;
+ XidStatus status;
switch (get_xid_status(raw_xmin, ctx, &status))
{
@@ -752,12 +754,20 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_COMMITTED:
+ case TRANSACTION_STATUS_COMMITTED:
break;
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* insert or delete in progress */
- case XID_ABORTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("raw xmin %u transaction status is lost",
+ raw_xmin));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
}
@@ -767,7 +777,7 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
{
if (infomask & HEAP_XMAX_IS_MULTI)
{
- XidCommitStatus status;
+ XidStatus status;
TransactionId xmax = HeapTupleGetUpdateXid(tuphdr);
switch (get_xid_status(xmax, ctx, &status))
@@ -801,12 +811,20 @@ check_tuple_header_and_visibilty(HeapTupleHeader tuphdr, HeapCheckContext *ctx)
case XID_BOUNDS_OK:
switch (status)
{
- case XID_IN_PROGRESS:
+ case TRANSACTION_STATUS_IN_PROGRESS:
return true; /* HEAPTUPLE_DELETE_IN_PROGRESS */
- case XID_COMMITTED:
- case XID_ABORTED:
+ case TRANSACTION_STATUS_COMMITTED:
+ case TRANSACTION_STATUS_ABORTED:
return false; /* HEAPTUPLE_RECENTLY_DEAD or
* HEAPTUPLE_DEAD */
+ case TRANSACTION_STATUS_UNKNOWN:
+ report_corruption(ctx,
+ psprintf("xmax %u transaction status is lost",
+ xmax));
+ return false; /* corruption */
+ case TRANSACTION_STATUS_SUB_COMMITTED:
+ elog(ERROR, "get_xid_status failed to resolve parent transaction status");
+ return false; /* not reached */
}
}
@@ -1401,7 +1419,7 @@ check_mxid_valid_in_rel(MultiXactId mxid, HeapCheckContext *ctx)
*/
static XidBoundsViolation
get_xid_status(TransactionId xid, HeapCheckContext *ctx,
- XidCommitStatus *status)
+ XidStatus *status)
{
FullTransactionId fxid;
FullTransactionId clog_horizon;
@@ -1412,7 +1430,7 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
else if (xid == BootstrapTransactionId || xid == FrozenTransactionId)
{
if (status != NULL)
- *status = XID_COMMITTED;
+ *status = TRANSACTION_STATUS_COMMITTED;
return XID_BOUNDS_OK;
}
@@ -1447,7 +1465,7 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
return XID_BOUNDS_OK;
}
- *status = XID_COMMITTED;
+ *status = TRANSACTION_STATUS_COMMITTED;
LWLockAcquire(XactTruncationLock, LW_SHARED);
clog_horizon =
FullTransactionIdFromXidAndCtx(ShmemVariableCache->oldestClogXid,
@@ -1455,13 +1473,9 @@ get_xid_status(TransactionId xid, HeapCheckContext *ctx,
if (FullTransactionIdPrecedesOrEquals(clog_horizon, fxid))
{
if (TransactionIdIsCurrentTransactionId(xid))
- *status = XID_IN_PROGRESS;
- else if (TransactionIdDidCommit(xid))
- *status = XID_COMMITTED;
- else if (TransactionIdDidAbort(xid))
- *status = XID_ABORTED;
+ *status = TRANSACTION_STATUS_IN_PROGRESS;
else
- *status = XID_IN_PROGRESS;
+ *status = TransactionIdResolveStatus(xid, false);
}
LWLockRelease(XactTruncationLock);
ctx->cached_xid = xid;
diff --git a/contrib/pg_amcheck/t/006_clog_truncation.pl b/contrib/pg_amcheck/t/006_clog_truncation.pl
new file mode 100644
index 0000000000..f205ae7ede
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_clog_truncation.pl
@@ -0,0 +1,111 @@
+# This regression test checks the behavior of the heap validation in the
+# presence of clog corruption.
+
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 3;
+
+my ($node, $pgdata, $clogdir);
+
+sub count_clog_files
+{
+ my $result = 0;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ $result++ if (-f "$clogdir/$fname");
+ }
+ closedir(DIR);
+ return $result;
+}
+
+# Burn through enough xids that at least three clog files exist in pg_xact/
+sub create_three_clog_files
+{
+ print STDERR "Generating clog entries....\n";
+
+ $node->safe_psql('postgres', q(
+ CREATE PROCEDURE burn_xids ()
+ LANGUAGE plpgsql
+ AS $$
+ DECLARE
+ loopcnt BIGINT;
+ BEGIN
+ FOR loopcnt IN 1..32768
+ LOOP
+ PERFORM txid_current();
+ COMMIT;
+ END LOOP;
+ END;
+ $$;
+ ));
+
+ do {
+ $node->safe_psql('postgres', 'INSERT INTO test_0 (i) VALUES (0)');
+ $node->safe_psql('postgres', 'CALL burn_xids()');
+ print STDERR "Burned transaction ids...\n";
+ $node->safe_psql('postgres', 'INSERT INTO test_1 (i) VALUES (1)');
+ } while (count_clog_files() < 3);
+}
+
+# Of the clog files in pg_xact, remove the second one, sorted by name order.
+# This function, used along with create_three_clog_files(), is intended to
+# remove neither the newest nor the oldest clog file. Experimentation shows
+# that removing the newest clog file works ok, but for future-proofing, remove
+# one less likely to be checked at server startup.
+sub unlink_second_clog_file
+{
+ my @paths;
+ opendir(DIR, $clogdir) or die "Cannot opendir $clogdir: $!";
+ while (my $fname = readdir(DIR))
+ {
+ my $path = "$clogdir/$fname";
+ next unless -f $path;
+ push @paths, $path;
+ }
+ closedir(DIR);
+
+ my @ordered = sort { $a cmp $b } @paths;
+ unlink $ordered[1];
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we corrupt clog, autovacuum workers visiting tables
+# could crash the backend. Disable autovacuum so that won't happen.
+$node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+$pgdata = $node->data_dir;
+$clogdir = join('/', $pgdata, 'pg_xact');
+$node->start;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE TABLE test_0 (i INTEGER)");
+$node->safe_psql('postgres', "CREATE TABLE test_1 (i INTEGER)");
+$node->safe_psql('postgres', "VACUUM FREEZE");
+
+create_three_clog_files();
+
+# Corruptly delete a clog file
+$node->stop;
+unlink_second_clog_file();
+$node->start;
+
+my $port = $node->port;
+
+# Run pg_amcheck against the corrupt database, looking for clog related
+# corruption messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ qr/transaction status is lost/ ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2408bb2bf6..c1144c1a92 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2796,7 +2796,6 @@ XactCallbackItem
XactEvent
XactLockTableWaitInfo
XidBoundsViolation
-XidCommitStatus
XidHorizonPrefetchState
XidStatus
XmlExpr
--
2.21.1 (Apple Git-122.3)
v24-0005-Adding-ACL-checks-for-verify_heapam.patchapplication/octet-stream; name=v24-0005-Adding-ACL-checks-for-verify_heapam.patch; x-unix-mode=0644Download
From 57e184b4d1577c7846b08603e5ad2f052f98b133 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 21 Oct 2020 20:27:23 -0700
Subject: [PATCH v24 4/4] Adding ACL checks for verify_heapam
Requiring select privileges on tables scanned by verify_heapam, in
addition to the already required execute privileges on the function.
---
contrib/amcheck/expected/check_heap.out | 6 ++++++
contrib/amcheck/sql/check_heap.sql | 7 +++++++
contrib/amcheck/verify_heapam.c | 8 ++++++++
doc/src/sgml/pgamcheck.sgml | 2 +-
4 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/contrib/amcheck/expected/check_heap.out b/contrib/amcheck/expected/check_heap.out
index 882f853d56..41cdc6435c 100644
--- a/contrib/amcheck/expected/check_heap.out
+++ b/contrib/amcheck/expected/check_heap.out
@@ -95,6 +95,12 @@ SELECT * FROM verify_heapam(relation := 'heaptest');
ERROR: permission denied for function verify_heapam
RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+ERROR: permission denied for table heaptest
+RESET ROLE;
+GRANT SELECT ON heaptest TO regress_heaptest_role;
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/sql/check_heap.sql b/contrib/amcheck/sql/check_heap.sql
index c10a25f21c..c8397a46f0 100644
--- a/contrib/amcheck/sql/check_heap.sql
+++ b/contrib/amcheck/sql/check_heap.sql
@@ -41,6 +41,13 @@ RESET ROLE;
GRANT EXECUTE ON FUNCTION verify_heapam(regclass, boolean, boolean, text, bigint, bigint) TO regress_heaptest_role;
+-- verify permissions are checked (error due to no select privileges on relation)
+SET ROLE regress_heaptest_role;
+SELECT * FROM verify_heapam(relation := 'heaptest');
+RESET ROLE;
+
+GRANT SELECT ON heaptest TO regress_heaptest_role;
+
-- verify permissions are now sufficient
SET ROLE regress_heaptest_role;
SELECT * FROM verify_heapam(relation := 'heaptest');
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 263cbb37c1..f3fe9c44a4 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -23,6 +23,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/procarray.h"
+#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/fmgroids.h"
@@ -478,6 +479,8 @@ verify_heapam(PG_FUNCTION_ARGS)
static void
sanity_check_relation(Relation rel)
{
+ AclResult aclresult;
+
if (rel->rd_rel->relkind != RELKIND_RELATION &&
rel->rd_rel->relkind != RELKIND_MATVIEW &&
rel->rd_rel->relkind != RELKIND_TOASTVALUE)
@@ -489,6 +492,11 @@ sanity_check_relation(Relation rel)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("only heap AM is supported")));
+ aclresult = pg_class_aclcheck(rel->rd_id, GetUserId(), ACL_SELECT);
+ if (aclresult != ACLCHECK_OK)
+ aclcheck_error(aclresult,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
}
/*
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
index 3e059e7753..fc36447dda 100644
--- a/doc/src/sgml/pgamcheck.sgml
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -19,7 +19,7 @@
connecting as a user with sufficient privileges to check tables and indexes.
Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
<function>bt_index_parent_check</function> and <function>verify_heapam</function>
- functions.
+ functions, as well as select privileges on the relations being checked.
</para>
<synopsis>
--
2.21.1 (Apple Git-122.3)
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Done that way in the attached, which also include Robert's changes from v19 he posted earlier today.
Committed. Let's see what the buildfarm thinks.
Another thing that the buildfarm is pointing out is
[WARN] FOUserAgent - The contents of fo:block line 2 exceed the available area in the inline-progression direction by more than 50 points. (See position 148863:380)
This is coming from the sample output for verify_heapam(), which is too
wide to fit in even a normal-size browser window, let alone A4 PDF.
While we could perhaps hack it up to allow more line breaks, or see
if \x formatting helps, my own suggestion would be to just nuke the
sample output altogether. It doesn't look like it is any sort of
representative real output, and it is not useful enough to be worth
spending time to patch up.
regards, tom lane
On Oct 26, 2020, at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Wed, Oct 21, 2020 at 11:45 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Done that way in the attached, which also include Robert's changes from v19 he posted earlier today.
Committed. Let's see what the buildfarm thinks.
Another thing that the buildfarm is pointing out is
[WARN] FOUserAgent - The contents of fo:block line 2 exceed the available area in the inline-progression direction by more than 50 points. (See position 148863:380)
This is coming from the sample output for verify_heapam(), which is too
wide to fit in even a normal-size browser window, let alone A4 PDF.
While we could perhaps hack it up to allow more line breaks, or see
if \x formatting helps, my own suggestion would be to just nuke the
sample output altogether.
Ok.
It doesn't look like it is any sort of
representative real output,
It is not. It came from artificially created corruption in the regression tests. I may even have manually edited that, though I don't recall.
and it is not useful enough to be worth
spending time to patch up.
Ok.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 26, 2020 at 12:12 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a rebase. (0001 was already committed last week.)
Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with the previous naming. There are no substantial changes.
Here's a review of 0002. I basically like the direction this is going
but I guess nobody will be surprised that there are some things in
here that I think could be improved.
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the
amcheck database corruption checker.",
+ "",
This looks like a novel approach to the problem of printing out the
usage() information, and I think that it's inferior to the technique
used elsewhere of just having a bunch of printf() statements, because
unless I misunderstand, it doesn't permit localization.
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",
I am not very convinced by this. What's the use case? If you're just
checking a single table, you might want to specify a start and end
block, but then you don't need separate options for the TOAST and
non-TOAST cases, do you? If I want to check pg_statistic, I'll say
pg_amcheck -t pg_catalog.pg_statistic. If I want to check the TOAST
table for pg_statistic, I'll say pg_amcheck -t pg_toast.pg_toast_2619.
In either case, if I want to check just the first three blocks, I can
add -b 0 -e 2.
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",
I think this is using up too many one character option names for too
little benefit on things that are too closely related. How about, -s,
--skip=all-frozen|all-visible|none? And then -v could mean verbose,
which could trigger things like printing all the queries sent to the
server, setting PQERRORS_VERBOSE, etc.
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",
This is a lotta controls for something that has gotta have some
default. Either the default is everything, in which case I don't see
why I need -x, or it's nothing, in which case I don't see why I need
-X.
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",
Ditto. (I think the default should be to check corrupt, and there can be an
option to skip it.)
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",
Ditto. (Not sure what the default should be, though.)
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",
Ditto. (Again, not sure about the default.)
I'm also not sure if these descriptions are clear enough, but it may
also be hard to do a good job in a brief space. Still, comparing this
to the documentation of heapallindexed makes me rather nervous. This
is only trying to verify that the index contains all the tuples in the
heap, not that the values in the heap and index tuples actually match.
+typedef struct
+AmCheckSettings
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
Making the struct name different from the type name seems not good,
and the struct name also shouldn't be on a separate line.
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
Ugh. It's not this patch's fault, but we really oughta move this to
someplace more centralized.
+typedef struct
...
+} AmCheckSettings;
I'm not sure I consider all of these things settings, "db" in
particular. But maybe that's nitpicking.
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+                                        const SimpleOidList *exclude_oids,
+                                        SimpleOidList *oids,
+                                        bool strict_names);
This is copied from pg_dump, along with I think at least one other
function from nearby. Unlike the trivalue case above, this would be
the first duplication of this logic. Can we push this stuff into
pgcommon, perhaps?
+ /*
+ * Default behaviors for user settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
This to me seems too conservative. The result is that by default we
check only tables, not indexes. I don't think that's going to be what
users want. I don't know whether they want the heapallindexed or
rootdescend behaviors for index checks, but I think they want their
indexes checked. Happy to hear opinions from actual users on what they
want; this is just me guessing that you've guessed wrong. :-)
+ if (settings.db == NULL)
+ {
+ pg_log_error("no connection to server after initial attempt");
+ exit(EXIT_BADCONN);
+ }
I think this is documented as meaning out of memory, and reported that
way elsewhere. Anyway I am going to keep complaining until there are
no cases where we tell the user it broke without telling them what
broke. Which means this bit is a problem too:
+ if (!settings.db)
+ {
+ pg_log_error("no connection to server");
+ exit(EXIT_BADCONN);
+ }
Something went wrong, good luck figuring out what it was!
+ /*
+ * All information about corrupt indexes are returned via ereport, not as
+ * tuples. We want all the details to report if corruption exists.
+ */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);
Really? Why? If I need the source code file name, function name, and
line number to figure out what went wrong, that is not a great sign
for the quality of the error reports it produces.
+ /*
+ * The btree checking logic which optionally checks the contents
+ * of an index against the corresponding table has not yet been
+ * sufficiently hardened against corrupt tables. In particular,
+ * when called with heapallindexed true, it segfaults if the file
+ * backing the table relation has been erroneously unlinked. In
+ * any event, it seems unwise to reconcile an index against its
+ * table when we already know the table is corrupt.
+ */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+ settings.heapallindexed = false;
This seems pretty lame to me. Even if the btree checker can't tolerate
corruption to the extent that the heap checker does, seg faulting
because of a missing file seems like a bug that we should just fix
(and probably back-patch). I'm not very convinced by the decision to
override the user's decision about heapallindexed either. Maybe I lack
imagination, but that seems pretty arbitrary. Suppose there's a giant
index which is missing entries for 5 million heap tuples and also
there's 1 entry in the table which has an xmin that is less than the
pg_class.relfrozenxid value by 1. You are proposing that because I have
the latter problem I don't want you to check for the former one. But
I, John Q. Smartuser, do not want you to second-guess what I told you
on the command line that I wanted. :-)
I think in general you're worrying too much about the possibility of
this tool causing backend crashes. I think it's good that you wrote
the heapcheck code in a way that's hardened against that, and I think
we should try to harden other things as time permits. But I don't
think that the remote possibility of a crash due to the lack of such
hardening should dictate the design behavior of this tool. If the
crash possibilities are not remote, then I think the solution is to
fix them, rather than cutting out important checks.
It doesn't seem like great design to me that get_table_check_list()
gets just the OID of the table itself, and then later if we decide to
check the TOAST table we've got to run a separate query for each table
we want to check to fetch the TOAST OID, when we could've just fetched
both in get_table_check_list() by including two columns in the query
rather than one and it would've been basically free. Imagine if some
user wrote a query that fetched the primary key value for all their
rows and then had their application run a separate query to fetch the
entire contents of each of those rows, said contents consisting of one
more integer. And then suppose they complained about performance. We'd
tell them they were doing it wrong, and so here.
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_table");
Uninformative. Is this basically an Assert? If so maybe just make it
one. If not maybe fail somewhere else with a better message?
+ if (startblock == NULL)
+ startblock = "NULL";
+ if (endblock == NULL)
+ endblock = "NULL";
It seems like it would be more elegant to initialize
settings.startblock and settings.endblock to "NULL." However, there's
also a related problem, which is that the startblock and endblock
values can be anything, and are interpolated with quoting. I don't
think that it's good to ship a tool with SQL injection hazards built
into it. I think that you should (a) check that these values are
integers during argument parsing and error out if they are not and
then (b) use either a prepared query or PQescapeLiteral() anyway.
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";
The parens aren't really needed here.
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
I am not quite sure how to format the output, but this looks like
something designed by an engineer who knows too much about the topic.
I suspect users won't find the use of things like "relname" and
"blkno" too easy to understand. At least I think we should say
"relation, block, offset, attribute" instead of "relname, blkno,
offnum, attnum". I would probably drop the parenthesis and add spaces,
so that you end up with something like:
relation "%s", block "%s", offset "%s", attribute "%s":
I would also define variant strings so that we entirely omit things
that are NULL. e.g. have four strings:
relation "%s":
relation "%s", block "%s":
relation "%s", block "%s", offset "%s":
relation "%s", block "%s", offset "%s", attribute "%s":
Would it make it more readable if we indented the continuation line by
four spaces or something?
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);
Seems like we could still print the relation name in this case, and
that it would be a good idea to do so, in case it's not in the message
that the server returns.
The general logic in this part of the code looks a bit strange to me.
If ExecuteSqlQuery() returns PGRES_TUPLES_OK, we print out the details
for each returned row. Otherwise, if error = true, we print the error.
But, what if neither of those things is the case? Then we'd just
print nothing despite having gotten back some weird response from the
server. That actually can't happen, because ExecuteSqlQuery() always
sets *error when the return code is not PGRES_TUPLES_OK, but you
wouldn't know that from looking at this code.
Honestly, as written, ExecuteSqlQuery() seems like kind of a waste. The
OrDie() version is useful as a notational shorthand, but this version
seems to add more confusion than clarity. It has only three callers:
the ones in check_table() and check_indexes() have the problem
described above, and the one in get_toast_oid() could just as well be
using the OrDie() version. And also we should probably get rid of it
entirely by fetching the toast OIDs the first time around, as
mentioned above.
check_indexes() lacks a function comment. It seems to have more or
less the same problem as get_toast_oid() -- an extra query per table
to get the list of indexes. I guess it has a better excuse: there
could be lots of indexes per table, and we're fetching multiple
columns of data for each one, whereas in the TOAST case we are issuing
an extra query per table to fetch a single integer. But, couldn't we
fetch information about all the indexes we want to check in one go,
rather than fetching them separately for each table being checked? I'm
not sure if that would create too much other complexity, but it seems
like it would be quicker.
+ if (settings.db == NULL)
+ fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+ fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+ fatal("no table name on entry to check_index");
Again, probably these should be asserts, or if they're not, the error
should be reported better and maybe elsewhere.
Similarly in some other places, like expand_schema_name_patterns().
+ * The loop below runs multiple SELECTs might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.
This is missing a which, like the place you copied it from, but the
version in pg_dumpall.c is better.
expand_table_name_patterns() should be reformatted to not gratuitously
exceed 80 columns. Ditto for expand_index_name_patterns().
I sort of expected that this patch might use threads to allow parallel
checking - seems like it would be a useful feature.
I originally intended to review the docs and regression tests in the
same email as the patch itself, but this email has gotten rather long
and taken rather longer to get together than I had hoped, so I'm going
to stop here for now and come back to that stuff.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 19, 2020 at 9:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
I'm also not sure if these descriptions are clear enough, but it may
also be hard to do a good job in a brief space. Still, comparing this
to the documentation of heapallindexed makes me rather nervous. This
is only trying to verify that the index contains all the tuples in the
heap, not that the values in the heap and index tuples actually match.
That's a good point. As things stand, heapallindexed verification does
not notice when there are extra index tuples in the index that are in
some way inconsistent with the heap. Hopefully this isn't too much of
a problem in practice because the presence of extra spurious tuples
gets detected by the index structure verification process. But in
general that might not happen.
Ideally heapallindex verification would verify 1:1 correspondence. It
doesn't do that right now, but it could.
This could work by having two bloom filters -- one for the heap,
another for the index. The implementation would look for the absence
of index tuples that should be in the index initially, just like
today. But at the end it would modify the index bloom filter by &= it
with the complement of the heap bloom filter. If any bits are left set
in the index bloom filter, we go back through the index once more and
locate index tuples that have at least some matching bits in the index
bloom filter (we cannot expect all of the bits from each of the hash
functions used by the bloom filter to still be matches).
From here we can do some kind of lookup for maybe-not-matching index
tuples that we locate. Make sure that they point to an LP_DEAD line
item in the heap or something. Make sure that they have the same
values as the heap tuple if they're still retrievable (i.e. if we
haven't pruned the heap tuple away already).
This to me seems too conservative. The result is that by default we
check only tables, not indexes. I don't think that's going to be what
users want. I don't know whether they want the heapallindexed or
rootdescend behaviors for index checks, but I think they want their
indexes checked. Happy to hear opinions from actual users on what they
want; this is just me guessing that you've guessed wrong. :-)
My thoughts on these two options:
* I don't think that users will ever want rootdescend verification.
That option exists now because I wanted to have something that relied
on the uniqueness property of B-Tree indexes following the Postgres 12
work. I didn't add retail index tuple deletion, so it seemed like a
good idea to have something that makes the same assumptions that it
would have to make. To validate the design.
Another factor is that Alexander Korotkov made the basic
bt_index_parent_check() tests a lot better for Postgres 13. This
undermined the practical argument for using rootdescend verification.
Finally, note that bt_index_parent_check() was always supposed to be
something that was to be used only when you already knew that you had
big problems, and wanted absolutely thorough verification without
regard for the costs. This isn't the common case at all. It would be
reasonable to not expose anything from bt_index_parent_check() at all,
or to give it much less prominence. Not really sure of what the right
balance is here myself, so I'm not insisting on anything. Just telling
you what I know about it.
* heapallindexed is kind of expensive, but valuable. But the extra
check is probably less likely to help on the second or subsequent
index on a table.
It might be worth considering an option that only uses it with only
one index: Preferably the primary key index, failing that some unique
index, and failing that some other index.
This seems pretty lame to me. Even if the btree checker can't tolerate
corruption to the extent that the heap checker does, seg faulting
because of a missing file seems like a bug that we should just fix
(and probably back-patch). I'm not very convinced by the decision to
override the user's decision about heapallindexed either.
I strongly agree.
Maybe I lack
imagination, but that seems pretty arbitrary. Suppose there's a giant
index which is missing entries for 5 million heap tuples and also
there's 1 entry in the table which has an xmin that is less than the
pg_class.relfrozenxid value by 1. You are proposing that because I have
the latter problem I don't want you to check for the former one. But
I, John Q. Smartuser, do not want you to second-guess what I told you
on the command line that I wanted. :-)
Even if your user is just average, they still have one major advantage
over the architects of pg_amcheck: actual knowledge of the problem in
front of them.
I think in general you're worrying too much about the possibility of
this tool causing backend crashes. I think it's good that you wrote
the heapcheck code in a way that's hardened against that, and I think
we should try to harden other things as time permits. But I don't
think that the remote possibility of a crash due to the lack of such
hardening should dictate the design behavior of this tool. If the
crash possibilities are not remote, then I think the solution is to
fix them, rather than cutting out important checks.
I couldn't agree more.
I think that you need to have a kind of epistemic modesty with this
stuff. Okay, we guarantee that the backend won't crash when certain
amcheck functions are run, based on these caveats. But don't we always
guarantee something like that? And are the specific caveats actually
that different in each case, when you get right down to it? A
guarantee does not exist in a vacuum. It always has implicit
limitations. For example, any guarantee implicitly comes with the
caveat "unless I, the guarantor, am wrong". Normally this doesn't
really matter because normally we're not concerned about extreme
events that will probably never happen even once. But amcheck is very
much not like that. The chances of the guarantor being the weakest
link are actually rather high. Everyone is better off with a design
that accepts this view of things.
I'm also suspicious of guarantees like this for less philosophical
reasons. It seems to me like it solves our problem rather than the
user's problem. Having data that is so badly corrupt that it's
difficult to avoid segfaults when we perform some kind of standard
transformations on it is an appalling state of affairs for the user.
The segfault itself is very much not the point at all. We should focus
on making the tool as thorough and low overhead as possible. If we
have to make the tool significantly more complicated to avoid
extremely unlikely segfaults then we're actually doing the user a
disservice, because we're increasing the chances that we the
guarantors will be the weakest link (which was already high enough).
This smacks of hubris.
I also agree that hardening is a worthwhile exercise here, of course.
We should be holding amcheck to a higher standard when it comes to not
segfaulting with corrupt data.
--
Peter Geoghegan
On Thu, Nov 19, 2020 at 2:48 PM Peter Geoghegan <pg@bowt.ie> wrote:
Ideally heapallindex verification would verify 1:1 correspondence. It
doesn't do that right now, but it could.
Well, that might be a cool new mode, but it doesn't necessarily have
to supplant the thing we have now. The problem immediately before us
is just making sure that the user can understand what we will and
won't be checking.
My thoughts on these two options:
* I don't think that users will ever want rootdescend verification.
That seems too absolute. I think it's fine to say, we don't think that
users will want this, so let's not do it by default. But if it's so
useless as to not be worth a command-line option, then it was a
mistake to put it into contrib at all. Let's expose all the things we
have, and try to set the defaults according to what we expect to be
most useful.
* heapallindexed is kind of expensive, but valuable. But the extra
check is probably less likely to help on the second or subsequent
index on a table.
It might be worth considering an option that only uses it with only
one index: Preferably the primary key index, failing that some unique
index, and failing that some other index.
This seems a bit too clever for me. I would prefer a simpler schema,
where we choose the default we think most people will want and use it
for everything -- and allow the user to override.
Even if your user is just average, they still have one major advantage
over the architects of pg_amcheck: actual knowledge of the problem in
front of them.
Quite so.
I think that you need to have a kind of epistemic modesty with this
stuff. Okay, we guarantee that the backend won't crash when certain
amcheck functions are run, based on these caveats. But don't we always
guarantee something like that? And are the specific caveats actually
that different in each case, when you get right down to it? A
guarantee does not exist in a vacuum. It always has implicit
limitations. For example, any guarantee implicitly comes with the
caveat "unless I, the guarantor, am wrong".
Yep.
I'm also suspicious of guarantees like this for less philosophical
reasons. It seems to me like it solves our problem rather than the
user's problem. Having data that is so badly corrupt that it's
difficult to avoid segfaults when we perform some kind of standard
transformations on it is an appalling state of affairs for the user.
The segfault itself is very much not the point at all.
I mostly agree with everything you say here, but I think we need to be
careful not to accept the position that seg faults are no big deal.
Consider the following users, all of whom start with a database that
they believe to be non-corrupt:
Alice runs pg_amcheck. It says that nothing is wrong, and that happens
to be true.
Bob runs pg_amcheck. It says that there are problems, and there are.
Carol runs pg_amcheck. It says that nothing is wrong, but in fact
something is wrong.
Dan runs pg_amcheck. It says that there are problems, but in fact
there are none.
Erin runs pg_amcheck. The server crashes.
Alice and Bob are clearly in the best shape here, but Carol and Dan
arguably haven't been harmed very much. Sure, Carol enjoys a false
sense of security, but since she otherwise believed things were OK,
the impact of whatever problems exist is evidently not that bad. Dan
is worrying over nothing, but the damage is only to his psyche, not
his database; we can hope he'll eventually sort out what has happened
without grave consequences. Erin, on the other hand, is very possibly
in a lot of trouble with her boss and her coworkers. She had what
seemed to be a healthy database, and from their perspective, she shot
it in the head without any real cause. It will be faint consolation to
her and her coworkers that the database was corrupt all along: until
she ran the %$! tool, they did not have a problem that affected the
ability of their business to generate revenue. Now they had an outage,
and that does.
While I obviously haven't seen this exact scenario play out for a
customer, because pg_amcheck is not committed, I have seen similar
scenarios over and over. It's REALLY bad when the database goes down.
Then the application goes down, and then it gets really ugly. As long
as the database was just returning wrong answers or eating data,
nobody's boss really cared that much, but now that it's down, they
care A LOT. This is of course not to say that nobody cares about the
accuracy of results from the database: many people care a lot, and
that's why it's good to have tools like this. But we should not
underestimate the horror caused by a crash. A working database, even
with some wrong data in it, is a problem people would probably like to
get fixed. A down database is an emergency. So I think we should
actually get a lot more serious about ensuring that corrupt data on
disk doesn't cause crashes, even for regular SELECT statements. I
don't think we can take an arbitrary performance hit to get there,
which is a challenge, but I do think that even a brief outage is
nothing to take lightly.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 19, 2020 at 12:06 PM Robert Haas <robertmhaas@gmail.com> wrote:
I originally intended to review the docs and regression tests in the
same email as the patch itself, but this email has gotten rather long
and taken rather longer to get together than I had hoped, so I'm going
to stop here for now and come back to that stuff.
Broad question: Does pg_amcheck belong in src/bin, or in contrib? You
have it in the latter place, but I'm not sure if that's the right
idea. I'm not saying it *isn't* the right idea, but I'm just wondering
what other people think.
Now, on to the docs:
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
This makes me wonder why there isn't an option to call
bt_index_check() rather than bt_index_parent_check().
It doesn't seem to be standard practice to include the entire output
of the command's --help option in the documentation. That means as
soon as anybody changes anything they've got to change the
documentation too. I don't see anything like that in the pages for
psql or vacuumlo or pg_verifybackup. It also doesn't seem like a
useful thing to do. Anyone who is reading the documentation probably
is in a position to try --help if they wish; they don't need that
duplicated here.
Looking at those other pages, what seems to be typical for an SGML reference page is
to list all the options and give a short paragraph on what each one
does. What you have instead is a narrative description. I recommend
looking over the reference page for one of those other command-line
utilities and adapting it to this case.
Back to the code:
+static const char *
+get_index_relkind_quals(void)
+{
+ if (!index_relkind_quals)
+ index_relkind_quals = psprintf("'%c'", RELKIND_INDEX);
+ return index_relkind_quals;
+}
I feel like there ought to be a way to work this out at compile time
rather than leaving it to runtime. I think that replacing the function
body with "return CppAsString2(RELKIND_INDEX);" would have the same
result, and once you do that you don't really need the function any
more. This is arguably cheating a bit: RELKIND_INDEX is defined as 'i'
and CppAsString2() turns that into a string containing those three
characters. That happens to work because what we want to do is quote
this for use in SQL, and SQL happens to use single quotes for literals
just like C does for individual characters. It would be more elegant to
figure out a way to interpolate just the character into a C string, but
I don't know of a macro trick that will do that. I think one could
write char something[] = { '\'', RELKIND_INDEX, '\'', '\0' } but that
would be pretty darn awkward for the table case where you want an ANY
with three relkinds in there.
But maybe you could get around that by changing the query slightly.
Suppose instead of relkind = BLAH, you write POSITION(relkind IN '%s')
> 0. Then you could just have the caller pass either:
char index_relkinds[] = { RELKIND_INDEX, '\0' };
-or-
char table_relkinds[] = { RELKIND_RELATION, RELKIND_MATVIEW,
RELKIND_TOASTVALUE, '\0' };
The patch actually has RELKIND_PARTITIONED_TABLE there rather than
RELKIND_RELATION, but that seems wrong to me, because partitioned
tables don't have storage, and toast tables do. And if we're going to
include RELKIND_PARTITIONED_TABLE for some reason, then why not
RELKIND_PARTITIONED_INDEX for the index case?
On the tests:
I think 003_check.pl needs to stop and restart the server between
populating the tables and corrupting them. Otherwise, how do we know
that the subsequent checks are going to actually see the corruption
rather than something already cached in memory?
There are some philosophical questions to consider too, about how
these tests are written and what our philosophy ought to be here, but
I am again going to push that off to a future email.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Nov 19, 2020, at 11:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
I think in general you're worrying too much about the possibility of
this tool causing backend crashes. I think it's good that you wrote
the heapcheck code in a way that's hardened against that, and I think
we should try to harden other things as time permits. But I don't
think that the remote possibility of a crash due to the lack of such
hardening should dictate the design behavior of this tool. If the
crash possibilities are not remote, then I think the solution is to
fix them, rather than cutting out important checks.
I couldn't agree more.
Owing to how much run-time overhead it would entail, much of the backend code has not been, and probably will not be, hardened against corruption. The amcheck code uses backend code for accessing heaps and indexes. Only some of those uses can be preceded with sufficient safety checks to avoid stepping on landmines. It makes sense to me to have a "don't run through minefields" option, and a "go ahead, run through minefields" option for pg_amcheck, given that users in differing situations will have differing business consequences to bringing down the server in question.
As an example that we've already looked at, checking the status of an xid against clog is a dangerous thing to do. I wrote a patch to make it safer to query clog (0003) and a patch for pg_amcheck to use the safer interface (0004) and it looks unlikely either of those will ever be committed. I doubt other backend hardening is any more likely to get committed. It doesn't follow that if crash possibilities are not remote that we should therefore harden the backend. The performance considerations of the backend are not well aligned with the safety considerations of this tool. The backend code is written with the assumption of non-corrupt data, and this tool with the assumption of corrupt data, or at least a fair probability of corrupt data. I don't see how any one-hardening-fits-all will ever work.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 19, 2020 at 1:50 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
It makes sense to me to have a "don't run through minefields" option, and a "go ahead, run through minefields" option for pg_amcheck, given that users in differing situations will have differing business consequences to bringing down the server in question.
This kind of framing suggests zero-risk bias to me:
https://en.wikipedia.org/wiki/Zero-risk_bias
It's simply not helpful to think of the risks as "running through a
minefield" versus "not running through a minefield". I also dislike
this framing because in reality nobody runs through a minefield,
unless maybe it's a battlefield and the alternative is probably even
worse. Risks are not discrete -- they're continuous. And they're
situational.
I accept that there are certain reasonable gradations in the degree to
which a segfault is bad, even in contexts in which pg_amcheck runs
into actual serious problems. And as Robert points out, experience
suggests that on average people care about availability the most when
push comes to shove (though I hasten to add that that's not the same
thing as considering a once-off segfault to be the greater evil here).
Even still, I firmly believe that it's a mistake to assign *infinite*
weight to not having a segfault. That is likely to have certain
unintended consequences that could be even worse than a segfault, such
as not detecting pernicious corruption over many months because our
can't-segfault version of core functionality fails to have the same
bugs as the actual core functionality (and thus fails to detect a
problem in the core functionality).
The problem with giving infinite weight to any one bad outcome is that
it makes it impossible to draw reasonable distinctions between it and
some other extreme bad outcome. For example, I would really not like
to get infected with Covid-19. But I also think that it would be much
worse to get infected with Ebola. It follows that Covid-19 must not be
infinitely bad, because if it is then I can't make this useful
distinction -- which might actually matter. If somebody hears me say
this, and takes it as evidence of my lackadaisical attitude towards
Covid-19, I can live with that. I care about avoiding criticism as
much as the next person, but I refuse to prioritize it over all other
things.
I doubt other backend hardening is any more likely to get committed.
I suspect you're right about that. Because of the risks of causing
real harm to users.
The backend code is obviously *not* written with the assumption that
data cannot be corrupt. There are lots of specific ways in which it is
hardened (e.g., there are many defensive "can't happen" elog()
statements). I really don't know why you insist on this black and
white framing.
--
Peter Geoghegan
On Tue, Oct 27, 2020 at 5:12 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a rebase. (0001 was already committed last week.)
Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with the previous naming. There are no substantial changes.
Hi Mark,
The command line stuff fails to build on Windows[1]. I think it's
just missing #include "getopt_long.h" (see
contrib/vacuumlo/vacuumlo.c).
[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.123328
On Nov 19, 2020, at 9:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 26, 2020 at 12:12 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
The v20 patches 0002, 0003, and 0005 still apply cleanly, but 0004 required a rebase. (0001 was already committed last week.)
Here is a rebased set of 4 patches, numbered 0002..0005 to be consistent with the previous naming. There are no substantial changes.
Here's a review of 0002. I basically like the direction this is going
but I guess nobody will be surprised that there are some things in
here that I think could be improved.
Thanks for the review!
The tools pg_dump and pg_amcheck both need to allow the user to specify which schemas, tables, and indexes either to dump or to check. There are command line options in pg_dump for this purpose, and functions for compiling lists of corresponding database objects. In prior versions of the pg_amcheck patch, I did some copy-and-pasting of this logic, and then had to fix up the copied functions a bit, given that pg_dump has its own ecosystem with things like fatal() and exit_nicely() and such.
In hindsight, it would have been better to factor these functions out into a shared location. I have done that, factoring them into fe_utils, and am attaching a series of patches that accomplishes that refactoring. Here are some brief explanations of what these are for. See also the commit comments in each patch:
v3-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch
pg_dump allows on-exit callbacks to be registered, which it expects to get called when exit_nicely() is invoked. It doesn't work to factor functions out of pg_dump without having this infrastructure, as the functions being factored out include facilities for logging and exiting on error. Therefore, moving these functions into fe_utils.
v3-0002-Refactoring-ExecuteSqlQuery-and-related-functions.patch
pg_dump has functions for running queries, but those functions take a pg_dump specific argument of type Archive rather than PGconn, with the expectation that the Archive's connection will be used. This has to be cleaned up a bit before these functions can be moved out of pg_dump to a shared location. Also, pg_dump has a fixed expectation that when a query fails, specific steps will be taken to print out the error information and exit. That's reasonable behavior, but not all callers will want that. Since the ultimate goal of this refactoring is to have higher level functions that translate shell patterns into oid lists, it's reasonable to imagine that not all callers will want to exit if the query fails. In particular, pg_amcheck won't want errors to automatically trigger exit() calls, given that pg_amcheck tries to continue in the face of errors. Therefore, adding a default error handler that does what pg_dump expects, but with an eye towards other callers being able to define handlers that behave differently.
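A minimal sketch of that handler idea, with illustrative names only (the patch's actual API may differ): the shared query code reports failures through a function pointer, whose default reproduces pg_dump's report-and-exit behavior, while a caller like pg_amcheck can install a non-fatal handler and keep going.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: a replaceable handler for query failures. Names here are
 * illustrative, not necessarily the patch's actual API. */
typedef void (*QueryErrorHandler)(const char *query, const char *message);

/* pg_dump-style default: report the failure and exit. */
static void
exit_on_error_handler(const char *query, const char *message)
{
	fprintf(stderr, "query failed: %s\nquery was: %s\n", message, query);
	exit(1);
}

/* Hook that the shared query code consults; callers may repoint it. */
static QueryErrorHandler CurrentQueryHandler = exit_on_error_handler;

/* A non-fatal handler a tool like pg_amcheck could install: count the
 * failure and continue checking other objects. */
static int errors_seen = 0;

static void
counting_error_handler(const char *query, const char *message)
{
	(void) query;
	(void) message;
	errors_seen++;
}

/* What the shared query-execution code would do on a bad result. */
static void
report_query_error(const char *query, const char *message)
{
	CurrentQueryHandler(query, message);
}
```

With the default handler installed this behaves like pg_dump's existing executeQuery() error path; pg_amcheck would swap in something like counting_error_handler so one failed query doesn't abort the whole run.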
v3-0003-Creating-query_utils-frontend-utility.patch
Moving the refactored functions to the shared location in fe_utils. This is kept separate from 0002 for ease of review.
v3-0004-Adding-CurrentQueryHandler-logic.patch
Extending the query error handling logic begun in the 0002 patch. It wasn't appropriate in the pg_dump project, but now the logic is in fe_utils.
v3-0005-Refactoring-pg_dumpall-functions.patch
Refactoring some remaining functions in the pg_dump project to use the new fe_utils facilities.
v3-0006-Refactoring-expand_schema_name_patterns-and-frien.patch
Refactoring functions in pg_dump that expand a list of patterns into a list of matching database objects. Specifically, changing them to not take pg_dump-specific argument types, just as was done in 0002.
v3-0007-Moving-pg_dump-functions-to-new-file-option_utils.patch
Moving the functions refactored in 0006 into a new location, fe_utils/option_utils.
v3-0008-Normalizing-option_utils-interface.patch
Reworking the functions moved in 0007 to have a more general purpose interface. The refactoring in 0006 only went so far as to make the functions moveable out of pg_dump. This refactoring is intentionally kept separate for ease of review.
v3-0009-Adding-contrib-module-pg_amcheck.patch
Adding contrib/pg_amcheck project, about which your review comments below apply.
Not included in this patch set, but generated during the development of this patch set, I refactored processSQLNamePattern. string_utils mixes the logic for converting a shell-style pattern into a SQL style regex with the logic of performing the sql query to look up matching database objects. That makes it hard to look up multiple patterns in a single query, something that an intermediate version of this patch set was doing. I ultimately stopped doing that, as the code was overly complex, but the refactoring of processSQLNamePattern is not over-complicated and probably has some merit in its own right. Since it is not related to the pg_amcheck code, I expect that I will be posting that separately.
Also not included in this patch set, but likely to be in the next rev, is a patch that adds more interesting table and index corruption via PostgresNode, creating torn pages and such. That work is complete so far as I know, but I don't have all the regression tests that use it written yet, so I'll hold off posting it for now.
Not yet written but still needed is the parallelization of the checking. I'll be working on that for the next patch set.
There is enough work here in need of review that I'm posting this now, hoping to get feedback on the general direction I'm going with this.
To your review....
+const char *usage_text[] = {
+ "pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.",
+ "",

This looks like a novel approach to the problem of printing out the
usage() information, and I think that it's inferior to the technique
used elsewhere of just having a bunch of printf() statements, because
unless I misunderstand, it doesn't permit localization.
Since contrib modules are not localized, it seemed not to be a problem, but you've raised the question of whether pg_amcheck might be moved into core. I've changed it as suggested so that such a move would incur less code churn. The advantage to how I had it before was that each line was a bit shorter, making it fit better into the 80 column limit.
+ " -b, --startblock begin checking table(s) at the given starting block number",
+ " -e, --endblock check table(s) only up to the given ending block number",
+ " -B, --toast-startblock begin checking toast table(s) at the given starting block",
+ " -E, --toast-endblock check toast table(s) only up to the given ending block",

I am not very convinced by this. What's the use case? If you're just
checking a single table, you might want to specify a start and end
block, but then you don't need separate options for the TOAST and
non-TOAST cases, do you? If I want to check pg_statistic, I'll say
pg_amcheck -t pg_catalog.pg_statistic. If I want to check the TOAST
table for pg_statistic, I'll say pg_amcheck -t pg_toast.pg_toast_2619.
In either case, if I want to check just the first three blocks, I can
add -b 0 -e 2.
Removed -B, --toast-startblock and -E, --toast-endblock.
+ " -f, --skip-all-frozen do NOT check blocks marked as all frozen",
+ " -v, --skip-all-visible do NOT check blocks marked as all visible",

I think this is using up too many one character option names for too
little benefit on things that are too closely related. How about, -s,
--skip=all-frozen|all-visible|none?
I'm already using -s for "strict-names", but I implemented your suggestion with -S, --skip.
And then -v could mean verbose,
which could trigger things like printing all the queries sent to the
server, setting PQERRORS_VERBOSE, etc.
I added -v, --verbose as you suggest.
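The suggested --skip=all-frozen|all-visible|none argument could be parsed along these lines (parse_skip_option is a hypothetical helper name, not the patch's actual code):

```c
#include <stdbool.h>
#include <string.h>

/* Sketch only: map a --skip option argument onto two booleans, rejecting
 * any value other than the three documented keywords. */
static bool
parse_skip_option(const char *arg, bool *skip_frozen, bool *skip_visible)
{
	*skip_frozen = *skip_visible = false;
	if (strcmp(arg, "all-frozen") == 0)
		*skip_frozen = true;
	else if (strcmp(arg, "all-visible") == 0)
		*skip_visible = true;
	else if (strcmp(arg, "none") != 0)
		return false;			/* unrecognized value: caller reports usage error */
	return true;
}
```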
+ " -x, --check-indexes check btree indexes associated with tables being checked",
+ " -X, --skip-indexes do NOT check any btree indexes",
+ " -i, --index=PATTERN check the specified index(es) only",
+ " -I, --exclude-index=PATTERN do NOT check the specified index(es)",

This is a lotta controls for something that has gotta have some
default. Either the default is everything, in which case I don't see
why I need -x, or it's nothing, in which case I don't see why I need
-X.
I removed -x, --check-indexes and instead made that the default.
+ " -c, --check-corrupt check indexes even if their associated table is corrupt",
+ " -C, --skip-corrupt do NOT check indexes if their associated table is corrupt",

Ditto. (I think the default should be to check corrupt, and there can be an
option to skip it.)
Likewise, I removed -c, --check-corrupt and made that the default.
+ " -a, --heapallindexed check index tuples against the table tuples",
+ " -A, --no-heapallindexed do NOT check index tuples against the table tuples",

Ditto. (Not sure what the default should be, though.)
I removed -A, --no-heapallindexed and made that the default.
+ " -r, --rootdescend search from the root page for each index tuple",
+ " -R, --no-rootdescend do NOT search from the root page for each index tuple",

Ditto. (Again, not sure about the default.)
I removed -R, --no-rootdescend and made that the default. Peter argued elsewhere for removing this altogether, but as I recall you argued against that, so for now I'm keeping the --rootdescend option.
I'm also not sure if these descriptions are clear enough, but it may
also be hard to do a good job in a brief space.
Yes. Better verbiage welcome.
Still, comparing this
to the documentation of heapallindexed makes me rather nervous. This
is only trying to verify that the index contains all the tuples in the
heap, not that the values in the heap and index tuples actually match.
This is complicated. The most reasonable approach from the point of view of somebody running pg_amcheck is to have the scan of the table and the scan of the index cooperate so that work is not duplicated. But from the point of view of amcheck (not pg_amcheck), there is no assumption that the table is being scanned just because the index is being checked. I'm not sure how best to resolve this, except that I'd rather punt this to a future version rather than require the first version of pg_amcheck to deal with it.
+typedef struct
+AmCheckSettings
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;

Making the struct name different from the type name seems not good,
and the struct name also shouldn't be on a separate line.
Fixed.
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;

Ugh. It's not this patch's fault, but we really oughta move this to
someplace more centralized.
Not changed in this patch.
+typedef struct
...
+} AmCheckSettings;

I'm not sure I consider all of these things settings, "db" in
particular. But maybe that's nitpicking.
It is definitely nitpicking, but I agree with it. This next patch uses a static variable named "conn" rather than "settings.db".
+static void expand_schema_name_patterns(const SimpleStringList *patterns,
+                                        const SimpleOidList *exclude_oids,
+                                        SimpleOidList *oids
+                                        bool strict_names);

This is copied from pg_dump, along with I think at least one other
function from nearby. Unlike the trivalue case above, this would be
the first duplication of this logic. Can we push this stuff into
pgcommon, perhaps?
Yes, these functions were largely copied from pg_dump. I have moved them out of pg_dump and into fe_utils, but that was a large enough effort that it deserves its own thread, so I'm creating a thread for that work independent of this thread.
+ /*
+  * Default behaviors for user settable options. Note that these default
+  * to doing all the safe checks and none of the unsafe ones, on the theory
+  * that if a user says "pg_amcheck mydb" without specifying any additional
+  * options, we should check everything we know how to check without
+  * risking any backend aborts.
+  */

This to me seems too conservative. The result is that by default we
check only tables, not indexes. I don't think that's going to be what
users want.
Checking indexes has been made the default, as discussed above.
I don't know whether they want the heapallindexed or
rootdescend behaviors for index checks, but I think they want their
indexes checked. Happy to hear opinions from actual users on what they
want; this is just me guessing that you've guessed wrong. :-)
The heapallindexed and rootdescend options still exist but are false by default.
+ if (settings.db == NULL)
+ {
+     pg_log_error("no connection to server after initial attempt");
+     exit(EXIT_BADCONN);
+ }

I think this is documented as meaning out of memory, and reported that
way elsewhere. Anyway I am going to keep complaining until there are
no cases where we tell the user it broke without telling them what
broke. Which means this bit is a problem too:

+ if (!settings.db)
+ {
+     pg_log_error("no connection to server");
+     exit(EXIT_BADCONN);
+ }

Something went wrong, good luck figuring out what it was!
I have changed this to more closely follow the behavior in scripts/common.c:connectDatabase. If pg_amcheck were moved into src/bin/scripts, I could just use that function outright.
+ /*
+  * All information about corrupt indexes are returned via ereport, not as
+  * tuples. We want all the details to report if corruption exists.
+  */
+ PQsetErrorVerbosity(settings.db, PQERRORS_VERBOSE);

Really? Why? If I need the source code file name, function name, and
line number to figure out what went wrong, that is not a great sign
for the quality of the error reports it produces.
Yeah, you are right about that. In any event, the user can now specify --verbose if they like and get that extra information (not that they need it). I have removed this offending bit of code.
+ /*
+  * The btree checking logic which optionally checks the contents
+  * of an index against the corresponding table has not yet been
+  * sufficiently hardened against corrupt tables. In particular,
+  * when called with heapallindexed true, it segfaults if the file
+  * backing the table relation has been erroneously unlinked. In
+  * any event, it seems unwise to reconcile an index against its
+  * table when we already know the table is corrupt.
+  */
+ old_heapallindexed = settings.heapallindexed;
+ if (corruptions)
+     settings.heapallindexed = false;

This seems pretty lame to me. Even if the btree checker can't tolerate
corruption to the extent that the heap checker does, seg faulting
because of a missing file seems like a bug that we should just fix
(and probably back-patch). I'm not very convinced by the decision to
override the user's decision about heapallindexed either. Maybe I lack
imagination, but that seems pretty arbitrary. Suppose there's a giant
index which is missing entries for 5 million heap tuples and also
there's 1 entry in the table which has an xmin that is less than the
pg_class.relfrozenxid value by 1. You are proposing that because I have
the latter problem I don't want you to check for the former one. But
I, John Q. Smartuser, do not want you to second-guess what I told you
on the command line that I wanted. :-)
I've removed this bit. I'm not sure what I was seeing back when I first wrote this code, but I no longer see any segfaults for missing relation files.
I think in general you're worrying too much about the possibility of
this tool causing backend crashes. I think it's good that you wrote
the heapcheck code in a way that's hardened against that, and I think
we should try to harden other things as time permits. But I don't
think that the remote possibility of a crash due to the lack of such
hardening should dictate the design behavior of this tool. If the
crash possibilities are not remote, then I think the solution is to
fix them, rather than cutting out important checks.
Right. I've been worrying a bit less about this lately, in part because you and Peter are less concerned about it than I was, and in part because I've been banging away with various test cases and don't see all that much worth worrying about.
It doesn't seem like great design to me that get_table_check_list()
gets just the OID of the table itself, and then later if we decide to
check the TOAST table we've got to run a separate query for each table
we want to check to fetch the TOAST OID, when we could've just fetched
both in get_table_check_list() by including two columns in the query
rather than one and it would've been basically free. Imagine if some
user wrote a query that fetched the primary key value for all their
rows and then had their application run a separate query to fetch the
entire contents of each of those rows, said contents consisting of one
more integer. And then suppose they complained about performance. We'd
tell them they were doing it wrong, and so here.
Good points. I've changed get_table_check_list to query both the main table and toast table oids as you suggest.
+ if (settings.db == NULL)
+     fatal("no connection on entry to check_table");

Uninformative. Is this basically an Assert? If so maybe just make it
one. If not maybe fail somewhere else with a better message?
Looking at this again, I don't think it is even worth making it into an Assert, so I just removed it, along with similar useless checks of the same type elsewhere.
+ if (startblock == NULL)
+     startblock = "NULL";
+ if (endblock == NULL)
+     endblock = "NULL";

It seems like it would be more elegant to initialize
settings.startblock and settings.endblock to "NULL." However, there's
also a related problem, which is that the startblock and endblock
values can be anything, and are interpolated with quoting. I don't
think that it's good to ship a tool with SQL injection hazards built
into it. I think that you should (a) check that these values are
integers during argument parsing and error out if they are not and
then (b) use either a prepared query or PQescapeLiteral() anyway.
I've changed the logic to use strtol to parse these, and I'm storing them as long rather than as strings.
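A minimal sketch of that kind of validation, using a hypothetical helper name: only strings that parse cleanly as non-negative decimal integers are accepted during option parsing, so nothing attacker-controlled is ever interpolated into the SQL sent to the server.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: accept an option argument only if it is a plain
 * non-negative decimal integer, so the value later interpolated into a
 * query cannot carry an injection payload. */
static bool
parse_block_number(const char *arg, long *result)
{
	char	   *endptr;
	long		val;

	errno = 0;
	val = strtol(arg, &endptr, 10);

	/* Reject overflow, empty input, trailing garbage, and negatives. */
	if (errno != 0 || endptr == arg || *endptr != '\0' || val < 0)
		return false;

	*result = val;
	return true;
}
```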
+ stop = (on_error_stop) ? "true" : "false";
+ toast = (check_toast) ? "true" : "false";

The parens aren't really needed here.
True. Removed.
+ printf("(relname=%s,blkno=%s,offnum=%s,attnum=%s)\n%s\n",
+        PQgetvalue(res, i, 0), /* relname */
+        PQgetvalue(res, i, 1), /* blkno */
+        PQgetvalue(res, i, 2), /* offnum */
+        PQgetvalue(res, i, 3), /* attnum */
+        PQgetvalue(res, i, 4)); /* msg */

I am not quite sure how to format the output, but this looks like
something designed by an engineer who knows too much about the topic.
I suspect users won't find the use of things like "relname" and
"blkno" too easy to understand. At least I think we should say
"relation, block, offset, attribute" instead of "relname, blkno,
offnum, attnum". I would probably drop the parentheses and add spaces,
so that you end up with something like:

relation "%s", block "%s", offset "%s", attribute "%s":

I would also define variant strings so that we entirely omit things
that are NULL. e.g. have four strings:

relation "%s":
relation "%s", block "%s":
relation "%s", block "%s", offset "%s":
relation "%s", block "%s", offset "%s", attribute "%s":

Would it make it more readable if we indented the continuation line by
four spaces or something?
I tried it that way and agree it looks better, including having the msg line indented four spaces. Changed.
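The four-variant idea above could be sketched as follows (the helper name is illustrative, not the committed code): trailing NULL fields are dropped from the prefix entirely rather than printed as blanks.

```c
#include <stdio.h>

/* Sketch only: build the corruption-report prefix, omitting trailing
 * NULL fields so users never see empty placeholders. */
static int
format_corruption_prefix(char *buf, size_t buflen,
						 const char *rel, const char *blk,
						 const char *off, const char *att)
{
	if (att != NULL)
		return snprintf(buf, buflen,
						"relation \"%s\", block \"%s\", offset \"%s\", attribute \"%s\":",
						rel, blk, off, att);
	if (off != NULL)
		return snprintf(buf, buflen,
						"relation \"%s\", block \"%s\", offset \"%s\":",
						rel, blk, off);
	if (blk != NULL)
		return snprintf(buf, buflen,
						"relation \"%s\", block \"%s\":", rel, blk);
	return snprintf(buf, buflen, "relation \"%s\":", rel);
}
```

The message line itself would then be printed on the next line, indented four spaces, per the suggestion above.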
+ corruption_cnt++;
+ printf("%s\n", error);
+ pfree(error);

Seems like we could still print the relation name in this case, and
that it would be a good idea to do so, in case it's not in the message
that the server returns.
We don't know the relation name in this case, only the oid, but I agree that would be useful to have, so I added that.
The general logic in this part of the code looks a bit strange to me.
If ExecuteSqlQuery() returns PGRES_TUPLES_OK, we print out the details
for each returned row. Otherwise, if error = true, we print the error.
But, what if neither of those things are the case? Then we'd just
print nothing despite having gotten back some weird response from the
server. That actually can't happen, because ExecuteSqlQuery() always
sets *error when the return code is not PGRES_TUPLES_OK, but you
wouldn't know that from looking at this code.

Honestly, as written, ExecuteSqlQuery() seems like kind of a waste. The
OrDie() version is useful as a notational shorthand, but this version
seems to add more confusion than clarity. It has only three callers:
the ones in check_table() and check_indexes() have the problem
described above, and the one in get_toast_oid() could just as well be
using the OrDie() version. And also we should probably get rid of it
entirely by fetching the toast OIDs the first time around, as
mentioned above.
These functions have been factored out of pg_dump into fe_utils, so this bit of code review doesn't refer to anything now.
check_indexes() lacks a function comment. It seems to have more or
less the same problem as get_toast_oid() -- an extra query per table
to get the list of indexes. I guess it has a better excuse: there
could be lots of indexes per table, and we're fetching multiple
columns of data for each one, whereas in the TOAST case we are issuing
an extra query per table to fetch a single integer. But, couldn't we
fetch information about all the indexes we want to check in one go,
rather than fetching them separately for each table being checked? I'm
not sure if that would create too much other complexity, but it seems
like it would be quicker.
If the --skip-corrupt option is given, we need to only check the indexes associated with a table once the table has been found to be non-corrupt. Querying for all the indexes upfront, we'd need to keep information about which table the index came from, and check that against lists of tables that have been checked, etc. It seems pretty messy, even more so when considering the limited list facilities available to frontend code.
I have made no changes in this version, though I'm not rejecting your idea here. Maybe I'll think of a clean way to do this for a later patch?
+ if (settings.db == NULL)
+     fatal("no connection on entry to check_index");
+ if (idxname == NULL)
+     fatal("no index name on entry to check_index");
+ if (tblname == NULL)
+     fatal("no table name on entry to check_index");

Again, probably these should be asserts, or if they're not, the error
should be reported better and maybe elsewhere.

Similarly in some other places, like expand_schema_name_patterns().
I removed these checks entirely.
+ * The loop below runs multiple SELECTs might sometimes result in
+ * duplicate entries in the Oid list, but we don't care.

This is missing a which, like the place you copied it from, but the
version in pg_dumpall.c is better.

expand_table_name_patterns() should be reformatted to not gratuitously
exceed 80 columns. Ditto for expand_index_name_patterns().
Refactoring into fe_utils, as mentioned above.
I sort of expected that this patch might use threads to allow parallel
checking - seems like it would be a useful feature.
Yes, I think that makes sense, but I'm going to work on that in the next patch.
I originally intended to review the docs and regression tests in the
same email as the patch itself, but this email has gotten rather long
and taken rather longer to get together than I had hoped, so I'm going
to stop here for now and come back to that stuff.
Attachments:
v3-0005-Refactoring-pg_dumpall-functions.patchapplication/octet-stream; name=v3-0005-Refactoring-pg_dumpall-functions.patch; x-unix-mode=0644Download
From e4764472b21601a9cf94af5077aaf50cd334b130 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:37:42 -0800
Subject: [PATCH v3 5/9] Refactoring pg_dumpall functions.
The functions executeQuery and executeCommand in pg_dumpall.c were
not refactored in prior commits along with functions from
pg_backup_db.c because they were in a separate file, but now that
the infrastructure has been moved to fe_utils/query_utils,
refactoring these two functions to use it.
---
src/bin/pg_dump/pg_dumpall.c | 31 +++----------------------------
1 file changed, 3 insertions(+), 28 deletions(-)
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 85d08ad660..807226537a 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "pg_backup.h"
@@ -1874,21 +1875,8 @@ constructConnStr(const char **keywords, const char **values)
static PGresult *
executeQuery(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- return res;
+ return ExecuteSqlQuery(conn, query, PGRES_TUPLES_OK);
}
/*
@@ -1897,21 +1885,8 @@ executeQuery(PGconn *conn, const char *query)
static void
executeCommand(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- PQclear(res);
+ PQclear(ExecuteSqlQuery(conn, query, PGRES_COMMAND_OK));
}
--
2.21.1 (Apple Git-122.3)
v3-0006-Refactoring-expand_schema_name_patterns-and-frien.patchapplication/octet-stream; name=v3-0006-Refactoring-expand_schema_name_patterns-and-frien.patch; x-unix-mode=0644Download
From 5cc6fbd70b889632238049f9477561c61a04e0a3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:51:02 -0800
Subject: [PATCH v3 6/9] Refactoring expand_schema_name_patterns and friends.
Refactoring these functions to take a PGconn pointer rather than an
Archive pointer in preparation for moving these functions to
fe_utils. This is much like what was previously done for
ExecuteSqlQuery and friends, and for the same reasons.
---
src/bin/pg_dump/pg_dump.c | 47 ++++++++++++++++++++++-----------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e8985a834f..41ce4b7866 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "libpq/libpq-fs.h"
@@ -147,14 +148,14 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(Archive *fout,
+static void expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
-static void expand_foreign_server_name_patterns(Archive *fout,
+static void expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids);
-static void expand_table_name_patterns(Archive *fout,
+static void expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
@@ -798,13 +799,15 @@ main(int argc, char **argv)
/* Expand schema selection patterns into OID lists */
if (schema_include_patterns.head != NULL)
{
- expand_schema_name_patterns(fout, &schema_include_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_include_patterns,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
fatal("no matching schemas were found");
}
- expand_schema_name_patterns(fout, &schema_exclude_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_exclude_patterns,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -812,21 +815,25 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(fout, &table_include_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_include_patterns,
&table_include_oids,
strict_names);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(fout, &table_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
&table_exclude_oids,
false);
- expand_table_name_patterns(fout, &tabledata_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
&tabledata_exclude_oids,
false);
- expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+ expand_foreign_server_name_patterns(GetConnection(fout),
+ &foreign_servers_include_patterns,
&foreign_servers_include_oids);
/* non-matching exclusion patterns aren't an error */
@@ -1316,7 +1323,7 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
* and append them to the given OID list.
*/
static void
-expand_schema_name_patterns(Archive *fout,
+expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names)
@@ -1340,10 +1347,10 @@ expand_schema_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1364,7 +1371,7 @@ expand_schema_name_patterns(Archive *fout,
* and append them to the given OID list.
*/
static void
-expand_foreign_server_name_patterns(Archive *fout,
+expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids)
{
@@ -1387,10 +1394,10 @@ expand_foreign_server_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1410,7 +1417,7 @@ expand_foreign_server_name_patterns(Archive *fout,
* in pg_dumpall.c
*/
static void
-expand_table_name_patterns(Archive *fout,
+expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns, SimpleOidList *oids,
bool strict_names)
{
@@ -1446,13 +1453,13 @@ expand_table_name_patterns(Archive *fout,
RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(GetConnection(fout), query, cell->val, true,
+ processSQLNamePattern(conn, query, cell->val, true,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatementAH(fout, "RESET search_path");
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRowAH(fout,
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
--
2.21.1 (Apple Git-122.3)
v3-0007-Moving-pg_dump-functions-to-new-file-option_utils.patch (application/octet-stream)
From 9a5054196a13165b9a2c752ee822b84babead275 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:07:58 -0800
Subject: [PATCH v3 7/9] Moving pg_dump functions to new file option_utils
Moving the recently refactored functions
expand_schema_name_patterns, expand_foreign_server_name_patterns,
and expand_table_name_patterns from pg_dump.c, along with the
function expand_dbname_patterns from pg_dumpall.c, into the new file
fe_utils/option_utils.c
---
src/bin/pg_dump/pg_dump.c | 170 +--------------------
src/bin/pg_dump/pg_dumpall.c | 46 +-----
src/fe_utils/Makefile | 1 +
src/fe_utils/option_utils.c | 225 ++++++++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 35 +++++
5 files changed, 263 insertions(+), 214 deletions(-)
create mode 100644 src/fe_utils/option_utils.c
create mode 100644 src/include/fe_utils/option_utils.h
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41ce4b7866..c334b9e829 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -148,17 +149,6 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-static void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-static void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
static NamespaceInfo *findNamespace(Oid nsoid);
static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
@@ -1318,164 +1308,6 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
return archiveFormat;
}
-/*
- * Find the OIDs of all schemas matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "n.nspname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching schemas were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all foreign servers matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "s.srvname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
- fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list. See also expand_dbname_patterns()
- * in pg_dumpall.c
- */
-static void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * this might sometimes result in duplicate entries in the OID list, but
- * we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- /*
- * Query must remain ABSOLUTELY devoid of unqualified names. This
- * would be unnecessary given a pg_table_is_visible() variant taking a
- * search_path argument.
- */
- appendPQExpBuffer(query,
- "SELECT c.oid"
- "\nFROM pg_catalog.pg_class c"
- "\n LEFT JOIN pg_catalog.pg_namespace n"
- "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
- "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
-
- ExecuteSqlStatement(conn, "RESET search_path");
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(conn,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* checkExtensionMembership
* Determine whether object is an extension member, and if so,
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 807226537a..01db15dfda 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -54,8 +55,6 @@ static PGconn *connectDatabase(const char *dbname, const char *connstr, const ch
static char *constructConnStr(const char **keywords, const char **values);
static PGresult *executeQuery(PGconn *conn, const char *query);
static void executeCommand(PGconn *conn, const char *query);
-static void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
static char pg_dump_bin[MAXPGPATH];
static const char *progname;
@@ -1409,49 +1408,6 @@ dumpUserConfig(PGconn *conn, const char *username)
destroyPQExpBuffer(buf);
}
-/*
- * Find a list of database names that match the given patterns.
- * See also expand_table_name_patterns() in pg_dump.c
- */
-static void
-expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
-{
- PQExpBuffer query;
- PGresult *res;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs, which might sometimes result in
- * duplicate entries in the name list, but we don't care, since all we're
- * going to do is test membership of the list.
- */
-
- for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "datname", NULL, NULL);
-
- res = executeQuery(conn, query->data);
- for (int i = 0; i < PQntuples(res); i++)
- {
- simple_string_list_append(names, PQgetvalue(res, i, 0));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* Dump contents of databases.
*/
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 7fdbe08e11..eb937e4648 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -25,6 +25,7 @@ OBJS = \
conditional.o \
exit_utils.o \
mbprint.o \
+ option_utils.o \
print.o \
psqlscan.o \
query_utils.o \
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..7893df77aa
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,225 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h"
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all foreign servers matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "s.srvname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (PQntuples(res) == 0)
+ fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
+ RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(conn, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL,
+ "pg_catalog.pg_table_is_visible(c.oid)");
+
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find a list of database names that match the given patterns.
+ */
+void
+expand_dbname_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleStringList *names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the name list, but we don't care, since all we're
+ * going to do is test membership of the list.
+ */
+
+ for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT datname FROM pg_catalog.pg_database n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "datname", NULL, NULL);
+
+ pg_log_info("executing %s", query->data);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ for (int i = 0; i < PQntuples(res); i++)
+ {
+ simple_string_list_append(names, PQgetvalue(res, i, 0));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..d626a0bbc9
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "fe_utils/simple_list.h"
+#include "libpq-fe.h"
+
+extern void expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids);
+extern void expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
+ SimpleStringList *names);
+
+#endif /* OPTION_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v3-0008-Normalizing-option_utils-interface.patch (application/octet-stream)
From 403acd1ce7a28aaa30b2fc54aa5f7e0cc9eab25d Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:37:44 -0800
Subject: [PATCH v3 8/9] Normalizing option_utils interface.
The functions in option_utils were copied from pg_dump, mostly preserving
the function signatures. But the signatures and corresponding functionality
were originally written based solely on pg_dump's needs, not with the goal
of creating a consistent interface. Fixing that.
---
src/bin/pg_dump/pg_dump.c | 58 +++++++++++++++++++------
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/fe_utils/option_utils.c | 66 ++++++++++++++++++++---------
src/fe_utils/string_utils.c | 50 ++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 29 +++++++++----
src/include/fe_utils/string_utils.h | 6 +++
6 files changed, 169 insertions(+), 44 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c334b9e829..5c446c0f24 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -125,6 +125,17 @@ static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
+/*
+ * Cstring list of relkinds which qualify as tables for our purposes when
+ * processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_SEQUENCE) ", " \
+ CppAsString2(RELKIND_VIEW) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_FOREIGN_TABLE) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
static const CatalogId nilCatalogId = {0, 0};
/* override for standard extra_float_digits setting */
@@ -791,6 +802,7 @@ main(int argc, char **argv)
{
expand_schema_name_patterns(GetConnection(fout),
&schema_include_patterns,
+ NULL,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
@@ -798,6 +810,7 @@ main(int argc, char **argv)
}
expand_schema_name_patterns(GetConnection(fout),
&schema_exclude_patterns,
+ NULL,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -805,26 +818,43 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(GetConnection(fout),
- &table_include_patterns,
- &table_include_oids,
- strict_names);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_include_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ strict_names,
+ true);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(GetConnection(fout),
- &table_exclude_patterns,
- &table_exclude_oids,
- false);
-
- expand_table_name_patterns(GetConnection(fout),
- &tabledata_exclude_patterns,
- &tabledata_exclude_oids,
- false);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ true);
+
+ expand_rel_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &tabledata_exclude_oids,
+ false,
+ true);
expand_foreign_server_name_patterns(GetConnection(fout),
&foreign_servers_include_patterns,
- &foreign_servers_include_oids);
+ NULL,
+ &foreign_servers_include_oids,
+ true);
/* non-matching exclusion patterns aren't an error */
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 01db15dfda..2b3a4e3349 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -471,8 +471,8 @@ main(int argc, char *argv[])
/*
* Get a list of database names that match the exclude patterns
*/
- expand_dbname_patterns(conn, &database_exclude_patterns,
- &database_exclude_names);
+ expand_dbname_patterns(conn, &database_exclude_patterns, NULL,
+ &database_exclude_names, false);
/*
* Open the output file if required, otherwise use stdout
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 7893df77aa..76ca456784 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -14,6 +14,7 @@
*/
#include "postgres_fe.h"
+#include "catalog/pg_am.h"
#include "catalog/pg_class.h"
#include "common/connect.h"
#include "fe_utils/exit_utils.h"
@@ -30,7 +31,8 @@
*/
void
expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names)
{
@@ -55,6 +57,7 @@ expand_schema_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(query, "n.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
@@ -78,8 +81,10 @@ expand_schema_name_patterns(PGconn *conn,
*/
void
expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -102,9 +107,10 @@ expand_foreign_server_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
+ exclude_filter(query, "s.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
+ if (strict_names && PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
for (i = 0; i < PQntuples(res); i++)
@@ -118,18 +124,32 @@ expand_foreign_server_name_patterns(PGconn *conn,
}
/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list.
+ * Find the OIDs of all relations matching the given list of patterns
+ * and restrictions, and append them to the given OID list.
*/
void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
+expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible)
{
PQExpBuffer query;
PGresult *res;
SimpleStringListCell *cell;
int i;
+ const char *visibility_rule;
+
+ Assert(amtype == AMTYPE_TABLE || amtype == AMTYPE_INDEX);
+
+ if (restrict_visible)
+ visibility_rule = "pg_catalog.pg_table_is_visible(c.oid)";
+ else
+ visibility_rule = NULL;
if (patterns->head == NULL)
return; /* nothing to do */
@@ -154,20 +174,22 @@ expand_table_name_patterns(PGconn *conn,
"\n LEFT JOIN pg_catalog.pg_namespace n"
"\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
"\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
+ "\n (array[%s])\n", relkinds);
processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
+ false, "n.nspname", "c.relname", NULL, visibility_rule);
+ exclude_filter(query, "n.oid", exclude_nsp_oids);
+ exclude_filter(query, "c.oid", exclude_oids);
ExecuteSqlStatement(conn, "RESET search_path");
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ {
+ if (amtype == AMTYPE_TABLE)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ fatal("no matching indexes were found for pattern \"%s\"", cell->val);
+ }
for (i = 0; i < PQntuples(res); i++)
{
@@ -186,8 +208,10 @@ expand_table_name_patterns(PGconn *conn,
*/
void
expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -206,12 +230,16 @@ expand_dbname_patterns(PGconn *conn,
for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
{
appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
+ "SELECT datname FROM pg_catalog.pg_database d\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "datname", NULL, NULL);
+ exclude_filter(query, "d.oid", exclude_oids);
pg_log_info("executing %s", query->data);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching databases were found for pattern \"%s\"", cell->val);
+
for (int i = 0; i < PQntuples(res); i++)
{
simple_string_list_append(names, PQgetvalue(res, i, 0));
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..4e57a6f940 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -797,6 +797,56 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
return true;
}
+/*
+ * Internal implementation of include_filter and exclude_filter.
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ if (!oids || !oids->head)
+ return;
+
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
/*
* processSQLNamePattern
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d626a0bbc9..53da30754f 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -19,17 +19,28 @@
#include "libpq-fe.h"
extern void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names);
extern void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-extern void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible);
+extern void expand_dbname_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names);
#endif /* OPTION_UTILS_H */
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..301a8eef4d 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -16,6 +16,7 @@
#ifndef STRING_UTILS_H
#define STRING_UTILS_H
+#include "fe_utils/simple_list.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -50,6 +51,11 @@ extern bool parsePGArray(const char *atext, char ***itemarray, int *nitems);
extern bool appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
const char *prefix, int encoding, bool std_strings);
+extern void include_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+extern void exclude_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+
extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *pattern,
bool have_where, bool force_escape,
--
2.21.1 (Apple Git-122.3)
Attachment: v3-0009-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From c0d58f1273dee6bd5f4a907c16d88b1cb69cc958 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 15:47:08 -0800
Subject: [PATCH v3 9/9] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
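For readers following along, a hypothetical invocation under the options this patch adds might look like the following; the database name mydb is a placeholder, not part of the patch:

```shell
# Check all tables and btree indexes in schema "public" of database "mydb",
# skipping pages marked all-frozen and stopping each table scan at the
# first corrupt page. Toast checking is off by default (see --check-toast).
pg_amcheck --schema=public --skip=all-frozen --on-error-stop mydb
```

The long options map onto the getopt table in pg_amcheck.c; with no selection switches at all, every table in the database is checked.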
---
contrib/Makefile | 1 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 884 +++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 ++
contrib/pg_amcheck/t/003_check.pl | 248 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 ++++++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 ++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 493 ++++++++++++
12 files changed, 2278 insertions(+)
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..992615ab3b
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,884 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "common/connect.h"
+#include "common/string.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/print.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "pg_getopt.h"
+#include "storage/block.h"
+
+typedef struct ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ bool verbose;
+ long startblock; /* Block number where checking begins */
+ long endblock; /* Block number where checking ends, inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/* Connection to backend */
+static PGconn *conn;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * C string containing a comma-separated list of relkinds that qualify as
+ * tables for our purposes when processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
+#define INDEX_RELKIND_LIST CppAsString2(RELKIND_INDEX)
+
+/*
+ * List of main tables to be checked, compiled from above lists, and
+ * corresponding list of toast tables. The lists should always be
+ * the same length, with InvalidOid in the toastlist for main relations
+ * without a corresponding toast relation.
+ */
+static SimpleOidList mainlist = {NULL, NULL};
+static SimpleOidList toastlist = {NULL, NULL};
+
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl);
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ conn = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ settings.startblock = -1;
+ settings.endblock = -1;
+
+ /*
+ * Default behaviors for user settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = true;
+ settings.check_corrupt = true;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast relation is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+ if (!conn)
+ fatal("could not connect to database %s: out of memory", values[4]);
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(conn);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf("Password for user %s: ",
+ realusername);
+ else
+ password_prompt = pg_strdup("Password: ");
+ PQfinish(conn);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+
+ if (!new_pass && PQstatus(conn) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to database %s: %s",
+ values[4], PQerrorMessage(conn));
+ PQfinish(conn);
+ exit(1);
+ }
+ } while (new_pass);
+
+ if (settings.verbose)
+ PQsetErrorVerbosity(conn, PQERRORS_VERBOSE);
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(conn,
+ &schema_exclude_patterns,
+ NULL,
+ &schema_exclude_oids,
+ false);
+ expand_rel_name_patterns(conn,
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ false);
+ expand_rel_name_patterns(conn,
+ &index_exclude_patterns,
+ NULL,
+ NULL,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_exclude_oids,
+ false,
+ false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(conn,
+ &schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ settings.strict_names,
+ false);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_include_oids,
+ settings.strict_names,
+ false);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids);
+
+ PQsetNoticeProcessor(conn, NoticeProcessor, NULL);
+
+ if (settings.check_toast)
+ check_tables(&toastlist);
+ check_tables(&mainlist);
+
+ return 0;
+}
+
+/*
+ * Check each table from the given checklist per the user specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+
+ if (!OidIsValid(cell->val))
+ continue;
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ settings.check_toast);
+
+ if (settings.check_indexes)
+ {
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+ }
+ }
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ uint64 corruption_cnt = 0;
+
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = on_error_stop ? "true" : "false";
+ toast = check_toast ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM public.verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, ",
+ tbloid, stop, skip, toast);
+ if (startblock < 0)
+ appendPQExpBuffer(querybuf, "startblock := NULL, ");
+ else
+ appendPQExpBuffer(querybuf, "startblock := %ld, ", startblock);
+
+ if (endblock < 0)
+ appendPQExpBuffer(querybuf, "endblock := NULL");
+ else
+ appendPQExpBuffer(querybuf, "endblock := %ld", endblock);
+
+ appendPQExpBuffer(querybuf, ") v, pg_catalog.pg_class c "
+ "WHERE c.oid = %u", tbloid);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (!PQgetisnull(res, i, 3))
+ printf("relation %s, block %s, offset %s, attribute %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s, block %s, offset %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s, block %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 0))
+ printf("relation %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 4)); /* msg */
+ else
+ printf("%s\n", PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("relation with OID %u\n %s\n", tbloid, PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else
+ {
+ corruption_cnt++;
+ printf("%s\n", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip", required_argument, NULL, 'S'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "ab:Cd:e:h:i:I:n:N:op:rsS:t:T:U:vVwWXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'b':
+ settings.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation starting block argument contains garbage characters");
+ if (settings.startblock > (long)MaxBlockNumber)
+ fatal("relation starting block argument out of bounds");
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation ending block argument contains garbage characters");
+ if (settings.endblock > (long)MaxBlockNumber)
+ fatal("relation ending block argument out of bounds");
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'S':
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ {
+ settings.skip_visible = true;
+ settings.skip_frozen = false;
+ }
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ {
+ settings.skip_frozen = true;
+ settings.skip_visible = false;
+ }
+ else
+ {
+ pg_log_error("invalid skip option");
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'v':
+ settings.verbose = true;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, "Try \"%s --help\" for more information.\n",
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * if we still have arguments, use them as the database name and username
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+ if (settings.endblock >= 0 && settings.endblock < settings.startblock)
+ fatal("relation ending block argument precedes starting block argument");
+}
+
+/*
+ * usage
+ *
+ * print out command line arguments
+ */
+static void
+usage(void)
+{
+ printf("pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.\n");
+ printf("\n");
+ printf("Usage:\n");
+ printf(" pg_amcheck [OPTION]... [DBNAME [USERNAME]]\n");
+ printf("\n");
+ printf("General options:\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" -?, --help show this help, then exit\n");
+ printf(" -s, --strict-names require include patterns to match at least one entity each\n");
+ printf(" -o, --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" -v, --verbose output verbose messages\n");
+ printf("\n");
+ printf("Schema checking options:\n");
+ printf(" -n, --schema=PATTERN check relations in the specified schema(s) only\n");
+ printf(" -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)\n");
+ printf("\n");
+ printf("Table checking options:\n");
+ printf(" -t, --table=PATTERN check the specified table(s) only\n");
+ printf(" -T, --exclude-table=PATTERN do NOT check the specified table(s)\n");
+ printf(" -b, --startblock begin checking table(s) at the given starting block number\n");
+ printf(" -e, --endblock check table(s) only up to the given ending block number\n");
+ printf(" -S, --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf("\n");
+ printf("TOAST table checking options:\n");
+ printf(" -z, --check-toast check associated toast tables and toast indexes\n");
+ printf(" -Z, --skip-toast do NOT check associated toast tables and toast indexes\n");
+ printf("\n");
+ printf("Index checking options:\n");
+ printf(" -X, --skip-indexes do NOT check any btree indexes\n");
+ printf(" -i, --index=PATTERN check the specified index(es) only\n");
+ printf(" -I, --exclude-index=PATTERN do NOT check the specified index(es)\n");
+ printf(" -C, --skip-corrupt do NOT check indexes if their associated table is corrupt\n");
+ printf(" -a, --heapallindexed check index tuples against the table tuples\n");
+ printf(" -r, --rootdescend search from the root page for each index tuple\n");
+ printf("\n");
+ printf("Connection options:\n");
+ printf(" -d, --dbname=DBNAME database name to connect to\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME database user name\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt (should happen automatically)\n");
+ printf("\n");
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid, c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])\n",
+ TABLE_RELKIND_LIST);
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQuery(conn, querybuf->data, PGRES_TUPLES_OK);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(&mainlist, atooid(PQgetvalue(res, i, 0)));
+ simple_oid_list_append(&toastlist, atooid(PQgetvalue(res, i, 1)));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..189f05ef0a
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to database qqq: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user', 'postgres' ],
+ qr/\Qpg_amcheck: error: could not connect to database postgres: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..30bbbdeddd
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,248 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 45;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "could not open $relpath: $!";
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77\x77\x77\x77" x 125);
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
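For illustration only (not part of the patch), the corruption performed by corrupt_first_page amounts to the following, sketched here in Python. The 8192-byte page size is an assumption matching PostgreSQL's default BLCKSZ; the offset and byte values come from the helper above:

```python
import os
import tempfile

# Create a stand-in for a heap file containing one zeroed 8 kB page.
path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"\x00" * 8192)

# Overwrite 500 bytes starting at offset 32, i.e. just past the 24-byte
# fixed page header and into the line pointer array -- the same bytes and
# offsets corrupt_first_page uses.
with open(path, "r+b") as f:
    f.seek(32)
    f.write(b"\x77" * 500)

with open(path, "rb") as f:
    page = f.read()

assert page[:32] == b"\x00" * 32      # bytes before offset 32 untouched
assert page[32:532] == b"\x77" * 500  # corruption written as expected
os.unlink(path)
```

Because the damage lands past the page header, the page still carries a plausible-looking header, which is exactly the kind of corruption checksums and header validation alone may not explain.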
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt.  The index checks
+# below request the indexes explicitly by name, so they are performed even
+# though the associated tables are corrupt.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-X' works as expected.  Since only index corruption
+# (and not table corruption) exists in s1, a check with '-X' should give no
+# errors, while a default check should report the index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s4
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', 'junk' ],
+ qr/\Qpg_amcheck: error: relation starting block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-e', '1234junk' ],
+ qr/\Qpg_amcheck: error: relation ending block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', '5', '-e', '4' ],
+ qr/\Qpg_amcheck: error: relation ending block argument precedes starting block argument\E/,
+ 'pg_amcheck rejects invalid block range');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..271eca7da6
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
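As a cross-check on the layout described above, the pack code can be mirrored in Python's struct module (a standalone sketch, not part of the patch; the '<' prefix forces the same unaligned little-endian packing the Perl code relies on):

```python
import struct

# "<3L5H2Bq B7b 9H" mirrors the Perl code 'LLLSSSSSCCqCcccccccSSSSSSSSS':
# 3 u32 (t_xmin, t_xmax, t_field3), 5 u16 (bi_hi .. t_infomask),
# 2 u8 (t_hoff, t_bits), 1 s64 (column 'a'), 1 u8 varlena header plus
# 7 s8 body bytes for column 'b', and 9 u16 covering column 'c'.
FMT = "<3L5H2Bq B7b 9H"
assert struct.calcsize(FMT) == 58  # matches HEAPTUPLE_PACK_LENGTH

# Round-trip: 28 values pack into 58 bytes and unpack unchanged, the same
# guarantee read_tuple()/write_tuple() depend on.
values = (0, 0, 0) + (0, 0, 1, 3, 0) + (24, 0) + (12345678,) \
         + (0x11,) + tuple(ord(c) for c in "abcdefg") + (0,) * 9
buf = struct.pack(FMT, *values)
assert len(buf) == 58
assert struct.unpack(FMT, buf) == values
```

Any change to the table's column list would shift these offsets, which is why the test pins the schema so precisely.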
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation test, block $blkno\s+/ms
+ if (defined $blkno);
+ return qr/relation test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath) or die "could not open $relpath: $!";
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ if ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax to a value equal to or beyond the next valid xid
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..c24f154883
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
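The failure mode this test exercises can be illustrated outside PostgreSQL: a binary-search structure built under one comparator and probed under another silently misses keys. A Python sketch (not part of the patch; the comparators stand in for int4_asc_cmp and int4_desc_cmp):

```python
def bsearch(arr, key, cmp):
    """Binary search assuming arr is sorted according to cmp."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        c = cmp(arr[mid], key)
        if c < 0:
            lo = mid + 1
        elif c > 0:
            hi = mid
        else:
            return mid
    return -1  # not found

asc = lambda a, b: (a > b) - (a < b)   # ascending, like int4_asc_cmp
desc = lambda a, b: (b > a) - (b < a)  # descending, like int4_desc_cmp

keys = list(range(1, 1001))            # "index" built under asc ordering
assert bsearch(keys, 42, asc) == 41    # found under the original comparator
assert bsearch(keys, 42, desc) == -1   # swapped comparator: key goes missing
```

The on-disk btree is likewise internally ordered by the comparator it was built with, so after the support function is swapped, amcheck's item-order invariant no longer holds, which is what pg_amcheck reports above.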
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ae2759be55..797b4dc61e 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..00643d2e58
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,493 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck mydb
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ The following command-line options for controlling general program behavior
+ are recognized.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Show <application>pg_amcheck</application> version number, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Specifies verbose mode. This will cause
+ <application>pg_amcheck</application> to output more detailed information
+ about its activities, mostly to do with its communication with the
+ database.
+ </para>
+ <para>
+ Note that this does not increase the number of corruptions reported nor
+ the level of detail reported about each of them.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control which database objects
+ <application>pg_amcheck</application> checks and how such options
+ are interpreted.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-i <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ For indexes associated with tables being checked, check only those
+ indexes with names matching <replaceable
+ class="parameter">pattern</replaceable>. Multiple indexes can be
+ selected by writing multiple <option>-i</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands (see
+ <xref linkend="app-psql-patterns"/>), so multiple indexes can also
+ be selected by writing wildcard characters in the pattern. When using
+ wildcards, be careful to quote the pattern if needed to prevent the
+ shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any indexes matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-i</option>.
+ <option>-I</option> can be given more than once to exclude indexes
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-i</option> and <option>-I</option> are given, the
+ behavior is to check just the indexes that match at least one
+ <option>-i</option> switch but no <option>-I</option> switches. If
+ <option>-I</option> appears without <option>-i</option>, then indexes
+ matching <option>-I</option> are excluded from what is otherwise a check
+ of all indexes associated with tables that are checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only schemas matching <replaceable
+ class="parameter">pattern</replaceable>; this selects both the
+ schema itself and all objects it contains. When this option is
+ not specified, all non-system schemas in the target database will be
+ checked. Multiple schemas can be
+ selected by writing multiple <option>-n</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple schemas can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-N <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any schemas matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is
+ interpreted according to the same rules as for <option>-n</option>.
+ <option>-N</option> can be given more than once to exclude schemas
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-n</option> and <option>-N</option> are given, the behavior
+ is to check just the schemas that match at least one <option>-n</option>
+ switch but no <option>-N</option> switches. If <option>-N</option> appears
+ without <option>-n</option>, then schemas matching <option>-N</option> are
+ excluded from what is otherwise a check of all schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--strict-names</option></term>
+ <listitem>
+ <para>
+ Requires that each schema
+ (<option>-n</option>/<option>--schema</option>), table
+ (<option>-t</option>/<option>--table</option>) and index
+ (<option>-i</option>/<option>--index</option>) qualifier match at least
+ one schema/table/index in the database to be checked.
+ </para>
+ <para>
+ This option has no effect on
+ <option>-N</option>/<option>--exclude-schema</option>,
+ <option>-T</option>/<option>--exclude-table</option>,
+ or <option>-I</option>/<option>--exclude-index</option>. An exclude
+ pattern failing to match any objects is not considered an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only tables with names matching
+ <replaceable class="parameter">pattern</replaceable>. Multiple tables
+ can be selected by writing multiple <option>-t</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple tables can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any tables matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-t</option>.
+ <option>-T</option> can be given more than once to exclude tables
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-t</option> and <option>-T</option> are given, the
+ behavior is to check just the tables that match at least one
+ <option>-t</option> switch but no <option>-T</option> switches. If
+ <option>-T</option> appears without <option>-t</option>, then tables
+ matching <option>-T</option> are excluded from what is otherwise a check
+ of all tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control additional behaviors.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-a</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ When checking indexes, additionally verify the presence of all heap
+ tuples as index tuples within the index.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-b <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--startblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks prior to <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option).
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-b</option>/<option>--startblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-C</option></term>
+ <term><option>--skip-corrupt</option></term>
+ <listitem>
+ <para>
+ Skip checking indexes for a table if the table is found to be corrupt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--endblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks after <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option).
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-e</option>/<option>--endblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-o</option></term>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ Stop checking at the end of the first page on which corruption is
+ found. Note that even with this option enabled, more than one
+ corruption message may be reported, as a single page can contain
+ multiple corruptions.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ When checking indexes, for each tuple, perform additional verification by
+ re-finding the tuple on the leaf level by performing a new search from
+ the root page.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited or even of no
+ use in helping detect the kinds of corruption that occur in practice.
+ In any event, it is known to be a rather expensive check to perform.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--skip=<replaceable class="parameter">blocks</replaceable></option></term>
+ <listitem>
+ <para>
+ When <option>-S</option> <replaceable
+ class="parameter">all-visible</replaceable> is given, corruption
+ checking is skipped for blocks marked as all visible in the
+ visibility map.
+ </para>
+ <para>
+ When <option>-S</option> <replaceable
+ class="parameter">all-frozen</replaceable> is given, corruption
+ checking is skipped for blocks marked as all frozen in the
+ visibility map.
+ </para>
+ <para>
+ The default is to check blocks without regard to their marking in the
+ visibility map.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-X</option></term>
+ <term><option>--skip-indexes</option></term>
+ <listitem>
+ <para>
+ Check tables only; do not check their indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-Z</option></term>
+ <term><option>--skip-toast</option></term>
+ <listitem>
+ <para>
+ Do not check toast tables or their indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ <para>
+ The following additional command-line options control the database
+ connection parameters.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-d <replaceable class="parameter">dbname</replaceable></option></term>
+ <term><option>--dbname=<replaceable class="parameter">dbname</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to. This is
+ equivalent to specifying <replaceable
+ class="parameter">dbname</replaceable> as the first non-option
+ argument on the command line. The <replaceable>dbname</replaceable>
+ can be a <link linkend="libpq-connstring">connection string</link>.
+ If so, connection string parameters will override any conflicting
+ command line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+ <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is
+ running. If the value begins with a slash, it is used as the
+ directory for the Unix domain socket. The default is taken
+ from the <envar>PGHOST</envar> environment variable, if set,
+ else a Unix domain socket connection is attempted.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+ <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file
+ extension on which the server is listening for connections.
+ Defaults to the <envar>PGPORT</envar> environment variable, if
+ set, or a compiled-in default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U <replaceable>username</replaceable></option></term>
+ <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires
+ password authentication and a password is not available by
+ other means such as a <filename>.pgpass</filename> file, the
+ connection attempt will fail. This option can be useful in
+ batch jobs and scripts where no user is present to enter a
+ password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a
+ password before connecting to a database.
+ </para>
+
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt
+ for a password if the server demands password authentication.
+ However, <application>pg_amcheck</application> will waste a
+ connection attempt finding out that the server wants a password.
+ In some cases it is worth typing <option>-W</option> to avoid the extra
+ connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--role=<replaceable class="parameter">rolename</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies a role name to be used for the checks.
+ This option causes <application>pg_amcheck</application> to issue a
+ <command>SET ROLE</command> <replaceable class="parameter">rolename</replaceable>
+ command after connecting to the database. It is useful when the
+ authenticated user (specified by <option>-U</option>) lacks privileges
+ needed by <application>pg_amcheck</application>, but can switch to a role with
+ the required rights. Some installations have a policy against
+ logging in directly as a superuser, and use of this option allows
+ checks to be performed without violating the policy.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
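The include/exclude semantics described in the documentation above (an object is checked when it matches at least one include pattern, or when no include patterns were given, and matches no exclude pattern) can be sketched in standalone C. This is an illustrative approximation only: it uses POSIX fnmatch() globs as a stand-in for psql-style patterns, and the names here are hypothetical, not taken from the patch.

```c
#include <stdbool.h>
#include <fnmatch.h>
#include <assert.h>

/*
 * Return true when 'name' should be checked, given optional include and
 * exclude pattern lists.  With at least one include pattern present, the
 * name must match one of them; in all cases it must match no exclude
 * pattern.  Excludes always win over includes.
 */
static bool
selected(const char *name,
         const char **includes, int nincludes,
         const char **excludes, int nexcludes)
{
    bool        ok = (nincludes == 0);  /* no includes: all are candidates */
    int         i;

    for (i = 0; i < nincludes; i++)
        if (fnmatch(includes[i], name, 0) == 0)
            ok = true;
    for (i = 0; i < nexcludes; i++)
        if (fnmatch(excludes[i], name, 0) == 0)
            ok = false;
    return ok;
}
```

For example, with `-i 'idx_*' -I 'idx_tmp*'`, an index named idx_users is checked, idx_tmp_1 is excluded, and pk_users is never a candidate; with only `-I 'idx_tmp*'`, pk_users is checked.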
v3-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patchapplication/octet-stream; name=v3-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch; x-unix-mode=0644Download
From 5c1d781b949b06b84dd1dc06bb8b644b279c4fbd Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 30 Dec 2020 12:50:41 -0800
Subject: [PATCH v3 1/9] Moving exit_nicely and fatal into fe_utils
In preparation for moving other pg_dump functionality into fe_utils,
moving the functions "on_exit_nicely" and "exit_nicely", and the
macro "fatal" from pg_dump into fe_utils.
Various frontend executables in src/bin, src/bin/scripts, and
contrib/ have logic for logging and exiting under error conditions.
The logging code itself is already under common/, but executables
differ in their calls to exit() vs. exit_nicely(), with
exit_nicely() not uniformly defined, and sometimes all of this
wrapped up under a macro named fatal(), the definition of that macro
also not uniformly defined. This makes it harder to move code out
of these executables into a shared library under fe_utils/.
Standardizing all executables to define these things the same way or
to use a single fe_utils/ library is beyond the scope of this patch,
but this patch should get the ball rolling in that direction.
---
src/bin/pg_dump/pg_backup_archiver.h | 1 +
src/bin/pg_dump/pg_backup_utils.c | 59 -----------------------
src/bin/pg_dump/pg_backup_utils.h | 8 ----
src/fe_utils/Makefile | 1 +
src/fe_utils/exit_utils.c | 71 ++++++++++++++++++++++++++++
src/include/fe_utils/exit_utils.h | 25 ++++++++++
6 files changed, 98 insertions(+), 67 deletions(-)
create mode 100644 src/fe_utils/exit_utils.c
create mode 100644 src/include/fe_utils/exit_utils.h
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a8ea5c7eae..37d157b7ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -26,6 +26,7 @@
#include <time.h>
+#include "fe_utils/exit_utils.h"
#include "libpq-fe.h"
#include "pg_backup.h"
#include "pqexpbuffer.h"
diff --git a/src/bin/pg_dump/pg_backup_utils.c b/src/bin/pg_dump/pg_backup_utils.c
index c709a40e06..631e88f7db 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -19,16 +19,6 @@
/* Globals exported by this file */
const char *progname = NULL;
-#define MAX_ON_EXIT_NICELY 20
-
-static struct
-{
- on_exit_nicely_callback function;
- void *arg;
-} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
-static int on_exit_nicely_index;
-
/*
* Parse a --section=foo command line argument.
*
@@ -57,52 +47,3 @@ set_dump_section(const char *arg, int *dumpSections)
exit_nicely(1);
}
}
-
-
-/* Register a callback to be run when exit_nicely is invoked. */
-void
-on_exit_nicely(on_exit_nicely_callback function, void *arg)
-{
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- {
- pg_log_fatal("out of on_exit_nicely slots");
- exit_nicely(1);
- }
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
-}
-
-/*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * without printing any message.
- *
- * If running in a parallel worker thread on Windows, we only exit the thread,
- * not the whole process.
- *
- * Note that in parallel operation on Windows, the callback(s) will be run
- * by each thread since the list state is necessarily shared by all threads;
- * each callback must contain logic to ensure it does only what's appropriate
- * for its thread. On Unix, callbacks are also run by each process, but only
- * for callbacks established before we fork off the child processes. (It'd
- * be cleaner to reset the list after fork(), and let each child establish
- * its own callbacks; but then the behavior would be completely inconsistent
- * between Windows and Unix. For now, just be sure to establish callbacks
- * before forking to avoid inconsistency.)
- */
-void
-exit_nicely(int code)
-{
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- on_exit_nicely_list[i].function(code,
- on_exit_nicely_list[i].arg);
-
-#ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- _endthreadex(code);
-#endif
-
- exit(code);
-}
diff --git a/src/bin/pg_dump/pg_backup_utils.h b/src/bin/pg_dump/pg_backup_utils.h
index 306798f9ac..ee4409c274 100644
--- a/src/bin/pg_dump/pg_backup_utils.h
+++ b/src/bin/pg_dump/pg_backup_utils.h
@@ -15,22 +15,14 @@
#ifndef PG_BACKUP_UTILS_H
#define PG_BACKUP_UTILS_H
-#include "common/logging.h"
-
/* bits returned by set_dump_section */
#define DUMP_PRE_DATA 0x01
#define DUMP_DATA 0x02
#define DUMP_POST_DATA 0x04
#define DUMP_UNSECTIONED 0xff
-typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
extern const char *progname;
extern void set_dump_section(const char *arg, int *dumpSections);
-extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
-extern void exit_nicely(int code) pg_attribute_noreturn();
-
-#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
#endif /* PG_BACKUP_UTILS_H */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..d6c328faf1 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,6 +23,7 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ exit_utils.o \
mbprint.o \
print.o \
psqlscan.o \
diff --git a/src/fe_utils/exit_utils.c b/src/fe_utils/exit_utils.c
new file mode 100644
index 0000000000..e61bd438fc
--- /dev/null
+++ b/src/fe_utils/exit_utils.c
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/exit_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "fe_utils/exit_utils.h"
+
+#define MAX_ON_EXIT_NICELY 20
+
+static struct
+{
+ on_exit_nicely_callback function;
+ void *arg;
+} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+static int on_exit_nicely_index;
+
+/* Register a callback to be run when exit_nicely is invoked. */
+void
+on_exit_nicely(on_exit_nicely_callback function, void *arg)
+{
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ {
+ pg_log_fatal("out of on_exit_nicely slots");
+ exit_nicely(1);
+ }
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+}
+
+/*
+ * Run accumulated on_exit_nicely callbacks in reverse order and then exit
+ * without printing any message.
+ *
+ * If running in a parallel worker thread on Windows, we only exit the thread,
+ * not the whole process.
+ *
+ * Note that in parallel operation on Windows, the callback(s) will be run
+ * by each thread since the list state is necessarily shared by all threads;
+ * each callback must contain logic to ensure it does only what's appropriate
+ * for its thread. On Unix, callbacks are also run by each process, but only
+ * for callbacks established before we fork off the child processes. (It'd
+ * be cleaner to reset the list after fork(), and let each child establish
+ * its own callbacks; but then the behavior would be completely inconsistent
+ * between Windows and Unix. For now, just be sure to establish callbacks
+ * before forking to avoid inconsistency.)
+ */
+void
+exit_nicely(int code)
+{
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ on_exit_nicely_list[i].function(code,
+ on_exit_nicely_list[i].arg);
+
+#ifdef WIN32
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ _endthreadex(code);
+#endif
+
+ exit(code);
+}
diff --git a/src/include/fe_utils/exit_utils.h b/src/include/fe_utils/exit_utils.h
new file mode 100644
index 0000000000..948d2fdb51
--- /dev/null
+++ b/src/include/fe_utils/exit_utils.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/exit_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXIT_UTILS_H
+#define EXIT_UTILS_H
+
+#include "postgres_fe.h"
+#include "common/logging.h"
+
+typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+extern void exit_nicely(int code) pg_attribute_noreturn();
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
+
+#endif /* EXIT_UTILS_H */
--
2.21.1 (Apple Git-122.3)
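The list-and-reverse-iteration behavior of the relocated on_exit_nicely()/exit_nicely() pair can be illustrated in miniature. The sketch below is a standalone approximation (no Windows thread handling, no pg_log_fatal), with the callback-running loop split out into run_on_exit_nicely() so the LIFO ordering can be exercised without actually exiting; that split is this sketch's addition, not the patch's.

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

#define MAX_ON_EXIT_NICELY 20

typedef void (*on_exit_nicely_callback) (int code, void *arg);

static struct
{
    on_exit_nicely_callback function;
    void       *arg;
}           on_exit_nicely_list[MAX_ON_EXIT_NICELY];

static int  on_exit_nicely_index;

/* Register a callback; the list is fixed-size, so overflow is fatal. */
static void
on_exit_nicely(on_exit_nicely_callback function, void *arg)
{
    if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
    {
        fprintf(stderr, "out of on_exit_nicely slots\n");
        exit(1);
    }
    on_exit_nicely_list[on_exit_nicely_index].function = function;
    on_exit_nicely_list[on_exit_nicely_index].arg = arg;
    on_exit_nicely_index++;
}

/* Run accumulated callbacks in reverse registration (LIFO) order. */
static void
run_on_exit_nicely(int code)
{
    int         i;

    for (i = on_exit_nicely_index - 1; i >= 0; i--)
        on_exit_nicely_list[i].function(code, on_exit_nicely_list[i].arg);
}

/* Run callbacks, then exit without printing any message. */
static void
exit_nicely(int code)
{
    run_on_exit_nicely(code);
    exit(code);
}

/* Demo callback: record the order in which callbacks fire. */
static int  fired[MAX_ON_EXIT_NICELY];
static int  nfired;

static void
record(int code, void *arg)
{
    (void) code;
    fired[nfired++] = (int) (long) arg;
}
```

Registering two callbacks and running them shows the last-registered callback firing first, which is why, per the comment in the patch, callbacks must be registered before forking to get consistent cleanup behavior.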
v3-0002-Refactoring-ExecuteSqlQuery-and-related-functions.patchapplication/octet-stream; name=v3-0002-Refactoring-ExecuteSqlQuery-and-related-functions.patch; x-unix-mode=0644Download
From 32e471ebf262c7e9d1edb9960907f1ec74321cef Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:44:20 -0800
Subject: [PATCH v3 2/9] Refactoring ExecuteSqlQuery and related functions.
ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement in the pg_dump project were defined to take a
pointer to struct Archive, which is a struct unused outside pg_dump.
In preparation for moving these functions to fe_utils, refactoring
these functions to take a PGconn pointer. These functions also
embedded pg_dump assumptions about the correct error handling
behavior, specifically to do with logging error messages before
calling exit_nicely(). Refactoring the error handling logic into a
handler function. The full design of the handler is not yet
present, as it will be developed further after moving to fe_utils,
but the idea is that callers will ultimately be able to override the
error handling behavior by defining alternate handlers.
To minimize changes to pg_dump and friends, creating thin wrappers
around these functions that take an Archive pointer. It might be
marginally cleaner in the long run to refactor pg_dump.c to call
with a PGconn pointer in all relevant call sites, but that would
result in a nontrivially larger patch and more code churn, so not
doing that here. Another option might be to define the thin
wrappers as static inline functions, but that seems inconsistent
with the rest of the pg_dump project style, so not doing that
either. Should we?
---
src/bin/pg_dump/pg_backup_db.c | 144 ++++++++++++++-----
src/bin/pg_dump/pg_backup_db.h | 26 +++-
src/bin/pg_dump/pg_dump.c | 248 ++++++++++++++++-----------------
3 files changed, 253 insertions(+), 165 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 5ba43441f5..b55a968da2 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -61,7 +61,7 @@ _check_database_version(ArchiveHandle *AH)
*/
if (remoteversion >= 90000)
{
- res = ExecuteSqlQueryForSingleRow((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
+ res = ExecuteSqlQueryForSingleRowAH((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
AH->public.isStandby = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
PQclear(res);
@@ -198,8 +198,8 @@ ConnectDatabase(Archive *AHX,
}
/* Start strict; later phases may override this. */
- PQclear(ExecuteSqlQueryForSingleRow((Archive *) AH,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH((Archive *) AH,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
if (password && password != AH->savedPassword)
free(password);
@@ -271,59 +271,129 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/* Like fatal(), but with a complaint about a particular query. */
-static void
-die_on_query_failure(ArchiveHandle *AH, const char *query)
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disable ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
{
- pg_log_error("query failed: %s",
- PQerrorMessage(AH->connection));
- fatal("query was: %s", query);
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve historical message behavior of spelling "one" as the
+ * expected row count.
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
}
+/*
+ * Executes the given SQL query statement.
+ *
+ * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ */
void
-ExecuteSqlStatement(Archive *AHX, const char *query)
+ExecuteSqlStatement(PGconn *conn, const char *query)
{
- ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != PGRES_COMMAND_OK)
- die_on_query_failure(AH, query);
- PQclear(res);
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the query completes with the given 'status'.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
+ * requires that exactly one row be returned.
+ */
PGresult *
-ExecuteSqlQuery(Archive *AHX, const char *query, ExecStatusType status)
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
+
+void
+ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != status)
- die_on_query_failure(AH, query);
- return res;
+ ExecuteSqlStatement(AH->connection, query);
}
-/*
- * Execute an SQL query and verify that we got exactly one row back.
- */
PGresult *
-ExecuteSqlQueryForSingleRow(Archive *fout, const char *query)
+ExecuteSqlQueryAH(Archive *AHX, const char *query, ExecStatusType status)
{
- PGresult *res;
- int ntups;
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- res = ExecuteSqlQuery(fout, query, PGRES_TUPLES_OK);
+ return ExecuteSqlQuery(AH->connection, query, status);
+}
- /* Expecting a single result only */
- ntups = PQntuples(res);
- if (ntups != 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
+PGresult *
+ExecuteSqlQueryForSingleRowAH(Archive *AHX, const char *query)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- return res;
+ return ExecuteSqlQueryForSingleRow(AH->connection, query);
}
/*
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 8888dd34b9..1aac600ece 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,10 +13,28 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-extern void ExecuteSqlStatement(Archive *AHX, const char *query);
-extern PGresult *ExecuteSqlQuery(Archive *AHX, const char *query,
- ExecStatusType status);
-extern PGresult *ExecuteSqlQueryForSingleRow(Archive *fout, const char *query);
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
+extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
+ ExecStatusType status);
+extern PGresult *ExecuteSqlQueryForSingleRowAH(Archive *fout,
+ const char *query);
extern void EndDBCopyMode(Archive *AHX, const char *tocEntryTag);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1f70653c02..e8985a834f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1084,7 +1084,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PGconn *conn = GetConnection(AH);
const char *std_strings;
- PQclear(ExecuteSqlQueryForSingleRow(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
/*
* Set the client encoding if requested.
@@ -1119,7 +1119,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PQExpBuffer query = createPQExpBuffer();
appendPQExpBuffer(query, "SET ROLE %s", fmtId(use_role));
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
/* save it for possible later use by parallel workers */
@@ -1128,11 +1128,11 @@ setup_connection(Archive *AH, const char *dumpencoding,
}
/* Set the datestyle to ISO to ensure the dump's portability */
- ExecuteSqlStatement(AH, "SET DATESTYLE = ISO");
+ ExecuteSqlStatementAH(AH, "SET DATESTYLE = ISO");
/* Likewise, avoid using sql_standard intervalstyle */
if (AH->remoteVersion >= 80400)
- ExecuteSqlStatement(AH, "SET INTERVALSTYLE = POSTGRES");
+ ExecuteSqlStatementAH(AH, "SET INTERVALSTYLE = POSTGRES");
/*
* Use an explicitly specified extra_float_digits if it has been provided.
@@ -1145,35 +1145,35 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBuffer(q, "SET extra_float_digits TO %d",
extra_float_digits);
- ExecuteSqlStatement(AH, q->data);
+ ExecuteSqlStatementAH(AH, q->data);
destroyPQExpBuffer(q);
}
else if (AH->remoteVersion >= 90000)
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 3");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 3");
else
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 2");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 2");
/*
* If synchronized scanning is supported, disable it, to prevent
* unpredictable changes in row ordering across a dump and reload.
*/
if (AH->remoteVersion >= 80300)
- ExecuteSqlStatement(AH, "SET synchronize_seqscans TO off");
+ ExecuteSqlStatementAH(AH, "SET synchronize_seqscans TO off");
/*
* Disable timeouts if supported.
*/
- ExecuteSqlStatement(AH, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET statement_timeout = 0");
if (AH->remoteVersion >= 90300)
- ExecuteSqlStatement(AH, "SET lock_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET lock_timeout = 0");
if (AH->remoteVersion >= 90600)
- ExecuteSqlStatement(AH, "SET idle_in_transaction_session_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET idle_in_transaction_session_timeout = 0");
/*
* Quote all identifiers, if requested.
*/
if (quote_all_identifiers && AH->remoteVersion >= 90100)
- ExecuteSqlStatement(AH, "SET quote_all_identifiers = true");
+ ExecuteSqlStatementAH(AH, "SET quote_all_identifiers = true");
/*
* Adjust row-security mode, if supported.
@@ -1181,15 +1181,15 @@ setup_connection(Archive *AH, const char *dumpencoding,
if (AH->remoteVersion >= 90500)
{
if (dopt->enable_row_security)
- ExecuteSqlStatement(AH, "SET row_security = on");
+ ExecuteSqlStatementAH(AH, "SET row_security = on");
else
- ExecuteSqlStatement(AH, "SET row_security = off");
+ ExecuteSqlStatementAH(AH, "SET row_security = off");
}
/*
* Start transaction-snapshot mode transaction to dump consistent data.
*/
- ExecuteSqlStatement(AH, "BEGIN");
+ ExecuteSqlStatementAH(AH, "BEGIN");
if (AH->remoteVersion >= 90100)
{
/*
@@ -1201,17 +1201,17 @@ setup_connection(Archive *AH, const char *dumpencoding,
* guarantees. This is a kluge, but safe for back-patching.
*/
if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL)
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY, DEFERRABLE");
else
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"REPEATABLE READ, READ ONLY");
}
else
{
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY");
}
@@ -1230,7 +1230,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBufferStr(query, "SET TRANSACTION SNAPSHOT ");
appendStringLiteralConn(query, AH->sync_snapshot_id, conn);
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
}
else if (AH->numWorkers > 1 &&
@@ -1270,7 +1270,7 @@ get_synchronized_snapshot(Archive *fout)
char *result;
PGresult *res;
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
PQclear(res);
@@ -1343,7 +1343,7 @@ expand_schema_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1390,7 +1390,7 @@ expand_foreign_server_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1450,9 +1450,9 @@ expand_table_name_patterns(Archive *fout,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatement(fout, "RESET search_path");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(fout,
+ ExecuteSqlStatementAH(fout, "RESET search_path");
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRowAH(fout,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
@@ -1907,7 +1907,7 @@ dumpTableData_copy(Archive *fout, void *dcontext)
fmtQualifiedDumpable(tbinfo),
column_list);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_COPY_OUT);
PQclear(res);
destroyPQExpBuffer(clistBuf);
@@ -2028,11 +2028,11 @@ dumpTableData_insert(Archive *fout, void *dcontext)
if (tdinfo->filtercond)
appendPQExpBuffer(q, " %s", tdinfo->filtercond);
- ExecuteSqlStatement(fout, q->data);
+ ExecuteSqlStatementAH(fout, q->data);
while (1)
{
- res = ExecuteSqlQuery(fout, "FETCH 100 FROM _pg_dump_cursor",
+ res = ExecuteSqlQueryAH(fout, "FETCH 100 FROM _pg_dump_cursor",
PGRES_TUPLES_OK);
nfields = PQnfields(res);
@@ -2220,7 +2220,7 @@ dumpTableData_insert(Archive *fout, void *dcontext)
archputs("\n\n", fout);
- ExecuteSqlStatement(fout, "CLOSE _pg_dump_cursor");
+ ExecuteSqlStatementAH(fout, "CLOSE _pg_dump_cursor");
destroyPQExpBuffer(q);
if (insertStmt != NULL)
@@ -2520,7 +2520,7 @@ buildMatViewRefreshDependencies(Archive *fout)
"FROM w "
"WHERE refrelkind = " CppAsString2(RELKIND_MATVIEW));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -2847,7 +2847,7 @@ dumpDatabase(Archive *fout)
username_subquery);
}
- res = ExecuteSqlQueryForSingleRow(fout, dbQry->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, dbQry->data);
i_tableoid = PQfnumber(res, "tableoid");
i_oid = PQfnumber(res, "oid");
@@ -2992,7 +2992,7 @@ dumpDatabase(Archive *fout)
seclabelQry = createPQExpBuffer();
buildShSecLabelQuery("pg_database", dbCatId.oid, seclabelQry);
- shres = ExecuteSqlQuery(fout, seclabelQry->data, PGRES_TUPLES_OK);
+ shres = ExecuteSqlQueryAH(fout, seclabelQry->data, PGRES_TUPLES_OK);
resetPQExpBuffer(seclabelQry);
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
@@ -3103,7 +3103,7 @@ dumpDatabase(Archive *fout)
"WHERE oid = %u;\n",
LargeObjectRelationId);
- lo_res = ExecuteSqlQueryForSingleRow(fout, loFrozenQry->data);
+ lo_res = ExecuteSqlQueryForSingleRowAH(fout, loFrozenQry->data);
i_relfrozenxid = PQfnumber(lo_res, "relfrozenxid");
i_relminmxid = PQfnumber(lo_res, "relminmxid");
@@ -3162,7 +3162,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
else
printfPQExpBuffer(buf, "SELECT datconfig[%d] FROM pg_database WHERE oid = '%u'::oid", count, dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 1 &&
!PQgetisnull(res, 0, 0))
@@ -3189,7 +3189,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
"WHERE setrole = r.oid AND setdatabase = '%u'::oid",
dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) > 0)
{
@@ -3277,7 +3277,7 @@ dumpSearchPath(Archive *AH)
* listing schemas that may appear in search_path but not actually exist,
* which seems like a prudent exclusion.
*/
- res = ExecuteSqlQueryForSingleRow(AH,
+ res = ExecuteSqlQueryForSingleRowAH(AH,
"SELECT pg_catalog.current_schemas(false)");
if (!parsePGArray(PQgetvalue(res, 0, 0), &schemanames, &nschemanames))
@@ -3391,7 +3391,7 @@ getBlobs(Archive *fout)
"NULL::oid AS initrlomacl "
" FROM pg_largeobject");
- res = ExecuteSqlQuery(fout, blobQry->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobQry->data, PGRES_TUPLES_OK);
i_oid = PQfnumber(res, "oid");
i_lomowner = PQfnumber(res, "rolname");
@@ -3537,7 +3537,7 @@ dumpBlobs(Archive *fout, void *arg)
"DECLARE bloboid CURSOR FOR "
"SELECT DISTINCT loid FROM pg_largeobject ORDER BY 1";
- ExecuteSqlStatement(fout, blobQry);
+ ExecuteSqlStatementAH(fout, blobQry);
/* Command to fetch from cursor */
blobFetchQry = "FETCH 1000 IN bloboid";
@@ -3545,7 +3545,7 @@ dumpBlobs(Archive *fout, void *arg)
do
{
/* Do a fetch */
- res = ExecuteSqlQuery(fout, blobFetchQry, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobFetchQry, PGRES_TUPLES_OK);
/* Process the tuples, if any */
ntups = PQntuples(res);
@@ -3678,7 +3678,7 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
"FROM pg_catalog.pg_policy pol "
"WHERE polrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -3914,7 +3914,7 @@ getPublications(Archive *fout)
"FROM pg_publication p",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4112,7 +4112,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE pr.prrelid = '%u'"
" AND p.oid = pr.prpubid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4237,7 +4237,7 @@ getSubscriptions(Archive *fout)
{
int n;
- res = ExecuteSqlQuery(fout,
+ res = ExecuteSqlQueryAH(fout,
"SELECT count(*) FROM pg_subscription "
"WHERE subdbid = (SELECT oid FROM pg_database"
" WHERE datname = current_database())",
@@ -4274,7 +4274,7 @@ getSubscriptions(Archive *fout)
"WHERE s.subdbid = (SELECT oid FROM pg_database\n"
" WHERE datname = current_database())");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4446,7 +4446,7 @@ append_depends_on_extension(Archive *fout,
"AND refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass",
catalog,
dobj->catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_extname = PQfnumber(res, "extname");
for (i = 0; i < ntups; i++)
@@ -4485,7 +4485,7 @@ get_next_possible_free_pg_type_oid(Archive *fout, PQExpBuffer upgrade_query)
"FROM pg_catalog.pg_type "
"WHERE oid = '%u'::pg_catalog.oid);",
next_possible_free_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
is_dup = (PQgetvalue(res, 0, 0)[0] == 't');
PQclear(res);
} while (is_dup);
@@ -4518,7 +4518,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE oid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4551,7 +4551,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE r.rngtypid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_multirange_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "oid")));
pg_type_multirange_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4594,7 +4594,7 @@ binary_upgrade_set_type_oids_by_rel_oid(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_rel_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_oid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "crel")));
@@ -4645,7 +4645,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0,
PQfnumber(upgrade_res, "reltoastrelid")));
@@ -4803,7 +4803,7 @@ getNamespaces(Archive *fout, int *numNamespaces)
"FROM pg_namespace",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4916,7 +4916,7 @@ getExtensions(Archive *fout, int *numExtensions)
"FROM pg_extension x "
"JOIN pg_namespace n ON n.oid = x.extnamespace");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5093,7 +5093,7 @@ getTypes(Archive *fout, int *numTypes)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5249,7 +5249,7 @@ getOperators(Archive *fout, int *numOprs)
"FROM pg_operator",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOprs = ntups;
@@ -5336,7 +5336,7 @@ getCollations(Archive *fout, int *numCollations)
"FROM pg_collation",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numCollations = ntups;
@@ -5408,7 +5408,7 @@ getConversions(Archive *fout, int *numConversions)
"FROM pg_conversion",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numConversions = ntups;
@@ -5481,7 +5481,7 @@ getAccessMethods(Archive *fout, int *numAccessMethods)
"amhandler::pg_catalog.regproc AS amhandler "
"FROM pg_am");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAccessMethods = ntups;
@@ -5552,7 +5552,7 @@ getOpclasses(Archive *fout, int *numOpclasses)
"FROM pg_opclass",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpclasses = ntups;
@@ -5635,7 +5635,7 @@ getOpfamilies(Archive *fout, int *numOpfamilies)
"FROM pg_opfamily",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpfamilies = ntups;
@@ -5804,7 +5804,7 @@ getAggregates(Archive *fout, int *numAggs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAggs = ntups;
@@ -6035,7 +6035,7 @@ getFuncs(Archive *fout, int *numFuncs)
appendPQExpBufferChar(query, ')');
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6721,7 +6721,7 @@ getTables(Archive *fout, int *numTables)
RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6791,7 +6791,7 @@ getTables(Archive *fout, int *numTables)
resetPQExpBuffer(query);
appendPQExpBufferStr(query, "SET statement_timeout = ");
appendStringLiteralConn(query, dopt->lockWaitTimeout, GetConnection(fout));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
for (i = 0; i < ntups; i++)
@@ -6915,7 +6915,7 @@ getTables(Archive *fout, int *numTables)
appendPQExpBuffer(query,
"LOCK TABLE %s IN ACCESS SHARE MODE",
fmtQualifiedDumpable(&tblinfo[i]));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
/* Emit notice if join for owner failed */
@@ -6926,7 +6926,7 @@ getTables(Archive *fout, int *numTables)
if (dopt->lockWaitTimeout)
{
- ExecuteSqlStatement(fout, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(fout, "SET statement_timeout = 0");
}
PQclear(res);
@@ -7021,7 +7021,7 @@ getInherits(Archive *fout, int *numInherits)
/* find all the inheritance information */
appendPQExpBufferStr(query, "SELECT inhrelid, inhparent FROM pg_inherits");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7365,7 +7365,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7512,7 +7512,7 @@ getExtendedStatistics(Archive *fout)
"FROM pg_catalog.pg_statistic_ext",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7610,7 +7610,7 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE conrelid = '%u'::pg_catalog.oid "
"AND contype = 'f'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7746,7 +7746,7 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
"ORDER BY conname",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7839,7 +7839,7 @@ getRules(Archive *fout, int *numRules)
"ORDER BY oid");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8013,7 +8013,7 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8148,7 +8148,7 @@ getEventTriggers(Archive *fout, int *numEventTriggers)
"ORDER BY e.oid",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8318,7 +8318,7 @@ getProcLangs(Archive *fout, int *numProcLangs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8427,7 +8427,7 @@ getCasts(Archive *fout, int *numCasts)
"FROM pg_cast ORDER BY 3,4");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8495,7 +8495,7 @@ get_language_name(Archive *fout, Oid langid)
query = createPQExpBuffer();
appendPQExpBuffer(query, "SELECT lanname FROM pg_language WHERE oid = %u", langid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
lanname = pg_strdup(fmtId(PQgetvalue(res, 0, 0)));
destroyPQExpBuffer(query);
PQclear(res);
@@ -8538,7 +8538,7 @@ getTransforms(Archive *fout, int *numTransforms)
"FROM pg_transform "
"ORDER BY 3,4");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8720,7 +8720,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"ORDER BY a.attnum",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8797,7 +8797,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"WHERE adrelid = '%u'::pg_catalog.oid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numDefaults = PQntuples(res);
attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * sizeof(AttrDefInfo));
@@ -8919,7 +8919,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numConstrs = PQntuples(res);
if (numConstrs != tbinfo->ncheck)
@@ -9062,7 +9062,7 @@ getTSParsers(Archive *fout, int *numTSParsers)
"prsend::oid, prsheadline::oid, prslextype::oid "
"FROM pg_ts_parser");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSParsers = ntups;
@@ -9146,7 +9146,7 @@ getTSDictionaries(Archive *fout, int *numTSDicts)
"FROM pg_ts_dict",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSDicts = ntups;
@@ -9226,7 +9226,7 @@ getTSTemplates(Archive *fout, int *numTSTemplates)
"tmplnamespace, tmplinit::oid, tmpllexize::oid "
"FROM pg_ts_template");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSTemplates = ntups;
@@ -9302,7 +9302,7 @@ getTSConfigurations(Archive *fout, int *numTSConfigs)
"FROM pg_ts_config",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSConfigs = ntups;
@@ -9455,7 +9455,7 @@ getForeignDataWrappers(Archive *fout, int *numForeignDataWrappers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignDataWrappers = ntups;
@@ -9603,7 +9603,7 @@ getForeignServers(Archive *fout, int *numForeignServers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignServers = ntups;
@@ -9742,7 +9742,7 @@ getDefaultACLs(Archive *fout, int *numDefaultACLs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numDefaultACLs = ntups;
@@ -10098,7 +10098,7 @@ collectComments(Archive *fout, CommentItem **items)
"FROM pg_catalog.pg_description "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
@@ -10546,7 +10546,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
"ORDER BY oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
num = PQntuples(res);
@@ -10684,7 +10684,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
"rngtypid = '%u'",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
qualtypname = pg_strdup(fmtQualifiedDumpable(tyinfo));
@@ -10942,7 +10942,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
"WHERE oid = '%u'::pg_catalog.oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typlen = PQgetvalue(res, 0, PQfnumber(res, "typlen"));
typinput = PQgetvalue(res, 0, PQfnumber(res, "typinput"));
@@ -11165,7 +11165,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typnotnull = PQgetvalue(res, 0, PQfnumber(res, "typnotnull"));
typdefn = PQgetvalue(res, 0, PQfnumber(res, "typdefn"));
@@ -11357,7 +11357,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -11536,7 +11536,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
tyinfo->typrelid);
/* Fetch column attnames */
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
if (ntups < 1)
@@ -12064,7 +12064,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
"WHERE oid = '%u'::pg_catalog.oid",
finfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
proretset = PQgetvalue(res, 0, PQfnumber(res, "proretset"));
prosrc = PQgetvalue(res, 0, PQfnumber(res, "prosrc"));
@@ -12754,7 +12754,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_oprkind = PQfnumber(res, "oprkind");
i_oprcode = PQfnumber(res, "oprcode");
@@ -12977,7 +12977,7 @@ convertTSFunction(Archive *fout, Oid funcOid)
snprintf(query, sizeof(query),
"SELECT '%u'::pg_catalog.regproc", funcOid);
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
@@ -13140,7 +13140,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_opcintype = PQfnumber(res, "opcintype");
i_opckeytype = PQfnumber(res, "opckeytype");
@@ -13268,7 +13268,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13347,7 +13347,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13530,7 +13530,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.catId.oid);
}
- res_ops = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_ops = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
resetPQExpBuffer(query);
@@ -13546,7 +13546,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"ORDER BY amprocnum",
opfinfo->dobj.catId.oid);
- res_procs = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_procs = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Get additional fields from the pg_opfamily row */
resetPQExpBuffer(query);
@@ -13557,7 +13557,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"WHERE oid = '%u'::pg_catalog.oid",
opfinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_amname = PQfnumber(res, "amname");
@@ -13744,7 +13744,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
collinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_collprovider = PQfnumber(res, "collprovider");
i_collisdeterministic = PQfnumber(res, "collisdeterministic");
@@ -13861,7 +13861,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
convinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_conforencoding = PQfnumber(res, "conforencoding");
i_contoencoding = PQfnumber(res, "contoencoding");
@@ -14078,7 +14078,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
"AND p.oid = '%u'::pg_catalog.oid",
agginfo->aggfn.dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_agginitval = PQfnumber(res, "agginitval");
i_aggminitval = PQfnumber(res, "aggminitval");
@@ -14413,7 +14413,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
"FROM pg_ts_template p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = tmplnamespace",
dictinfo->dicttemplate);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
tmplname = PQgetvalue(res, 0, 1);
@@ -14555,7 +14555,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"FROM pg_ts_parser p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = prsnamespace",
cfginfo->cfgparser);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
prsname = PQgetvalue(res, 0, 1);
@@ -14578,7 +14578,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"ORDER BY m.mapcfg, m.maptokentype, m.mapseqno",
cfginfo->cfgparser, cfginfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_tokenname = PQfnumber(res, "tokenname");
@@ -14742,7 +14742,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
"FROM pg_foreign_data_wrapper w "
"WHERE w.oid = '%u'",
srvinfo->srvfdw);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
fdwname = PQgetvalue(res, 0, 0);
appendPQExpBuffer(q, "CREATE SERVER %s", qsrvname);
@@ -14858,7 +14858,7 @@ dumpUserMappings(Archive *fout,
"ORDER BY usename",
catalogId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_usename = PQfnumber(res, "usename");
@@ -15379,7 +15379,7 @@ collectSecLabels(Archive *fout, SecLabelItem **items)
"FROM pg_catalog.pg_seclabel "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
i_label = PQfnumber(res, "label");
@@ -15515,7 +15515,7 @@ dumpTable(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
for (i = 0; i < PQntuples(res); i++)
{
@@ -15565,7 +15565,7 @@ createViewAsClause(Archive *fout, TableInfo *tbinfo)
"SELECT pg_catalog.pg_get_viewdef('%u'::pg_catalog.oid) AS viewdef",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -15740,7 +15740,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
"ON (fs.oid = ft.ftserver) "
"WHERE ft.ftrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_srvname = PQfnumber(res, "srvname");
i_ftoptions = PQfnumber(res, "ftoptions");
srvname = pg_strdup(PQgetvalue(res, 0, i_srvname));
@@ -16693,7 +16693,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
"pg_catalog.pg_get_statisticsobjdef('%u'::pg_catalog.oid)",
statsextinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
stxdef = PQgetvalue(res, 0, 0);
@@ -17061,7 +17061,7 @@ findLastBuiltinOid_V71(Archive *fout)
PGresult *res;
Oid last_oid;
- res = ExecuteSqlQueryForSingleRow(fout,
+ res = ExecuteSqlQueryForSingleRowAH(fout,
"SELECT datlastsysoid FROM pg_database WHERE datname = current_database()");
last_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "datlastsysoid")));
PQclear(res);
@@ -17130,7 +17130,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
fmtQualifiedDumpable(tbinfo));
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17353,7 +17353,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
"SELECT last_value, is_called FROM %s",
fmtQualifiedDumpable(tbinfo));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17761,7 +17761,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
"SELECT pg_catalog.pg_get_ruledef('%u'::pg_catalog.oid)",
rinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17891,7 +17891,7 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
"AND deptype = 'e' "
"ORDER BY 3");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18090,7 +18090,7 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
"AND refclassid = 'pg_extension'::regclass "
"AND classid = 'pg_class'::regclass;");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_conrelid = PQfnumber(res, "conrelid");
@@ -18196,7 +18196,7 @@ getDependencies(Archive *fout)
/* Sort the output for efficiency below */
appendPQExpBufferStr(query, "ORDER BY 1,2");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18549,7 +18549,7 @@ getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts)
appendPQExpBuffer(query, "SELECT pg_catalog.format_type('%u'::pg_catalog.oid, NULL)",
oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
/* result of format_type is already quoted */
result = pg_strdup(PQgetvalue(res, 0, 0));
--
2.21.1 (Apple Git-122.3)
v3-0003-Creating-query_utils-frontend-utility.patch
From 8fadc245dff6f71a420c1a13b9648b000600065f Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:01:02 -0800
Subject: [PATCH v3 3/9] Creating query_utils frontend utility
Moving the ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement functions out of the pg_dump project into a new
shared location.
---
src/bin/pg_dump/pg_backup_db.c | 102 +-------------------------
src/bin/pg_dump/pg_backup_db.h | 17 -----
src/fe_utils/Makefile | 1 +
src/fe_utils/query_utils.c | 114 +++++++++++++++++++++++++++++
src/include/fe_utils/query_utils.h | 34 +++++++++
5 files changed, 150 insertions(+), 118 deletions(-)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index b55a968da2..38402d0831 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -20,6 +20,7 @@
#include "common/connect.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "parallel.h"
#include "pg_backup_archiver.h"
@@ -271,107 +272,6 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/*
- * The exiting query result handler embeds the historical pg_dump behavior
- * under query error conditions, including exiting nicely. The 'conn' object
- * is unused here, but is included in the interface for alternate query result
- * handler implementations.
- *
- * Whether the query was successful is determined by comparing the returned
- * status code against the expected status code, and by comparing the number of
- * tuples returned from the query against expected_ntups. Special negative
- * values of expected_ntups can be used to require at least one row or to
- * disables ntup checking.
- *
- * Exits on failure. On successful query completion, returns the 'res'
- * argument as a notational convenience.
- */
-PGresult *
-exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
-{
- if (PQresultStatus(res) != expected_status)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
- if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
- {
- int ntups = PQntuples(res);
-
- if (expected_ntups == POSITIVE_NTUPS)
- {
- if (ntups == 0)
- fatal("query returned no rows: %s", query);
- }
- else if (ntups != expected_ntups)
- {
- /*
- * Preserve historical message behavior of spelling "one" as the
- * expected row count.
- */
- if (expected_ntups == 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
- fatal(ngettext("query returned %d row instead of %d: %s",
- "query returned %d rows instead of %d: %s",
- ntups),
- ntups, expected_ntups, query);
- }
- }
- return res;
-}
-
-/*
- * Executes the given SQL query statement.
- *
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
- */
-void
-ExecuteSqlStatement(PGconn *conn, const char *query)
-{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
-}
-
-/*
- * Executes the given SQL query.
- *
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
- */
-PGresult *
-ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
-}
-
-/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
- */
-PGresult *
-ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
-}
-
void
ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 1aac600ece..018a28908e 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,23 +13,6 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-#define POSITIVE_NTUPS (-1)
-#define ANY_NTUPS (-2)
-typedef PGresult *(*PGresultHandler) (PGresult *res,
- PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups,
- const char *query);
-
-extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups, const char *query);
-
-extern void ExecuteSqlStatement(PGconn *conn, const char *query);
-extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
- ExecStatusType expected_status);
-extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
-
extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
ExecStatusType status);
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index d6c328faf1..7fdbe08e11 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -27,6 +27,7 @@ OBJS = \
mbprint.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..b28750f4b2
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,114 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disable ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve historical message behavior of spelling "one" as the
+ * expected row count.
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
+}
+
+/*
+ * Executes the given SQL query statement.
+ *
+ * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ */
+void
+ExecuteSqlStatement(PGconn *conn, const char *query)
+{
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
+
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the given 'status' results.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
+}
+
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
+ * requires that exactly one row be returned.
+ */
+PGresult *
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..f03d17b1ed
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,34 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "libpq-fe.h"
+
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+#endif /* QUERY_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v3-0004-Adding-CurrentQueryHandler-logic.patch
From 023d5103cce11336701c81e4c3ad312e72277cee Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:25:29 -0800
Subject: [PATCH v3 4/9] Adding CurrentQueryHandler logic.
Extending the default set of PGresultHandlers and creating a
mechanism to switch between them using a new function
ResultHandlerSwitchTo, analogous to MemoryContextSwitchTo. In
addition to the exiting_handler already created in a prior commit
(which embeds the historical behavior from pg_dump), adding a
quiet_handler which cleans up and exits without logging anything,
and a noop_handler which does nothing, leaving error handling and
cleanup to the caller.
---
src/fe_utils/query_utils.c | 81 ++++++++++++++++++++++--------
src/include/fe_utils/query_utils.h | 17 +++++++
2 files changed, 76 insertions(+), 22 deletions(-)
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
index b28750f4b2..355da6edaf 100644
--- a/src/fe_utils/query_utils.c
+++ b/src/fe_utils/query_utils.c
@@ -12,6 +12,12 @@
#include "fe_utils/exit_utils.h"
#include "fe_utils/query_utils.h"
+/*
+ * Global memory.
+ */
+
+PGresultHandler CurrentResultHandler = exiting_handler;
+
/*
* The exiting query result handler embeds the historical pg_dump behavior
* under query error conditions, including exiting nicely. The 'conn' object
@@ -29,7 +35,7 @@
*/
PGresult *
exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
+ int expected_ntups, const char *query)
{
if (PQresultStatus(res) != expected_status)
{
@@ -67,48 +73,79 @@ exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
return res;
}
+/*
+ * Quietly cleans up and exits nicely unless the expected conditions were met.
+ */
+PGresult *
+quiet_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ int ntups = PQntuples(res);
+
+ if ((PQresultStatus(res) != expected_status) ||
+ (expected_ntups == POSITIVE_NTUPS && ntups == 0) ||
+ (expected_ntups >= 0 && ntups != expected_ntups))
+ {
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+
+ return res;
+}
+
+/*
+ * Does nothing other than returning the 'res' argument back to the caller.
+ * This handler is intended for callers who prefer to perform the error
+ * handling themselves.
+ */
+PGresult *
+noop_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ return res;
+}
+
/*
* Executes the given SQL query statement.
*
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ * Expects a PGRES_COMMAND_OK status.
*/
void
ExecuteSqlStatement(PGconn *conn, const char *query)
{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
+ PQclear(CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
}
/*
* Executes the given SQL query.
*
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
+ * Expects the given status.
*/
PGresult *
ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
+ * Executes the given SQL query.
+ *
+ * Expects a PGRES_TUPLES_OK status and precisely one row.
*/
PGresult *
ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
index f03d17b1ed..80958e94fb 100644
--- a/src/include/fe_utils/query_utils.h
+++ b/src/include/fe_utils/query_utils.h
@@ -22,9 +22,26 @@ typedef PGresult *(*PGresultHandler) (PGresult *res,
int expected_ntups,
const char *query);
+extern PGresultHandler CurrentResultHandler;
+
+static inline PGresultHandler
+ResultHandlerSwitchTo(PGresultHandler handler)
+{
+ PGresultHandler old = CurrentResultHandler;
+
+ CurrentResultHandler = handler;
+ return old;
+}
+
extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
ExecStatusType expected_status,
int expected_ntups, const char *query);
+extern PGresult *quiet_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+extern PGresult *noop_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
extern void ExecuteSqlStatement(PGconn *conn, const char *query);
extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
--
2.21.1 (Apple Git-122.3)
On Jan 6, 2021, at 11:05 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
I have done that, factoring them into fe_utils, and am attaching a series of patches that accomplishes that refactoring.
The previous set should have been named v30, not v3. My apologies for any confusion.
The attached patches, v31, are mostly the same, but with "getopt_long.h" included from pg_amcheck.c per Thomas's review, and a .gitignore file added in contrib/pg_amcheck/
Attachments:
v31-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch
From d75ac49112aaf2c862ac63fcca722caebbda76e5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 30 Dec 2020 12:50:41 -0800
Subject: [PATCH v31 1/9] Moving exit_nicely and fatal into fe_utils
In preparation for moving other pg_dump functionality into fe_utils,
moving the functions "on_exit_nicely" and "exit_nicely", and the
macro "fatal" from pg_dump into fe_utils.
Various frontend executables in src/bin, src/bin/scripts, and
contrib/ have logic for logging and exiting under error conditions.
The logging code itself is already under common/, but executables
differ in their calls to exit() vs. exit_nicely(), with
exit_nicely() not uniformly defined, and sometimes all of this is
wrapped up under a fatal() macro whose definition likewise varies.
This makes it harder to move code out
of these executables into a shared library under fe_utils/.
Standardizing all executables to define these things the same way or
to use a single fe_utils/ library is beyond the scope of this patch,
but this patch should get the ball rolling in that direction.
---
src/bin/pg_dump/pg_backup_archiver.h | 1 +
src/bin/pg_dump/pg_backup_utils.c | 59 -----------------------
src/bin/pg_dump/pg_backup_utils.h | 8 ----
src/fe_utils/Makefile | 1 +
src/fe_utils/exit_utils.c | 71 ++++++++++++++++++++++++++++
src/include/fe_utils/exit_utils.h | 25 ++++++++++
6 files changed, 98 insertions(+), 67 deletions(-)
create mode 100644 src/fe_utils/exit_utils.c
create mode 100644 src/include/fe_utils/exit_utils.h
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a8ea5c7eae..37d157b7ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -26,6 +26,7 @@
#include <time.h>
+#include "fe_utils/exit_utils.h"
#include "libpq-fe.h"
#include "pg_backup.h"
#include "pqexpbuffer.h"
diff --git a/src/bin/pg_dump/pg_backup_utils.c b/src/bin/pg_dump/pg_backup_utils.c
index c709a40e06..631e88f7db 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -19,16 +19,6 @@
/* Globals exported by this file */
const char *progname = NULL;
-#define MAX_ON_EXIT_NICELY 20
-
-static struct
-{
- on_exit_nicely_callback function;
- void *arg;
-} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
-static int on_exit_nicely_index;
-
/*
* Parse a --section=foo command line argument.
*
@@ -57,52 +47,3 @@ set_dump_section(const char *arg, int *dumpSections)
exit_nicely(1);
}
}
-
-
-/* Register a callback to be run when exit_nicely is invoked. */
-void
-on_exit_nicely(on_exit_nicely_callback function, void *arg)
-{
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- {
- pg_log_fatal("out of on_exit_nicely slots");
- exit_nicely(1);
- }
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
-}
-
-/*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * without printing any message.
- *
- * If running in a parallel worker thread on Windows, we only exit the thread,
- * not the whole process.
- *
- * Note that in parallel operation on Windows, the callback(s) will be run
- * by each thread since the list state is necessarily shared by all threads;
- * each callback must contain logic to ensure it does only what's appropriate
- * for its thread. On Unix, callbacks are also run by each process, but only
- * for callbacks established before we fork off the child processes. (It'd
- * be cleaner to reset the list after fork(), and let each child establish
- * its own callbacks; but then the behavior would be completely inconsistent
- * between Windows and Unix. For now, just be sure to establish callbacks
- * before forking to avoid inconsistency.)
- */
-void
-exit_nicely(int code)
-{
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- on_exit_nicely_list[i].function(code,
- on_exit_nicely_list[i].arg);
-
-#ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- _endthreadex(code);
-#endif
-
- exit(code);
-}
diff --git a/src/bin/pg_dump/pg_backup_utils.h b/src/bin/pg_dump/pg_backup_utils.h
index 306798f9ac..ee4409c274 100644
--- a/src/bin/pg_dump/pg_backup_utils.h
+++ b/src/bin/pg_dump/pg_backup_utils.h
@@ -15,22 +15,14 @@
#ifndef PG_BACKUP_UTILS_H
#define PG_BACKUP_UTILS_H
-#include "common/logging.h"
-
/* bits returned by set_dump_section */
#define DUMP_PRE_DATA 0x01
#define DUMP_DATA 0x02
#define DUMP_POST_DATA 0x04
#define DUMP_UNSECTIONED 0xff
-typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
extern const char *progname;
extern void set_dump_section(const char *arg, int *dumpSections);
-extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
-extern void exit_nicely(int code) pg_attribute_noreturn();
-
-#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
#endif /* PG_BACKUP_UTILS_H */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..d6c328faf1 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,6 +23,7 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ exit_utils.o \
mbprint.o \
print.o \
psqlscan.o \
diff --git a/src/fe_utils/exit_utils.c b/src/fe_utils/exit_utils.c
new file mode 100644
index 0000000000..e61bd438fc
--- /dev/null
+++ b/src/fe_utils/exit_utils.c
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/exit_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "fe_utils/exit_utils.h"
+
+#define MAX_ON_EXIT_NICELY 20
+
+static struct
+{
+ on_exit_nicely_callback function;
+ void *arg;
+} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+static int on_exit_nicely_index;
+
+/* Register a callback to be run when exit_nicely is invoked. */
+void
+on_exit_nicely(on_exit_nicely_callback function, void *arg)
+{
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ {
+ pg_log_fatal("out of on_exit_nicely slots");
+ exit_nicely(1);
+ }
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+}
+
+/*
+ * Run accumulated on_exit_nicely callbacks in reverse order and then exit
+ * without printing any message.
+ *
+ * If running in a parallel worker thread on Windows, we only exit the thread,
+ * not the whole process.
+ *
+ * Note that in parallel operation on Windows, the callback(s) will be run
+ * by each thread since the list state is necessarily shared by all threads;
+ * each callback must contain logic to ensure it does only what's appropriate
+ * for its thread. On Unix, callbacks are also run by each process, but only
+ * for callbacks established before we fork off the child processes. (It'd
+ * be cleaner to reset the list after fork(), and let each child establish
+ * its own callbacks; but then the behavior would be completely inconsistent
+ * between Windows and Unix. For now, just be sure to establish callbacks
+ * before forking to avoid inconsistency.)
+ */
+void
+exit_nicely(int code)
+{
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ on_exit_nicely_list[i].function(code,
+ on_exit_nicely_list[i].arg);
+
+#ifdef WIN32
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ _endthreadex(code);
+#endif
+
+ exit(code);
+}
diff --git a/src/include/fe_utils/exit_utils.h b/src/include/fe_utils/exit_utils.h
new file mode 100644
index 0000000000..948d2fdb51
--- /dev/null
+++ b/src/include/fe_utils/exit_utils.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/exit_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXIT_UTILS_H
+#define EXIT_UTILS_H
+
+#include "postgres_fe.h"
+#include "common/logging.h"
+
+typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+extern void exit_nicely(int code) pg_attribute_noreturn();
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
+
+#endif /* EXIT_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v31-0002-Refactoring-ExecuteSqlQuery-and-related-function.patch
From e5d531e5d50dd32c9f1f420daa04128588c71514 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:44:20 -0800
Subject: [PATCH v31 2/9] Refactoring ExecuteSqlQuery and related functions.
ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement in the pg_dump project were defined to take a
pointer to struct Archive, which is a struct unused outside pg_dump.
In preparation for moving them to fe_utils, refactoring these
functions to take a PGconn pointer. These functions also
embedded pg_dump assumptions about the correct error handling
behavior, specifically to do with logging error messages before
calling exit_nicely(). Refactoring the error handling logic into a
handler function. The full design of the handler is not yet
present, as it will be developed further after moving to fe_utils,
but the idea is that callers will ultimately be able to override the
error handling behavior by defining alternate handlers.
To minimize changes to pg_dump and friends, creating thin wrappers
around these functions that take an Archive pointer. It might be
marginally cleaner in the long run to refactor pg_dump.c to call
with a PGconn pointer in all relevant call sites, but that would
result in a nontrivially larger patch and more code churn, so not
doing that here. Another option might be to define the thin
wrappers as static inline functions, but that seems inconsistent
with the rest of the pg_dump project style, so not doing that
either. Should we?
---
src/bin/pg_dump/pg_backup_db.c | 144 ++++++++++++++-----
src/bin/pg_dump/pg_backup_db.h | 26 +++-
src/bin/pg_dump/pg_dump.c | 248 ++++++++++++++++-----------------
3 files changed, 253 insertions(+), 165 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 5ba43441f5..b55a968da2 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -61,7 +61,7 @@ _check_database_version(ArchiveHandle *AH)
*/
if (remoteversion >= 90000)
{
- res = ExecuteSqlQueryForSingleRow((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
+ res = ExecuteSqlQueryForSingleRowAH((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
AH->public.isStandby = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
PQclear(res);
@@ -198,8 +198,8 @@ ConnectDatabase(Archive *AHX,
}
/* Start strict; later phases may override this. */
- PQclear(ExecuteSqlQueryForSingleRow((Archive *) AH,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH((Archive *) AH,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
if (password && password != AH->savedPassword)
free(password);
@@ -271,59 +271,129 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/* Like fatal(), but with a complaint about a particular query. */
-static void
-die_on_query_failure(ArchiveHandle *AH, const char *query)
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disable ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
{
- pg_log_error("query failed: %s",
- PQerrorMessage(AH->connection));
- fatal("query was: %s", query);
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve the historical message wording, which spells the
+ * expected row count as "one".
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
}
+/*
+ * Executes the given SQL statement.
+ *
+ * Invokes the exiting handler for any status other than PGRES_COMMAND_OK.
+ */
void
-ExecuteSqlStatement(Archive *AHX, const char *query)
+ExecuteSqlStatement(PGconn *conn, const char *query)
{
- ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != PGRES_COMMAND_OK)
- die_on_query_failure(AH, query);
- PQclear(res);
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the result has the given 'status'.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and that
+ * exactly one row be returned.
+ */
PGresult *
-ExecuteSqlQuery(Archive *AHX, const char *query, ExecStatusType status)
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
+
+void
+ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != status)
- die_on_query_failure(AH, query);
- return res;
+ ExecuteSqlStatement(AH->connection, query);
}
-/*
- * Execute an SQL query and verify that we got exactly one row back.
- */
PGresult *
-ExecuteSqlQueryForSingleRow(Archive *fout, const char *query)
+ExecuteSqlQueryAH(Archive *AHX, const char *query, ExecStatusType status)
{
- PGresult *res;
- int ntups;
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- res = ExecuteSqlQuery(fout, query, PGRES_TUPLES_OK);
+ return ExecuteSqlQuery(AH->connection, query, status);
+}
- /* Expecting a single result only */
- ntups = PQntuples(res);
- if (ntups != 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
+PGresult *
+ExecuteSqlQueryForSingleRowAH(Archive *AHX, const char *query)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- return res;
+ return ExecuteSqlQueryForSingleRow(AH->connection, query);
}
/*
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 8888dd34b9..1aac600ece 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,10 +13,28 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-extern void ExecuteSqlStatement(Archive *AHX, const char *query);
-extern PGresult *ExecuteSqlQuery(Archive *AHX, const char *query,
- ExecStatusType status);
-extern PGresult *ExecuteSqlQueryForSingleRow(Archive *fout, const char *query);
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
+extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
+ ExecStatusType status);
+extern PGresult *ExecuteSqlQueryForSingleRowAH(Archive *AHX,
+ const char *query);
extern void EndDBCopyMode(Archive *AHX, const char *tocEntryTag);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1f70653c02..e8985a834f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1084,7 +1084,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PGconn *conn = GetConnection(AH);
const char *std_strings;
- PQclear(ExecuteSqlQueryForSingleRow(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
/*
* Set the client encoding if requested.
@@ -1119,7 +1119,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PQExpBuffer query = createPQExpBuffer();
appendPQExpBuffer(query, "SET ROLE %s", fmtId(use_role));
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
/* save it for possible later use by parallel workers */
@@ -1128,11 +1128,11 @@ setup_connection(Archive *AH, const char *dumpencoding,
}
/* Set the datestyle to ISO to ensure the dump's portability */
- ExecuteSqlStatement(AH, "SET DATESTYLE = ISO");
+ ExecuteSqlStatementAH(AH, "SET DATESTYLE = ISO");
/* Likewise, avoid using sql_standard intervalstyle */
if (AH->remoteVersion >= 80400)
- ExecuteSqlStatement(AH, "SET INTERVALSTYLE = POSTGRES");
+ ExecuteSqlStatementAH(AH, "SET INTERVALSTYLE = POSTGRES");
/*
* Use an explicitly specified extra_float_digits if it has been provided.
@@ -1145,35 +1145,35 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBuffer(q, "SET extra_float_digits TO %d",
extra_float_digits);
- ExecuteSqlStatement(AH, q->data);
+ ExecuteSqlStatementAH(AH, q->data);
destroyPQExpBuffer(q);
}
else if (AH->remoteVersion >= 90000)
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 3");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 3");
else
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 2");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 2");
/*
* If synchronized scanning is supported, disable it, to prevent
* unpredictable changes in row ordering across a dump and reload.
*/
if (AH->remoteVersion >= 80300)
- ExecuteSqlStatement(AH, "SET synchronize_seqscans TO off");
+ ExecuteSqlStatementAH(AH, "SET synchronize_seqscans TO off");
/*
* Disable timeouts if supported.
*/
- ExecuteSqlStatement(AH, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET statement_timeout = 0");
if (AH->remoteVersion >= 90300)
- ExecuteSqlStatement(AH, "SET lock_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET lock_timeout = 0");
if (AH->remoteVersion >= 90600)
- ExecuteSqlStatement(AH, "SET idle_in_transaction_session_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET idle_in_transaction_session_timeout = 0");
/*
* Quote all identifiers, if requested.
*/
if (quote_all_identifiers && AH->remoteVersion >= 90100)
- ExecuteSqlStatement(AH, "SET quote_all_identifiers = true");
+ ExecuteSqlStatementAH(AH, "SET quote_all_identifiers = true");
/*
* Adjust row-security mode, if supported.
@@ -1181,15 +1181,15 @@ setup_connection(Archive *AH, const char *dumpencoding,
if (AH->remoteVersion >= 90500)
{
if (dopt->enable_row_security)
- ExecuteSqlStatement(AH, "SET row_security = on");
+ ExecuteSqlStatementAH(AH, "SET row_security = on");
else
- ExecuteSqlStatement(AH, "SET row_security = off");
+ ExecuteSqlStatementAH(AH, "SET row_security = off");
}
/*
* Start transaction-snapshot mode transaction to dump consistent data.
*/
- ExecuteSqlStatement(AH, "BEGIN");
+ ExecuteSqlStatementAH(AH, "BEGIN");
if (AH->remoteVersion >= 90100)
{
/*
@@ -1201,17 +1201,17 @@ setup_connection(Archive *AH, const char *dumpencoding,
* guarantees. This is a kluge, but safe for back-patching.
*/
if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL)
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY, DEFERRABLE");
else
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"REPEATABLE READ, READ ONLY");
}
else
{
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY");
}
@@ -1230,7 +1230,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBufferStr(query, "SET TRANSACTION SNAPSHOT ");
appendStringLiteralConn(query, AH->sync_snapshot_id, conn);
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
}
else if (AH->numWorkers > 1 &&
@@ -1270,7 +1270,7 @@ get_synchronized_snapshot(Archive *fout)
char *result;
PGresult *res;
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
PQclear(res);
@@ -1343,7 +1343,7 @@ expand_schema_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1390,7 +1390,7 @@ expand_foreign_server_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1450,9 +1450,9 @@ expand_table_name_patterns(Archive *fout,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatement(fout, "RESET search_path");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(fout,
+ ExecuteSqlStatementAH(fout, "RESET search_path");
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRowAH(fout,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
@@ -1907,7 +1907,7 @@ dumpTableData_copy(Archive *fout, void *dcontext)
fmtQualifiedDumpable(tbinfo),
column_list);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_COPY_OUT);
PQclear(res);
destroyPQExpBuffer(clistBuf);
@@ -2028,11 +2028,11 @@ dumpTableData_insert(Archive *fout, void *dcontext)
if (tdinfo->filtercond)
appendPQExpBuffer(q, " %s", tdinfo->filtercond);
- ExecuteSqlStatement(fout, q->data);
+ ExecuteSqlStatementAH(fout, q->data);
while (1)
{
- res = ExecuteSqlQuery(fout, "FETCH 100 FROM _pg_dump_cursor",
+ res = ExecuteSqlQueryAH(fout, "FETCH 100 FROM _pg_dump_cursor",
PGRES_TUPLES_OK);
nfields = PQnfields(res);
@@ -2220,7 +2220,7 @@ dumpTableData_insert(Archive *fout, void *dcontext)
archputs("\n\n", fout);
- ExecuteSqlStatement(fout, "CLOSE _pg_dump_cursor");
+ ExecuteSqlStatementAH(fout, "CLOSE _pg_dump_cursor");
destroyPQExpBuffer(q);
if (insertStmt != NULL)
@@ -2520,7 +2520,7 @@ buildMatViewRefreshDependencies(Archive *fout)
"FROM w "
"WHERE refrelkind = " CppAsString2(RELKIND_MATVIEW));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -2847,7 +2847,7 @@ dumpDatabase(Archive *fout)
username_subquery);
}
- res = ExecuteSqlQueryForSingleRow(fout, dbQry->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, dbQry->data);
i_tableoid = PQfnumber(res, "tableoid");
i_oid = PQfnumber(res, "oid");
@@ -2992,7 +2992,7 @@ dumpDatabase(Archive *fout)
seclabelQry = createPQExpBuffer();
buildShSecLabelQuery("pg_database", dbCatId.oid, seclabelQry);
- shres = ExecuteSqlQuery(fout, seclabelQry->data, PGRES_TUPLES_OK);
+ shres = ExecuteSqlQueryAH(fout, seclabelQry->data, PGRES_TUPLES_OK);
resetPQExpBuffer(seclabelQry);
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
@@ -3103,7 +3103,7 @@ dumpDatabase(Archive *fout)
"WHERE oid = %u;\n",
LargeObjectRelationId);
- lo_res = ExecuteSqlQueryForSingleRow(fout, loFrozenQry->data);
+ lo_res = ExecuteSqlQueryForSingleRowAH(fout, loFrozenQry->data);
i_relfrozenxid = PQfnumber(lo_res, "relfrozenxid");
i_relminmxid = PQfnumber(lo_res, "relminmxid");
@@ -3162,7 +3162,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
else
printfPQExpBuffer(buf, "SELECT datconfig[%d] FROM pg_database WHERE oid = '%u'::oid", count, dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 1 &&
!PQgetisnull(res, 0, 0))
@@ -3189,7 +3189,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
"WHERE setrole = r.oid AND setdatabase = '%u'::oid",
dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) > 0)
{
@@ -3277,7 +3277,7 @@ dumpSearchPath(Archive *AH)
* listing schemas that may appear in search_path but not actually exist,
* which seems like a prudent exclusion.
*/
- res = ExecuteSqlQueryForSingleRow(AH,
+ res = ExecuteSqlQueryForSingleRowAH(AH,
"SELECT pg_catalog.current_schemas(false)");
if (!parsePGArray(PQgetvalue(res, 0, 0), &schemanames, &nschemanames))
@@ -3391,7 +3391,7 @@ getBlobs(Archive *fout)
"NULL::oid AS initrlomacl "
" FROM pg_largeobject");
- res = ExecuteSqlQuery(fout, blobQry->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobQry->data, PGRES_TUPLES_OK);
i_oid = PQfnumber(res, "oid");
i_lomowner = PQfnumber(res, "rolname");
@@ -3537,7 +3537,7 @@ dumpBlobs(Archive *fout, void *arg)
"DECLARE bloboid CURSOR FOR "
"SELECT DISTINCT loid FROM pg_largeobject ORDER BY 1";
- ExecuteSqlStatement(fout, blobQry);
+ ExecuteSqlStatementAH(fout, blobQry);
/* Command to fetch from cursor */
blobFetchQry = "FETCH 1000 IN bloboid";
@@ -3545,7 +3545,7 @@ dumpBlobs(Archive *fout, void *arg)
do
{
/* Do a fetch */
- res = ExecuteSqlQuery(fout, blobFetchQry, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobFetchQry, PGRES_TUPLES_OK);
/* Process the tuples, if any */
ntups = PQntuples(res);
@@ -3678,7 +3678,7 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
"FROM pg_catalog.pg_policy pol "
"WHERE polrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -3914,7 +3914,7 @@ getPublications(Archive *fout)
"FROM pg_publication p",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4112,7 +4112,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE pr.prrelid = '%u'"
" AND p.oid = pr.prpubid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4237,7 +4237,7 @@ getSubscriptions(Archive *fout)
{
int n;
- res = ExecuteSqlQuery(fout,
+ res = ExecuteSqlQueryAH(fout,
"SELECT count(*) FROM pg_subscription "
"WHERE subdbid = (SELECT oid FROM pg_database"
" WHERE datname = current_database())",
@@ -4274,7 +4274,7 @@ getSubscriptions(Archive *fout)
"WHERE s.subdbid = (SELECT oid FROM pg_database\n"
" WHERE datname = current_database())");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4446,7 +4446,7 @@ append_depends_on_extension(Archive *fout,
"AND refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass",
catalog,
dobj->catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_extname = PQfnumber(res, "extname");
for (i = 0; i < ntups; i++)
@@ -4485,7 +4485,7 @@ get_next_possible_free_pg_type_oid(Archive *fout, PQExpBuffer upgrade_query)
"FROM pg_catalog.pg_type "
"WHERE oid = '%u'::pg_catalog.oid);",
next_possible_free_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
is_dup = (PQgetvalue(res, 0, 0)[0] == 't');
PQclear(res);
} while (is_dup);
@@ -4518,7 +4518,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE oid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4551,7 +4551,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE r.rngtypid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_multirange_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "oid")));
pg_type_multirange_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4594,7 +4594,7 @@ binary_upgrade_set_type_oids_by_rel_oid(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_rel_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_oid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "crel")));
@@ -4645,7 +4645,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0,
PQfnumber(upgrade_res, "reltoastrelid")));
@@ -4803,7 +4803,7 @@ getNamespaces(Archive *fout, int *numNamespaces)
"FROM pg_namespace",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4916,7 +4916,7 @@ getExtensions(Archive *fout, int *numExtensions)
"FROM pg_extension x "
"JOIN pg_namespace n ON n.oid = x.extnamespace");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5093,7 +5093,7 @@ getTypes(Archive *fout, int *numTypes)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5249,7 +5249,7 @@ getOperators(Archive *fout, int *numOprs)
"FROM pg_operator",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOprs = ntups;
@@ -5336,7 +5336,7 @@ getCollations(Archive *fout, int *numCollations)
"FROM pg_collation",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numCollations = ntups;
@@ -5408,7 +5408,7 @@ getConversions(Archive *fout, int *numConversions)
"FROM pg_conversion",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numConversions = ntups;
@@ -5481,7 +5481,7 @@ getAccessMethods(Archive *fout, int *numAccessMethods)
"amhandler::pg_catalog.regproc AS amhandler "
"FROM pg_am");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAccessMethods = ntups;
@@ -5552,7 +5552,7 @@ getOpclasses(Archive *fout, int *numOpclasses)
"FROM pg_opclass",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpclasses = ntups;
@@ -5635,7 +5635,7 @@ getOpfamilies(Archive *fout, int *numOpfamilies)
"FROM pg_opfamily",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpfamilies = ntups;
@@ -5804,7 +5804,7 @@ getAggregates(Archive *fout, int *numAggs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAggs = ntups;
@@ -6035,7 +6035,7 @@ getFuncs(Archive *fout, int *numFuncs)
appendPQExpBufferChar(query, ')');
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6721,7 +6721,7 @@ getTables(Archive *fout, int *numTables)
RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6791,7 +6791,7 @@ getTables(Archive *fout, int *numTables)
resetPQExpBuffer(query);
appendPQExpBufferStr(query, "SET statement_timeout = ");
appendStringLiteralConn(query, dopt->lockWaitTimeout, GetConnection(fout));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
for (i = 0; i < ntups; i++)
@@ -6915,7 +6915,7 @@ getTables(Archive *fout, int *numTables)
appendPQExpBuffer(query,
"LOCK TABLE %s IN ACCESS SHARE MODE",
fmtQualifiedDumpable(&tblinfo[i]));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
/* Emit notice if join for owner failed */
@@ -6926,7 +6926,7 @@ getTables(Archive *fout, int *numTables)
if (dopt->lockWaitTimeout)
{
- ExecuteSqlStatement(fout, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(fout, "SET statement_timeout = 0");
}
PQclear(res);
@@ -7021,7 +7021,7 @@ getInherits(Archive *fout, int *numInherits)
/* find all the inheritance information */
appendPQExpBufferStr(query, "SELECT inhrelid, inhparent FROM pg_inherits");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7365,7 +7365,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7512,7 +7512,7 @@ getExtendedStatistics(Archive *fout)
"FROM pg_catalog.pg_statistic_ext",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7610,7 +7610,7 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE conrelid = '%u'::pg_catalog.oid "
"AND contype = 'f'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7746,7 +7746,7 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
"ORDER BY conname",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7839,7 +7839,7 @@ getRules(Archive *fout, int *numRules)
"ORDER BY oid");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8013,7 +8013,7 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8148,7 +8148,7 @@ getEventTriggers(Archive *fout, int *numEventTriggers)
"ORDER BY e.oid",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8318,7 +8318,7 @@ getProcLangs(Archive *fout, int *numProcLangs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8427,7 +8427,7 @@ getCasts(Archive *fout, int *numCasts)
"FROM pg_cast ORDER BY 3,4");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8495,7 +8495,7 @@ get_language_name(Archive *fout, Oid langid)
query = createPQExpBuffer();
appendPQExpBuffer(query, "SELECT lanname FROM pg_language WHERE oid = %u", langid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
lanname = pg_strdup(fmtId(PQgetvalue(res, 0, 0)));
destroyPQExpBuffer(query);
PQclear(res);
@@ -8538,7 +8538,7 @@ getTransforms(Archive *fout, int *numTransforms)
"FROM pg_transform "
"ORDER BY 3,4");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8720,7 +8720,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"ORDER BY a.attnum",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8797,7 +8797,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"WHERE adrelid = '%u'::pg_catalog.oid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numDefaults = PQntuples(res);
attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * sizeof(AttrDefInfo));
@@ -8919,7 +8919,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numConstrs = PQntuples(res);
if (numConstrs != tbinfo->ncheck)
@@ -9062,7 +9062,7 @@ getTSParsers(Archive *fout, int *numTSParsers)
"prsend::oid, prsheadline::oid, prslextype::oid "
"FROM pg_ts_parser");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSParsers = ntups;
@@ -9146,7 +9146,7 @@ getTSDictionaries(Archive *fout, int *numTSDicts)
"FROM pg_ts_dict",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSDicts = ntups;
@@ -9226,7 +9226,7 @@ getTSTemplates(Archive *fout, int *numTSTemplates)
"tmplnamespace, tmplinit::oid, tmpllexize::oid "
"FROM pg_ts_template");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSTemplates = ntups;
@@ -9302,7 +9302,7 @@ getTSConfigurations(Archive *fout, int *numTSConfigs)
"FROM pg_ts_config",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSConfigs = ntups;
@@ -9455,7 +9455,7 @@ getForeignDataWrappers(Archive *fout, int *numForeignDataWrappers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignDataWrappers = ntups;
@@ -9603,7 +9603,7 @@ getForeignServers(Archive *fout, int *numForeignServers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignServers = ntups;
@@ -9742,7 +9742,7 @@ getDefaultACLs(Archive *fout, int *numDefaultACLs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numDefaultACLs = ntups;
@@ -10098,7 +10098,7 @@ collectComments(Archive *fout, CommentItem **items)
"FROM pg_catalog.pg_description "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
@@ -10546,7 +10546,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
"ORDER BY oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
num = PQntuples(res);
@@ -10684,7 +10684,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
"rngtypid = '%u'",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
qualtypname = pg_strdup(fmtQualifiedDumpable(tyinfo));
@@ -10942,7 +10942,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
"WHERE oid = '%u'::pg_catalog.oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typlen = PQgetvalue(res, 0, PQfnumber(res, "typlen"));
typinput = PQgetvalue(res, 0, PQfnumber(res, "typinput"));
@@ -11165,7 +11165,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typnotnull = PQgetvalue(res, 0, PQfnumber(res, "typnotnull"));
typdefn = PQgetvalue(res, 0, PQfnumber(res, "typdefn"));
@@ -11357,7 +11357,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -11536,7 +11536,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
tyinfo->typrelid);
/* Fetch column attnames */
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
if (ntups < 1)
@@ -12064,7 +12064,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
"WHERE oid = '%u'::pg_catalog.oid",
finfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
proretset = PQgetvalue(res, 0, PQfnumber(res, "proretset"));
prosrc = PQgetvalue(res, 0, PQfnumber(res, "prosrc"));
@@ -12754,7 +12754,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_oprkind = PQfnumber(res, "oprkind");
i_oprcode = PQfnumber(res, "oprcode");
@@ -12977,7 +12977,7 @@ convertTSFunction(Archive *fout, Oid funcOid)
snprintf(query, sizeof(query),
"SELECT '%u'::pg_catalog.regproc", funcOid);
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
@@ -13140,7 +13140,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_opcintype = PQfnumber(res, "opcintype");
i_opckeytype = PQfnumber(res, "opckeytype");
@@ -13268,7 +13268,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13347,7 +13347,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13530,7 +13530,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.catId.oid);
}
- res_ops = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_ops = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
resetPQExpBuffer(query);
@@ -13546,7 +13546,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"ORDER BY amprocnum",
opfinfo->dobj.catId.oid);
- res_procs = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_procs = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Get additional fields from the pg_opfamily row */
resetPQExpBuffer(query);
@@ -13557,7 +13557,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"WHERE oid = '%u'::pg_catalog.oid",
opfinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_amname = PQfnumber(res, "amname");
@@ -13744,7 +13744,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
collinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_collprovider = PQfnumber(res, "collprovider");
i_collisdeterministic = PQfnumber(res, "collisdeterministic");
@@ -13861,7 +13861,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
convinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_conforencoding = PQfnumber(res, "conforencoding");
i_contoencoding = PQfnumber(res, "contoencoding");
@@ -14078,7 +14078,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
"AND p.oid = '%u'::pg_catalog.oid",
agginfo->aggfn.dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_agginitval = PQfnumber(res, "agginitval");
i_aggminitval = PQfnumber(res, "aggminitval");
@@ -14413,7 +14413,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
"FROM pg_ts_template p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = tmplnamespace",
dictinfo->dicttemplate);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
tmplname = PQgetvalue(res, 0, 1);
@@ -14555,7 +14555,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"FROM pg_ts_parser p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = prsnamespace",
cfginfo->cfgparser);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
prsname = PQgetvalue(res, 0, 1);
@@ -14578,7 +14578,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"ORDER BY m.mapcfg, m.maptokentype, m.mapseqno",
cfginfo->cfgparser, cfginfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_tokenname = PQfnumber(res, "tokenname");
@@ -14742,7 +14742,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
"FROM pg_foreign_data_wrapper w "
"WHERE w.oid = '%u'",
srvinfo->srvfdw);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
fdwname = PQgetvalue(res, 0, 0);
appendPQExpBuffer(q, "CREATE SERVER %s", qsrvname);
@@ -14858,7 +14858,7 @@ dumpUserMappings(Archive *fout,
"ORDER BY usename",
catalogId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_usename = PQfnumber(res, "usename");
@@ -15379,7 +15379,7 @@ collectSecLabels(Archive *fout, SecLabelItem **items)
"FROM pg_catalog.pg_seclabel "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
i_label = PQfnumber(res, "label");
@@ -15515,7 +15515,7 @@ dumpTable(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
for (i = 0; i < PQntuples(res); i++)
{
@@ -15565,7 +15565,7 @@ createViewAsClause(Archive *fout, TableInfo *tbinfo)
"SELECT pg_catalog.pg_get_viewdef('%u'::pg_catalog.oid) AS viewdef",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -15740,7 +15740,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
"ON (fs.oid = ft.ftserver) "
"WHERE ft.ftrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_srvname = PQfnumber(res, "srvname");
i_ftoptions = PQfnumber(res, "ftoptions");
srvname = pg_strdup(PQgetvalue(res, 0, i_srvname));
@@ -16693,7 +16693,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
"pg_catalog.pg_get_statisticsobjdef('%u'::pg_catalog.oid)",
statsextinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
stxdef = PQgetvalue(res, 0, 0);
@@ -17061,7 +17061,7 @@ findLastBuiltinOid_V71(Archive *fout)
PGresult *res;
Oid last_oid;
- res = ExecuteSqlQueryForSingleRow(fout,
+ res = ExecuteSqlQueryForSingleRowAH(fout,
"SELECT datlastsysoid FROM pg_database WHERE datname = current_database()");
last_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "datlastsysoid")));
PQclear(res);
@@ -17130,7 +17130,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
fmtQualifiedDumpable(tbinfo));
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17353,7 +17353,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
"SELECT last_value, is_called FROM %s",
fmtQualifiedDumpable(tbinfo));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17761,7 +17761,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
"SELECT pg_catalog.pg_get_ruledef('%u'::pg_catalog.oid)",
rinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17891,7 +17891,7 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
"AND deptype = 'e' "
"ORDER BY 3");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18090,7 +18090,7 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
"AND refclassid = 'pg_extension'::regclass "
"AND classid = 'pg_class'::regclass;");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_conrelid = PQfnumber(res, "conrelid");
@@ -18196,7 +18196,7 @@ getDependencies(Archive *fout)
/* Sort the output for efficiency below */
appendPQExpBufferStr(query, "ORDER BY 1,2");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18549,7 +18549,7 @@ getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts)
appendPQExpBuffer(query, "SELECT pg_catalog.format_type('%u'::pg_catalog.oid, NULL)",
oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
/* result of format_type is already quoted */
result = pg_strdup(PQgetvalue(res, 0, 0));
--
2.21.1 (Apple Git-122.3)
v31-0003-Creating-query_utils-frontend-utility.patch (application/octet-stream)
From b7ad987e87283dc9dddc79c399892fa5ff84803b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:01:02 -0800
Subject: [PATCH v31 3/9] Creating query_utils frontend utility
Moving the ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement functions out of the pg_dump project into a new
shared location.
---
src/bin/pg_dump/pg_backup_db.c | 102 +-------------------------
src/bin/pg_dump/pg_backup_db.h | 17 -----
src/fe_utils/Makefile | 1 +
src/fe_utils/query_utils.c | 114 +++++++++++++++++++++++++++++
src/include/fe_utils/query_utils.h | 34 +++++++++
5 files changed, 150 insertions(+), 118 deletions(-)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index b55a968da2..38402d0831 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -20,6 +20,7 @@
#include "common/connect.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "parallel.h"
#include "pg_backup_archiver.h"
@@ -271,107 +272,6 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/*
- * The exiting query result handler embeds the historical pg_dump behavior
- * under query error conditions, including exiting nicely. The 'conn' object
- * is unused here, but is included in the interface for alternate query result
- * handler implementations.
- *
- * Whether the query was successful is determined by comparing the returned
- * status code against the expected status code, and by comparing the number of
- * tuples returned from the query against expected_ntups. Special negative
- * values of expected_ntups can be used to require at least one row or to
- * disables ntup checking.
- *
- * Exits on failure. On successful query completion, returns the 'res'
- * argument as a notational convenience.
- */
-PGresult *
-exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
-{
- if (PQresultStatus(res) != expected_status)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
- if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
- {
- int ntups = PQntuples(res);
-
- if (expected_ntups == POSITIVE_NTUPS)
- {
- if (ntups == 0)
- fatal("query returned no rows: %s", query);
- }
- else if (ntups != expected_ntups)
- {
- /*
- * Preserve historical message behavior of spelling "one" as the
- * expected row count.
- */
- if (expected_ntups == 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
- fatal(ngettext("query returned %d row instead of %d: %s",
- "query returned %d rows instead of %d: %s",
- ntups),
- ntups, expected_ntups, query);
- }
- }
- return res;
-}
-
-/*
- * Executes the given SQL query statement.
- *
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
- */
-void
-ExecuteSqlStatement(PGconn *conn, const char *query)
-{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
-}
-
-/*
- * Executes the given SQL query.
- *
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
- */
-PGresult *
-ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
-}
-
-/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
- */
-PGresult *
-ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
-}
-
void
ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 1aac600ece..018a28908e 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,23 +13,6 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-#define POSITIVE_NTUPS (-1)
-#define ANY_NTUPS (-2)
-typedef PGresult *(*PGresultHandler) (PGresult *res,
- PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups,
- const char *query);
-
-extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups, const char *query);
-
-extern void ExecuteSqlStatement(PGconn *conn, const char *query);
-extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
- ExecStatusType expected_status);
-extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
-
extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
ExecStatusType status);
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index d6c328faf1..7fdbe08e11 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -27,6 +27,7 @@ OBJS = \
mbprint.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..b28750f4b2
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,114 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disable ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve historical message behavior of spelling "one" as the
+ * expected row count.
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
+}
+
+/*
+ * Executes the given SQL query statement.
+ *
+ * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ */
+void
+ExecuteSqlStatement(PGconn *conn, const char *query)
+{
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
+
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the given 'status' results.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
+}
+
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
+ * requires that exactly one row be returned.
+ */
+PGresult *
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..f03d17b1ed
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,34 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "libpq-fe.h"
+
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+#endif /* QUERY_UTILS_H */
--
2.21.1 (Apple Git-122.3)
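The row-count convention used by exiting_handler above (an exact non-negative count, POSITIVE_NTUPS meaning "at least one row", ANY_NTUPS meaning "don't check") can be sketched as a standalone predicate. This is only an illustrative helper; `ntups_ok` is a hypothetical name and is not part of the patch:

```c
#include <stdbool.h>

/* Sentinel values, taken from query_utils.h in the patch. */
#define POSITIVE_NTUPS (-1)		/* require at least one row */
#define ANY_NTUPS	   (-2)		/* disable the row-count check */

/*
 * Returns true when 'ntups' satisfies 'expected_ntups' under the
 * convention exiting_handler applies: ANY_NTUPS accepts anything,
 * POSITIVE_NTUPS requires one or more rows, and any non-negative
 * value requires an exact match.
 */
bool
ntups_ok(int ntups, int expected_ntups)
{
	if (expected_ntups == ANY_NTUPS)
		return true;
	if (expected_ntups == POSITIVE_NTUPS)
		return ntups > 0;
	return ntups == expected_ntups;
}
```

ExecuteSqlQueryForSingleRow, for example, passes 1 for expected_ntups, so only an exact single-row result avoids the error path.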
v31-0004-Adding-CurrentQueryHandler-logic.patch (application/octet-stream)
From be764e6508541d27bf66e71b933d02cca42de35a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:25:29 -0800
Subject: [PATCH v31 4/9] Adding CurrentQueryHandler logic.
Extending the default set of PGresultHandlers and creating a
mechanism to switch between them using a new function
ResultHandlerSwitchTo, analogous to MemoryContextSwithTo. In
addition to the exiting_handler already created in a prior commit
(which embeds the historical behavior from pg_dump), adding a
quiet_handler which cleans up and exits without logging anything,
and a noop_handler which does nothing, leaving the responsibility
for cleanup handling to the caller.
---
src/fe_utils/query_utils.c | 81 ++++++++++++++++++++++--------
src/include/fe_utils/query_utils.h | 17 +++++++
2 files changed, 76 insertions(+), 22 deletions(-)
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
index b28750f4b2..355da6edaf 100644
--- a/src/fe_utils/query_utils.c
+++ b/src/fe_utils/query_utils.c
@@ -12,6 +12,12 @@
#include "fe_utils/exit_utils.h"
#include "fe_utils/query_utils.h"
+/*
+ * Global memory.
+ */
+
+PGresultHandler CurrentResultHandler = exiting_handler;
+
/*
* The exiting query result handler embeds the historical pg_dump behavior
* under query error conditions, including exiting nicely. The 'conn' object
@@ -29,7 +35,7 @@
*/
PGresult *
exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
+ int expected_ntups, const char *query)
{
if (PQresultStatus(res) != expected_status)
{
@@ -67,48 +73,79 @@ exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
return res;
}
+/*
+ * Quietly cleans up and exits nicely unless the expected conditions were met.
+ */
+PGresult *
+quiet_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ int ntups = PQntuples(res);
+
+ if ((PQresultStatus(res) != expected_status) ||
+ (expected_ntups == POSITIVE_NTUPS && ntups == 0) ||
+ (expected_ntups >= 0 && ntups != expected_ntups))
+ {
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+
+ return res;
+}
+
+/*
+ * Does nothing other than returning the 'res' argument back to the caller.
+ * This handler is intended for callers who prefer to perform the error
+ * handling themselves.
+ */
+PGresult *
+noop_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ return res;
+}
+
/*
* Executes the given SQL query statement.
*
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ * Expects a PGRES_COMMAND_OK status.
*/
void
ExecuteSqlStatement(PGconn *conn, const char *query)
{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
+ PQclear(CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
}
/*
* Executes the given SQL query.
*
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
+ * Expects the given status.
*/
PGresult *
ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
+ * Executes the given SQL query.
+ *
+ * Expects a PGRES_TUPLES_OK status and precisely one row.
*/
PGresult *
ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
index f03d17b1ed..80958e94fb 100644
--- a/src/include/fe_utils/query_utils.h
+++ b/src/include/fe_utils/query_utils.h
@@ -22,9 +22,26 @@ typedef PGresult *(*PGresultHandler) (PGresult *res,
int expected_ntups,
const char *query);
+extern PGresultHandler CurrentResultHandler;
+
+static inline PGresultHandler
+ResultHandlerSwitchTo(PGresultHandler handler)
+{
+ PGresultHandler old = CurrentResultHandler;
+
+ CurrentResultHandler = handler;
+ return old;
+}
+
extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
ExecStatusType expected_status,
int expected_ntups, const char *query);
+extern PGresult *quiet_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+extern PGresult *noop_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
extern void ExecuteSqlStatement(PGconn *conn, const char *query);
extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
--
2.21.1 (Apple Git-122.3)
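The save/restore shape of ResultHandlerSwitchTo can be sketched independently of libpq. The sketch below uses a simplified function-pointer type (the real PGresultHandler takes a PGresult, PGconn, expected status, expected tuple count, and query string); the names `ResultHandler`, `HandlerSwitchTo`, and the two toy handlers are hypothetical stand-ins:

```c
/* Simplified stand-in for PGresultHandler. */
typedef int (*ResultHandler) (int value);

/* Toy handlers distinguishable by their return values. */
int
exiting_style(int value)
{
	return value * 2;
}

int
quiet_style(int value)
{
	return value;
}

/* Global current handler, defaulting to the exiting behavior,
 * mirroring CurrentResultHandler in the patch. */
ResultHandler CurrentHandler = exiting_style;

/*
 * Installs 'handler' and returns the previous one, so callers can
 * restore it afterward -- the same pattern as MemoryContextSwitchTo.
 */
ResultHandler
HandlerSwitchTo(ResultHandler handler)
{
	ResultHandler old = CurrentHandler;

	CurrentHandler = handler;
	return old;
}
```

A caller would typically bracket a section of queries with `old = HandlerSwitchTo(quiet_style); ... HandlerSwitchTo(old);`, which is why the function returns the previous handler rather than void.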
v31-0005-Refactoring-pg_dumpall-functions.patch (application/octet-stream)
From 68fd186541af667729795561b3ddbe248ed348fe Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:37:42 -0800
Subject: [PATCH v31 5/9] Refactoring pg_dumpall functions.
The functions executeQuery and executeCommand in pg_dumpall.c were
not refactored in prior commits along with functions from
pg_backup_db.c because they were in a separate file, but now that
the infrastructure has been moved to fe_utils/query_utils,
refactoring these two functions to use it.
---
src/bin/pg_dump/pg_dumpall.c | 31 +++----------------------------
1 file changed, 3 insertions(+), 28 deletions(-)
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 85d08ad660..807226537a 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "pg_backup.h"
@@ -1874,21 +1875,8 @@ constructConnStr(const char **keywords, const char **values)
static PGresult *
executeQuery(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- return res;
+ return ExecuteSqlQuery(conn, query, PGRES_TUPLES_OK);
}
/*
@@ -1897,21 +1885,8 @@ executeQuery(PGconn *conn, const char *query)
static void
executeCommand(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- PQclear(res);
+ PQclear(ExecuteSqlQuery(conn, query, PGRES_COMMAND_OK));
}
--
2.21.1 (Apple Git-122.3)
v31-0006-Refactoring-expand_schema_name_patterns-and-frie.patch (application/octet-stream)
From 89cc7fd4276ee9a2e786b889acdc46225b5e00f0 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:51:02 -0800
Subject: [PATCH v31 6/9] Refactoring expand_schema_name_patterns and friends.
Refactoring these functions to take a PGconn pointer rather than an
Archive pointer in preparation for moving these functions to
fe_utils. This is much like what was previously done for
ExecuteSqlQuery and friends, and for the same reasons.
---
src/bin/pg_dump/pg_dump.c | 47 ++++++++++++++++++++++-----------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e8985a834f..41ce4b7866 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "libpq/libpq-fs.h"
@@ -147,14 +148,14 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(Archive *fout,
+static void expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
-static void expand_foreign_server_name_patterns(Archive *fout,
+static void expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids);
-static void expand_table_name_patterns(Archive *fout,
+static void expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
@@ -798,13 +799,15 @@ main(int argc, char **argv)
/* Expand schema selection patterns into OID lists */
if (schema_include_patterns.head != NULL)
{
- expand_schema_name_patterns(fout, &schema_include_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_include_patterns,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
fatal("no matching schemas were found");
}
- expand_schema_name_patterns(fout, &schema_exclude_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_exclude_patterns,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -812,21 +815,25 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(fout, &table_include_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_include_patterns,
&table_include_oids,
strict_names);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(fout, &table_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
&table_exclude_oids,
false);
- expand_table_name_patterns(fout, &tabledata_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
&tabledata_exclude_oids,
false);
- expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+ expand_foreign_server_name_patterns(GetConnection(fout),
+ &foreign_servers_include_patterns,
&foreign_servers_include_oids);
/* non-matching exclusion patterns aren't an error */
@@ -1316,7 +1323,7 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
* and append them to the given OID list.
*/
static void
-expand_schema_name_patterns(Archive *fout,
+expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names)
@@ -1340,10 +1347,10 @@ expand_schema_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1364,7 +1371,7 @@ expand_schema_name_patterns(Archive *fout,
* and append them to the given OID list.
*/
static void
-expand_foreign_server_name_patterns(Archive *fout,
+expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids)
{
@@ -1387,10 +1394,10 @@ expand_foreign_server_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1410,7 +1417,7 @@ expand_foreign_server_name_patterns(Archive *fout,
* in pg_dumpall.c
*/
static void
-expand_table_name_patterns(Archive *fout,
+expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns, SimpleOidList *oids,
bool strict_names)
{
@@ -1446,13 +1453,13 @@ expand_table_name_patterns(Archive *fout,
RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(GetConnection(fout), query, cell->val, true,
+ processSQLNamePattern(conn, query, cell->val, true,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatementAH(fout, "RESET search_path");
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRowAH(fout,
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
--
2.21.1 (Apple Git-122.3)
v31-0007-Moving-pg_dump-functions-to-new-file-option_util.patch (application/octet-stream)
From 2f54b0ba329aad1dceec4e09d09f5945ba3033d3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:07:58 -0800
Subject: [PATCH v31 7/9] Moving pg_dump functions to new file option_utils
Moving the recently refactored functions
expand_schema_name_patterns, expand_foreign_server_name_patterns,
and expand_table_name_patterns from pg_dump.c, along with the
function expand_dbname_patterns from pg_dumpall.c, into the new file
fe_utils/option_utils.c
---
src/bin/pg_dump/pg_dump.c | 170 +--------------------
src/bin/pg_dump/pg_dumpall.c | 46 +-----
src/fe_utils/Makefile | 1 +
src/fe_utils/option_utils.c | 225 ++++++++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 35 +++++
5 files changed, 263 insertions(+), 214 deletions(-)
create mode 100644 src/fe_utils/option_utils.c
create mode 100644 src/include/fe_utils/option_utils.h
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41ce4b7866..c334b9e829 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -148,17 +149,6 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-static void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-static void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
static NamespaceInfo *findNamespace(Oid nsoid);
static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
@@ -1318,164 +1308,6 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
return archiveFormat;
}
-/*
- * Find the OIDs of all schemas matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "n.nspname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching schemas were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all foreign servers matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "s.srvname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
- fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list. See also expand_dbname_patterns()
- * in pg_dumpall.c
- */
-static void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * this might sometimes result in duplicate entries in the OID list, but
- * we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- /*
- * Query must remain ABSOLUTELY devoid of unqualified names. This
- * would be unnecessary given a pg_table_is_visible() variant taking a
- * search_path argument.
- */
- appendPQExpBuffer(query,
- "SELECT c.oid"
- "\nFROM pg_catalog.pg_class c"
- "\n LEFT JOIN pg_catalog.pg_namespace n"
- "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
- "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
-
- ExecuteSqlStatement(conn, "RESET search_path");
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(conn,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* checkExtensionMembership
* Determine whether object is an extension member, and if so,
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 807226537a..01db15dfda 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -54,8 +55,6 @@ static PGconn *connectDatabase(const char *dbname, const char *connstr, const ch
static char *constructConnStr(const char **keywords, const char **values);
static PGresult *executeQuery(PGconn *conn, const char *query);
static void executeCommand(PGconn *conn, const char *query);
-static void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
static char pg_dump_bin[MAXPGPATH];
static const char *progname;
@@ -1409,49 +1408,6 @@ dumpUserConfig(PGconn *conn, const char *username)
destroyPQExpBuffer(buf);
}
-/*
- * Find a list of database names that match the given patterns.
- * See also expand_table_name_patterns() in pg_dump.c
- */
-static void
-expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
-{
- PQExpBuffer query;
- PGresult *res;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs, which might sometimes result in
- * duplicate entries in the name list, but we don't care, since all we're
- * going to do is test membership of the list.
- */
-
- for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "datname", NULL, NULL);
-
- res = executeQuery(conn, query->data);
- for (int i = 0; i < PQntuples(res); i++)
- {
- simple_string_list_append(names, PQgetvalue(res, i, 0));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* Dump contents of databases.
*/
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 7fdbe08e11..eb937e4648 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -25,6 +25,7 @@ OBJS = \
conditional.o \
exit_utils.o \
mbprint.o \
+ option_utils.o \
print.o \
psqlscan.o \
query_utils.o \
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..7893df77aa
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,225 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h"
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all foreign servers matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "s.srvname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (PQntuples(res) == 0)
+ fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * this might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
+ RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(conn, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL,
+ "pg_catalog.pg_table_is_visible(c.oid)");
+
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find a list of database names that match the given patterns.
+ */
+void
+expand_dbname_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleStringList *names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the name list, but we don't care, since all we're
+ * going to do is test membership of the list.
+ */
+
+ for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT datname FROM pg_catalog.pg_database n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "datname", NULL, NULL);
+
+ pg_log_info("executing %s", query->data);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ for (int i = 0; i < PQntuples(res); i++)
+ {
+ simple_string_list_append(names, PQgetvalue(res, i, 0));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..d626a0bbc9
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "fe_utils/simple_list.h"
+#include "libpq-fe.h"
+
+extern void expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids);
+extern void expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
+ SimpleStringList *names);
+
+#endif /* OPTION_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v31-0008-Normalizing-option_utils-interface.patch (application/octet-stream)
From e0eb07b74c78a6259dbe8546c6d03699ec1e0fbd Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:37:44 -0800
Subject: [PATCH v31 8/9] Normalizing option_utils interface.
The functions in option_utils were copied from pg_dump, mostly preserving
the function signatures. But the signatures and corresponding functionality
were originally written based solely on pg_dump's needs, not with the goal
of creating a consistent interface. Fixing that.
---
src/bin/pg_dump/pg_dump.c | 58 +++++++++++++++++++------
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/fe_utils/option_utils.c | 66 ++++++++++++++++++++---------
src/fe_utils/string_utils.c | 50 ++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 29 +++++++++----
src/include/fe_utils/string_utils.h | 6 +++
6 files changed, 169 insertions(+), 44 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c334b9e829..5c446c0f24 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -125,6 +125,17 @@ static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
+/*
+ * Cstring list of relkinds which qualify as tables for our purposes when
+ * processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_SEQUENCE) ", " \
+ CppAsString2(RELKIND_VIEW) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_FOREIGN_TABLE) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
static const CatalogId nilCatalogId = {0, 0};
/* override for standard extra_float_digits setting */
@@ -791,6 +802,7 @@ main(int argc, char **argv)
{
expand_schema_name_patterns(GetConnection(fout),
&schema_include_patterns,
+ NULL,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
@@ -798,6 +810,7 @@ main(int argc, char **argv)
}
expand_schema_name_patterns(GetConnection(fout),
&schema_exclude_patterns,
+ NULL,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -805,26 +818,43 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(GetConnection(fout),
- &table_include_patterns,
- &table_include_oids,
- strict_names);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_include_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ strict_names,
+ true);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(GetConnection(fout),
- &table_exclude_patterns,
- &table_exclude_oids,
- false);
-
- expand_table_name_patterns(GetConnection(fout),
- &tabledata_exclude_patterns,
- &tabledata_exclude_oids,
- false);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ true);
+
+ expand_rel_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &tabledata_exclude_oids,
+ false,
+ true);
expand_foreign_server_name_patterns(GetConnection(fout),
&foreign_servers_include_patterns,
- &foreign_servers_include_oids);
+ NULL,
+ &foreign_servers_include_oids,
+ true);
/* non-matching exclusion patterns aren't an error */
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 01db15dfda..2b3a4e3349 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -471,8 +471,8 @@ main(int argc, char *argv[])
/*
* Get a list of database names that match the exclude patterns
*/
- expand_dbname_patterns(conn, &database_exclude_patterns,
- &database_exclude_names);
+ expand_dbname_patterns(conn, &database_exclude_patterns, NULL,
+ &database_exclude_names, false);
/*
* Open the output file if required, otherwise use stdout
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 7893df77aa..76ca456784 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -14,6 +14,7 @@
*/
#include "postgres_fe.h"
+#include "catalog/pg_am.h"
#include "catalog/pg_class.h"
#include "common/connect.h"
#include "fe_utils/exit_utils.h"
@@ -30,7 +31,8 @@
*/
void
expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names)
{
@@ -55,6 +57,7 @@ expand_schema_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(query, "n.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
@@ -78,8 +81,10 @@ expand_schema_name_patterns(PGconn *conn,
*/
void
expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -102,9 +107,10 @@ expand_foreign_server_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
+ exclude_filter(query, "s.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
+ if (strict_names && PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
for (i = 0; i < PQntuples(res); i++)
@@ -118,18 +124,32 @@ expand_foreign_server_name_patterns(PGconn *conn,
}
/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list.
+ * Find the OIDs of all relations matching the given list of patterns
+ * and restrictions, and append them to the given OID list.
*/
void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
+expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible)
{
PQExpBuffer query;
PGresult *res;
SimpleStringListCell *cell;
int i;
+ const char *visibility_rule;
+
+ Assert(amtype == AMTYPE_TABLE || amtype == AMTYPE_INDEX);
+
+ if (restrict_visible)
+ visibility_rule = "pg_catalog.pg_table_is_visible(c.oid)";
+ else
+ visibility_rule = NULL;
if (patterns->head == NULL)
return; /* nothing to do */
@@ -154,20 +174,22 @@ expand_table_name_patterns(PGconn *conn,
"\n LEFT JOIN pg_catalog.pg_namespace n"
"\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
"\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
+ "\n (array[%s])\n", relkinds);
processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
+ false, "n.nspname", "c.relname", NULL, visibility_rule);
+ exclude_filter(query, "n.oid", exclude_nsp_oids);
+ exclude_filter(query, "c.oid", exclude_oids);
ExecuteSqlStatement(conn, "RESET search_path");
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ {
+ if (amtype == AMTYPE_TABLE)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ fatal("no matching indexes were found for pattern \"%s\"", cell->val);
+ }
for (i = 0; i < PQntuples(res); i++)
{
@@ -186,8 +208,10 @@ expand_table_name_patterns(PGconn *conn,
*/
void
expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -206,12 +230,16 @@ expand_dbname_patterns(PGconn *conn,
for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
{
appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
+ "SELECT datname FROM pg_catalog.pg_database d\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "datname", NULL, NULL);
+ exclude_filter(query, "d.oid", exclude_oids);
pg_log_info("executing %s", query->data);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching databases were found for pattern \"%s\"", cell->val);
+
for (int i = 0; i < PQntuples(res); i++)
{
simple_string_list_append(names, PQgetvalue(res, i, 0));
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..4e57a6f940 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -797,6 +797,56 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
return true;
}
+/*
+ * Internal implementation of include_filter and exclude_filter.
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ if (!oids || !oids->head)
+ return;
+
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
/*
* processSQLNamePattern
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d626a0bbc9..53da30754f 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -19,17 +19,28 @@
#include "libpq-fe.h"
extern void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names);
extern void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-extern void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible);
+extern void expand_dbname_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names);
#endif /* OPTION_UTILS_H */
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..301a8eef4d 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -16,6 +16,7 @@
#ifndef STRING_UTILS_H
#define STRING_UTILS_H
+#include "fe_utils/simple_list.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -50,6 +51,11 @@ extern bool parsePGArray(const char *atext, char ***itemarray, int *nitems);
extern bool appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
const char *prefix, int encoding, bool std_strings);
+extern void include_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+extern void exclude_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+
extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *pattern,
bool have_where, bool force_escape,
--
2.21.1 (Apple Git-122.3)
v31-0009-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 453dd48d1e9cee45808abddddb563e7a42a76086 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 15:47:08 -0800
Subject: [PATCH v31 9/9] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 28 +
contrib/pg_amcheck/pg_amcheck.c | 885 +++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 ++
contrib/pg_amcheck/t/003_check.pl | 248 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 ++++++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 ++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 493 ++++++++++++
13 files changed, 2282 insertions(+)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..74554b9e8d
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,28 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..aa40c6247e
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,885 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "common/username.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/print.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h"
+#include "pg_getopt.h"
+#include "storage/block.h"
+
+typedef struct ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a username and password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ bool verbose;
+ long startblock; /* Block number where checking begins */
+ long endblock; /* Block number where checking ends, inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/* Connection to backend */
+static PGconn *conn;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * C-string list of relkinds that qualify as tables for our purposes when
+ * processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
+#define INDEX_RELKIND_LIST CppAsString2(RELKIND_INDEX)
+
+/*
+ * List of main tables to be checked, compiled from above lists, and
+ * corresponding list of toast tables. The lists should always be
+ * the same length, with InvalidOid in the toastlist for main relations
+ * without a corresponding toast relation.
+ */
+static SimpleOidList mainlist = {NULL, NULL};
+static SimpleOidList toastlist = {NULL, NULL};
+
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl);
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ conn = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ settings.startblock = -1;
+ settings.endblock = -1;
+
+ /*
+ * Default behaviors for user settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = true;
+ settings.check_corrupt = true;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index are corrupt.
+ * We can optionally check the toast table and then the toast index prior
+ * to checking the main table, but if the toast table or index are
+ * concurrently corrupted after we conclude they are valid, the check of
+ * the main table can crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast is not possible.
+ */
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+ if (!conn)
+ fatal("could not connect to database %s: out of memory", values[4]);
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(conn);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf("Password for user %s: ",
+ realusername);
+ else
+ password_prompt = pg_strdup("Password: ");
+ PQfinish(conn);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+
+ if (!new_pass && PQstatus(conn) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to database %s: %s",
+ values[4], PQerrorMessage(conn));
+ PQfinish(conn);
+ exit(1);
+ }
+ } while (new_pass);
+
+ if (settings.verbose)
+ PQsetErrorVerbosity(conn, PQERRORS_VERBOSE);
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(conn,
+ &schema_exclude_patterns,
+ NULL,
+ &schema_exclude_oids,
+ false);
+ expand_rel_name_patterns(conn,
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ false);
+ expand_rel_name_patterns(conn,
+ &index_exclude_patterns,
+ NULL,
+ NULL,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_exclude_oids,
+ false,
+ false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(conn,
+ &schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ settings.strict_names,
+ false);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_include_oids,
+ settings.strict_names,
+ false);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids);
+
+ PQsetNoticeProcessor(conn, NoticeProcessor, NULL);
+
+ if (settings.check_toast)
+ check_tables(&toastlist);
+ check_tables(&mainlist);
+
+ return 0;
+}
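The connection loop above builds parallel keyword/value arrays for PQconnectdbParams(), with the database name falling back from the -d option to PGDATABASE, then PGUSER, then a hardwired "postgres". A minimal standalone C sketch of just that fallback (resolve_dbname is a name invented here for illustration, not part of the patch):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the dbname fallback used when building the PQconnectdbParams()
 * keyword array in main(): an explicit -d option wins, then PGDATABASE,
 * then PGUSER, and finally the hardwired default "postgres".
 */
static const char *
resolve_dbname(const char *cli_dbname)
{
	if (cli_dbname != NULL)
		return cli_dbname;
	if (getenv("PGDATABASE"))
		return getenv("PGDATABASE");
	if (getenv("PGUSER"))
		return getenv("PGUSER");
	return "postgres";
}
```

The same precedence is used by several existing frontend tools, which is presumably why the patch follows it.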
+
+/*
+ * Check each table in the given checklist per the user-specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+
+ if (!OidIsValid(cell->val))
+ continue;
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ settings.check_toast);
+
+ if (settings.check_indexes)
+ {
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+ }
+ }
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ uint64 corruption_cnt = 0;
+
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = on_error_stop ? "true" : "false";
+ toast = check_toast ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM public.verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, ",
+ tbloid, stop, skip, toast);
+ if (startblock < 0)
+ appendPQExpBuffer(querybuf, "startblock := NULL, ");
+ else
+ appendPQExpBuffer(querybuf, "startblock := %ld, ", startblock);
+
+ if (endblock < 0)
+ appendPQExpBuffer(querybuf, "endblock := NULL");
+ else
+ appendPQExpBuffer(querybuf, "endblock := %ld", endblock);
+
+ appendPQExpBuffer(querybuf, ") v, pg_catalog.pg_class c "
+ "WHERE c.oid = %u", tbloid);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (!PQgetisnull(res, i, 3))
+ printf("relation %s, block %s, offset %s, attribute %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s, block %s, offset %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s, block %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 0))
+ printf("relation %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 4)); /* msg */
+ else
+ printf("%s\n", PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("relation with OID %u\n %s\n", tbloid, PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
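check_table() assembles a SQL call to amcheck's verify_heapam(), rendering unset block bounds as NULL so the function's defaults apply. The construction can be sketched without libpq as follows (build_check_query is a hypothetical stand-in for the PQExpBuffer code above, using plain snprintf; the skip value is quoted inside the format rather than in the argument):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch of the query construction in check_table().  A negative start or
 * end block is rendered as NULL, letting verify_heapam() fall back to its
 * default of checking the whole relation.
 */
static void
build_check_query(char *buf, size_t len, unsigned int tbloid,
				  const char *skip, int on_error_stop, int check_toast,
				  long startblock, long endblock)
{
	int			n;

	n = snprintf(buf, len,
				 "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
				 "FROM public.verify_heapam("
				 "relation := %u, on_error_stop := %s, "
				 "skip := '%s', check_toast := %s, ",
				 tbloid,
				 on_error_stop ? "true" : "false",
				 skip,
				 check_toast ? "true" : "false");

	if (startblock < 0)
		n += snprintf(buf + n, len - n, "startblock := NULL, ");
	else
		n += snprintf(buf + n, len - n, "startblock := %ld, ", startblock);

	if (endblock < 0)
		n += snprintf(buf + n, len - n, "endblock := NULL");
	else
		n += snprintf(buf + n, len - n, "endblock := %ld", endblock);

	snprintf(buf + n, len - n,
			 ") v, pg_catalog.pg_class c WHERE c.oid = %u", tbloid);
}
```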
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else
+ {
+ corruption_cnt++;
+ printf("%s\n", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip", required_argument, NULL, 'S'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"username", required_argument, NULL, 'U'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "ab:Cd:e:h:i:I:n:N:op:rsS:t:T:U:vVwWXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'b':
+ settings.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation starting block argument contains garbage characters");
+ if (settings.startblock > (long)MaxBlockNumber)
+ fatal("relation starting block argument out of bounds");
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation ending block argument contains garbage characters");
+ if (settings.endblock > (long)MaxBlockNumber)
+ fatal("relation ending block argument out of bounds");
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'S':
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ {
+ settings.skip_visible = true;
+ settings.skip_frozen = false;
+ }
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ {
+ settings.skip_frozen = true;
+ settings.skip_visible = false;
+ }
+ else
+ {
+ pg_log_error("invalid skip option");
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'v':
+ settings.verbose = true;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, "Try \"%s --help\" for more information.\n",
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+ if (settings.endblock >= 0 && settings.endblock < settings.startblock)
+ fatal("relation ending block argument precedes starting block argument");
+}
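The -b/-e handlers above validate block arguments with strtol(), rejecting trailing garbage and values beyond MaxBlockNumber. A standalone sketch of that validation (parse_block_arg is invented for illustration; it additionally rejects negative values, which the patch reserves to mean "unset"):

```c
#include <assert.h>
#include <stdlib.h>

/* InvalidBlockNumber is 0xFFFFFFFF; MaxBlockNumber is one less. */
#define MaxBlockNumber ((long) 0xFFFFFFFE)

/*
 * Standalone sketch of the -b/-e option parsing: returns 0 and stores the
 * block number on success, -1 if the argument contains garbage characters
 * or lies outside the valid block number range.
 */
static int
parse_block_arg(const char *arg, long *result)
{
	char	   *endptr;
	long		val;

	val = strtol(arg, &endptr, 10);
	if (*endptr != '\0')
		return -1;				/* trailing garbage */
	if (val < 0 || val > MaxBlockNumber)
		return -1;				/* out of bounds */
	*result = val;
	return 0;
}
```

Like the patch itself, this assumes a 64-bit long; on a 32-bit platform the MaxBlockNumber comparison would need a wider type.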
+
+/*
+ * usage
+ *
+ * print out the command line help
+ */
+static void
+usage(void)
+{
+ printf("pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.\n");
+ printf("\n");
+ printf("Usage:\n");
+ printf(" pg_amcheck [OPTION]... [DBNAME [USERNAME]]\n");
+ printf("\n");
+ printf("General options:\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" -?, --help show this help, then exit\n");
+ printf(" -s, --strict-names require include patterns to match at least one entity each\n");
+ printf(" -o, --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" -v, --verbose output verbose messages\n");
+ printf("\n");
+ printf("Schema checking options:\n");
+ printf(" -n, --schema=PATTERN check relations in the specified schema(s) only\n");
+ printf(" -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)\n");
+ printf("\n");
+ printf("Table checking options:\n");
+ printf(" -t, --table=PATTERN check the specified table(s) only\n");
+ printf(" -T, --exclude-table=PATTERN do NOT check the specified table(s)\n");
+ printf(" -b, --startblock begin checking table(s) at the given starting block number\n");
+ printf(" -e, --endblock check table(s) only up to the given ending block number\n");
+ printf(" -S, --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf("\n");
+ printf("TOAST table checking options:\n");
+ printf(" -z, --check-toast check associated toast tables and toast indexes\n");
+ printf(" -Z, --skip-toast do NOT check associated toast tables and toast indexes\n");
+ printf("\n");
+ printf("Index checking options:\n");
+ printf(" -X, --skip-indexes do NOT check any btree indexes\n");
+ printf(" -i, --index=PATTERN check the specified index(es) only\n");
+ printf(" -I, --exclude-index=PATTERN do NOT check the specified index(es)\n");
+ printf(" -C, --skip-corrupt do NOT check indexes if their associated table is corrupt\n");
+ printf(" -a, --heapallindexed check index tuples against the table tuples\n");
+ printf(" -r, --rootdescend search from the root page for each index tuple\n");
+ printf("\n");
+ printf("Connection options:\n");
+ printf(" -d, --dbname=DBNAME database name to connect to\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME database user name\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt (should happen automatically)\n");
+ printf("\n");
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid, c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])\n",
+ TABLE_RELKIND_LIST);
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQuery(conn, querybuf->data, PGRES_TUPLES_OK);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(&mainlist, atooid(PQgetvalue(res, i, 0)));
+ simple_oid_list_append(&toastlist, atooid(PQgetvalue(res, i, 1)));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
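include_filter() and exclude_filter() come from the fe_utils changes in the preceding patch, and their bodies are not shown here. A plausible sketch of what they append, an AND clause restricting the named column to (or away from) a list of OIDs, might look like the following (oid_filter and its exact SQL are assumptions, not the patch's actual output):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of an include/exclude OID filter: append an AND
 * clause restricting lval to (include) or away from (exclude) the given
 * OIDs.  An empty list appends nothing, so the filters compose freely, as
 * in get_table_check_list() above.
 */
static void
oid_filter(char *buf, size_t len, const char *lval, int exclude,
		   const unsigned int *oids, int noids)
{
	size_t		n = strlen(buf);
	int			i;

	if (noids == 0)
		return;
	n += snprintf(buf + n, len - n, "\nAND %s %s (ARRAY[",
				  lval,
				  exclude ? "OPERATOR(pg_catalog.<>) ALL"
				  : "OPERATOR(pg_catalog.=) ANY");
	for (i = 0; i < noids; i++)
		n += snprintf(buf + n, len - n, "%s%u", i ? "," : "", oids[i]);
	snprintf(buf + n, len - n, "]::pg_catalog.oid[])");
}
```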
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..189f05ef0a
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to database qqq: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user', 'postgres' ],
+ qr/\Qpg_amcheck: error: could not connect to database postgres: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..30bbbdeddd
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,248 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 45;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "open failed: $!";
+ binmode $fh;
+ seek($fh, 32, 0) or die "seek failed: $!";
+ syswrite($fh, '\x77\x77\x77\x77', 500) or die "syswrite failed: $!";
+ close($fh);
+ $node->start;
+}
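As an aside (not part of the patch), the page-smashing done by `corrupt_first_page` can be sketched in standalone Python; the scratch file path here is hypothetical, standing in for a relation's heap file:

```python
import os

def corrupt_first_page(relpath):
    # Overwrite 500 bytes starting at offset 32 (past the page header)
    # with 0x77 filler, mirroring the Perl helper above.
    with open(relpath, 'r+b') as f:
        f.seek(32)
        f.write(b'\x77' * 500)

# Demonstrate on a scratch file rather than a real relation file.
scratch = '/tmp/scratch_page'
with open(scratch, 'wb') as f:
    f.write(b'\x00' * 8192)   # one zeroed 8kB "page"
corrupt_first_page(scratch)
with open(scratch, 'rb') as f:
    data = f.read()
assert data[:32] == b'\x00' * 32          # header bytes untouched
assert data[32:532] == b'\x77' * 500      # corrupted span
os.remove(scratch)
```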
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables, skipping indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
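The include/exclude semantics exercised above (later described in the documentation: an object is checked if it matches at least one include pattern, or there are none, and matches no exclude pattern) can be approximated in a standalone Python sketch. This is illustrative only: pg_amcheck itself uses psql-style patterns resolved in SQL, and `fnmatch` is merely a stand-in.

```python
import fnmatch

def selected(name, includes, excludes):
    # Check an object iff it matches some include pattern (or the
    # include list is empty) and matches no exclude pattern.
    if includes and not any(fnmatch.fnmatch(name, p) for p in includes):
        return False
    return not any(fnmatch.fnmatch(name, p) for p in excludes)

assert selected('s1', ['s*'], [])               # included by wildcard
assert not selected('s1', ['s*'], ['s1'])       # exclude wins
assert selected('s4', [], ['s1', 's2', 's3'])   # no include list needed
```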
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, both the tables and the indexes are corrupt. Selecting the
+# indexes explicitly with -i should still report the index corruption.
+#
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-X' works as expected. Since only index corruption (and not
+# table corruption) exists in s1, a default check should report errors about
+# index corruption, while '-X' (skip indexes) should give no errors.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s4
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', 'junk' ],
+ qr/\Qpg_amcheck: error: relation starting block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-e', '1234junk' ],
+ qr/\Qpg_amcheck: error: relation ending block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', '5', '-e', '4' ],
+ qr/\Qpg_amcheck: error: relation ending block argument precedes starting block argument\E/,
+ 'pg_amcheck rejects invalid block range');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..271eca7da6
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
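As a cross-check (not part of the patch), the same layout can be expressed with Python's struct module, whose format characters map onto the Perl pack codes above (I = Perl L, H = S, B = C, b = c, q = q; '<' disables alignment padding, matching Perl's pack):

```python
import struct

# Mirror of HEAPTUPLE_PACK_CODE: 3 uint32, 5 uint16, 2 uint8, int64,
# uint8 varlena header, 7 int8 body bytes, 9 uint16 toast-pointer words.
HEAPTUPLE_FMT = '<IIIHHHHHBBqB7b9H'

# 12 + 10 + 2 + 8 + 1 + 7 + 18 = 58 bytes, the documented tuple size.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Round-trip a dummy tuple to show the layout is self-consistent.
values = (1, 2, 3, 0, 0, 1, 3, 0, 24, 0, 12345678,
          0x11, *(ord(ch) for ch in 'abcdefg'), *range(9))
raw = struct.pack(HEAPTUPLE_FMT, *values)
assert struct.unpack(HEAPTUPLE_FMT, raw) == values
```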
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
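The corruption loop below toggles these flags with bitwise AND/OR. A minimal standalone Python sketch (constants copied from above) shows that clearing the two xmin hint bits, as done when corrupting t_xmin, leaves unrelated flags intact:

```python
# Constants mirroring access/htup_details.h, as listed above.
HEAP_XMAX_LOCK_ONLY = 0x0080
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200

infomask = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID | HEAP_XMAX_LOCK_ONLY

# Clear both xmin hint bits so the corrupted xmin is actually consulted.
infomask &= ~HEAP_XMIN_COMMITTED
infomask &= ~HEAP_XMIN_INVALID
assert infomask == HEAP_XMAX_LOCK_ONLY
```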
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation test\s+/ms
+ if (defined $blkno);
+ return qr/relation test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..c24f154883
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ae2759be55..797b4dc61e 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..00643d2e58
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,493 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck mydb
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ The following command-line options for controlling general program behavior
+ are recognized.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Show <application>pg_amcheck</application> version number, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Specifies verbose mode. This will cause
+ <application>pg_amcheck</application> to output more detailed information
+ about its activities, mostly to do with its communication with the
+ database.
+ </para>
+ <para>
+ Note that this does not increase the number of corruptions reported nor
+ the level of detail reported about each of them.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control which database objects
+ <application>pg_amcheck</application> checks and how such options
+ are interpreted.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-i <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ For indexes associated with tables being checked, check only those
+ indexes with names matching <replaceable
+ class="parameter">pattern</replaceable>. Multiple indexes can be
+ selected by writing multiple <option>-i</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands (see
+ <xref linkend="app-psql-patterns"/>), so multiple indexes can also
+ be selected by writing wildcard characters in the pattern. When using
+ wildcards, be careful to quote the pattern if needed to prevent the
+ shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any indexes matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-i</option>.
+ <option>-I</option> can be given more than once to exclude indexes
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-i</option> and <option>-I</option> are given, the
+ behavior is to check just the indexes that match at least one
+ <option>-i</option> switch but no <option>-I</option> switches. If
+ <option>-I</option> appears without <option>-i</option>, then indexes
+ matching <option>-I</option> are excluded from what is otherwise a check
+ of all indexes associated with tables that are checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only schemas matching <replaceable
+ class="parameter">pattern</replaceable>; this selects both the
+ schema itself, and all its contained objects. When this option is
+ not specified, all non-system schemas in the target database will be
+ checked. Multiple schemas can be
+ selected by writing multiple <option>-n</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple schemas can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-N <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any schemas matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is
+ interpreted according to the same rules as for <option>-n</option>.
+ <option>-N</option> can be given more than once to exclude schemas
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-n</option> and <option>-N</option> are given, the behavior
+ is to check just the schemas that match at least one <option>-n</option>
+ switch but no <option>-N</option> switches. If <option>-N</option> appears
+ without <option>-n</option>, then schemas matching <option>-N</option> are
+ excluded from what is otherwise a check of all schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--strict-names</option></term>
+ <listitem>
+ <para>
+ Requires that each schema
+ (<option>-n</option>/<option>--schema</option>), table
+ (<option>-t</option>/<option>--table</option>) and index
+ (<option>-i</option>/<option>--index</option>) qualifier match at least
+ one schema/table/index in the database to be checked.
+ </para>
+ <para>
+ This option has no effect on
+ <option>-N</option>/<option>--exclude-schema</option>,
+ <option>-T</option>/<option>--exclude-table</option>,
+ or <option>-I</option>/<option>--exclude-index</option>. An exclude
+ pattern failing to match any objects is not considered an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only tables with names matching
+ <replaceable class="parameter">pattern</replaceable>. Multiple tables
+ can be selected by writing multiple <option>-t</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple tables can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any tables matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-t</option>.
+ <option>-T</option> can be given more than once to exclude tables
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-t</option> and <option>-T</option> are given, the
+ behavior is to check just the tables that match at least one
+ <option>-t</option> switch but no <option>-T</option> switches. If
+ <option>-T</option> appears without <option>-t</option>, then tables
+ matching <option>-T</option> are excluded from what is otherwise a check
+ of all tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control additional behaviors.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-a</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ When checking indexes, additionally verify the presence of all heap
+ tuples as index tuples within the index.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-b <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--startblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks prior to <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option).
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-b</option>/<option>--startblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-C</option></term>
+ <term><option>--skip-corrupt</option></term>
+ <listitem>
+ <para>
+ Skip checking indexes for a table if the table is found to be corrupt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--endblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks after <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option).
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-e</option>/<option>--endblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-o</option></term>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ Stop checking at the end of the first page on which corruption is
+ found. Note that even with this option enabled, more than one
+ corruption message may be reported.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ When checking indexes, for each tuple, perform additional verification by
+ re-finding the tuple on the leaf level by performing a new search from
+ the root page.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited or even of no
+ use in helping detect the kinds of corruption that occur in practice.
+ In any event, it is known to be a rather expensive check to perform.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S <replaceable class="parameter">option</replaceable></option></term>
+ <term><option>--skip=<replaceable class="parameter">option</replaceable></option></term>
+ <listitem>
+ <para>
+ When <option>-S</option> <literal>all-visible</literal> is given,
+ corruption checking is skipped for blocks marked as all visible in
+ the visibility map.
+ </para>
+ <para>
+ When <option>-S</option> <literal>all-frozen</literal> is given,
+ corruption checking is skipped for blocks marked as all frozen in
+ the visibility map.
+ </para>
+ <para>
+ The default is to check blocks without regard to their marking in the
+ visibility map.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-X</option></term>
+ <term><option>--skip-indexes</option></term>
+ <listitem>
+ <para>
+ Check tables, but not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-Z</option></term>
+ <term><option>--skip-toast</option></term>
+ <listitem>
+ <para>
+ Do not check toast tables or their indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ <para>
+ The following additional command-line options control the database
+ connection parameters.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-d <replaceable class="parameter">dbname</replaceable></option></term>
+ <term><option>--dbname=<replaceable class="parameter">dbname</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to. This is
+ equivalent to specifying <replaceable
+ class="parameter">dbname</replaceable> as the first non-option
+ argument on the command line. The <replaceable>dbname</replaceable>
+ can be a <link linkend="libpq-connstring">connection string</link>.
+ If so, connection string parameters will override any conflicting
+ command line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+ <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is
+ running. If the value begins with a slash, it is used as the
+ directory for the Unix domain socket. The default is taken
+ from the <envar>PGHOST</envar> environment variable, if set,
+ else a Unix domain socket connection is attempted.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+ <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file
+ extension on which the server is listening for connections.
+ Defaults to the <envar>PGPORT</envar> environment variable, if
+ set, or a compiled-in default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U <replaceable>username</replaceable></option></term>
+ <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires
+ password authentication and a password is not available by
+ other means such as a <filename>.pgpass</filename> file, the
+ connection attempt will fail. This option can be useful in
+ batch jobs and scripts where no user is present to enter a
+ password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a
+ password before connecting to a database.
+ </para>
+
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt
+ for a password if the server demands password authentication.
+ However, <application>pg_amcheck</application> will waste a
+ connection attempt finding out that the server wants a password.
+ In some cases it is worth typing <option>-W</option> to avoid the extra
+ connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--role=<replaceable class="parameter">rolename</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies a role name to be used to run the checks.
+ This option causes <application>pg_amcheck</application> to issue a
+ <command>SET ROLE</command> <replaceable class="parameter">rolename</replaceable>
+ command after connecting to the database. It is useful when the
+ authenticated user (specified by <option>-U</option>) lacks privileges
+ needed by <application>pg_amcheck</application>, but can switch to a role with
+ the required rights. Some installations have a policy against
+ logging in directly as a superuser, and use of this option allows
+ checks to be run without violating the policy.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </sect2>
+</sect1>
--
2.21.1 (Apple Git-122.3)
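The include/exclude semantics documented above (an object is checked if it matches at least one include pattern, or if no include patterns were given at all, and matches no exclude pattern, with excludes always winning) can be sketched in C. This is an illustrative stand-in, not code from the patch; fnmatch(3) shell globbing is assumed here in place of psql's \d pattern rules, which differ in detail.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether an object named 'name' should be checked, given lists of
 * include (-t/-i/-n) and exclude (-T/-I/-N) patterns.  With no include
 * patterns, everything starts out selected; an exclude match always wins.
 */
static bool
should_check(const char *name,
			 const char *const *include, size_t n_include,
			 const char *const *exclude, size_t n_exclude)
{
	bool		selected = (n_include == 0);

	for (size_t i = 0; i < n_include; i++)
		if (fnmatch(include[i], name, 0) == 0)
			selected = true;
	for (size_t i = 0; i < n_exclude; i++)
		if (fnmatch(exclude[i], name, 0) == 0)
			selected = false;
	return selected;
}
```

Used this way, "-i 'idx_*' -I '*_old'" checks idx_a but skips idx_a_old, and "-I" alone excludes from an otherwise full check, matching the documented behavior.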
On Nov 19, 2020, at 11:47 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 19, 2020 at 9:06 AM Robert Haas <robertmhaas@gmail.com> wrote:
I'm also not sure if these descriptions are clear enough, but it may
also be hard to do a good job in a brief space. Still, comparing this
to the documentation of heapallindexed makes me rather nervous. This
is only trying to verify that the index contains all the tuples in the
heap, not that the values in the heap and index tuples actually match.

That's a good point. As things stand, heapallindexed verification does
not notice when there are extra index tuples in the index that are in
some way inconsistent with the heap. Hopefully this isn't too much of
a problem in practice because the presence of extra spurious tuples
gets detected by the index structure verification process. But in
general that might not happen.

Ideally heapallindexed verification would verify 1:1 correspondence. It
doesn't do that right now, but it could.

This could work by having two bloom filters -- one for the heap,
another for the index. The implementation would look for the absence
of index tuples that should be in the index initially, just like
today. But at the end it would modify the index bloom filter by &= it
with the complement of the heap bloom filter. If any bits are left set
in the index bloom filter, we go back through the index once more and
locate index tuples that have at least some matching bits in the index
bloom filter (we cannot expect all of the bits from each of the hash
functions used by the bloom filter to still be matches).

From here we can do some kind of lookup for maybe-not-matching index
tuples that we locate. Make sure that they point to an LP_DEAD line
item in the heap or something. Make sure that they have the same
values as the heap tuple if they're still retrievable (i.e. if we
haven't pruned the heap tuple away already).
This approach sounds very good to me, but beyond the scope of what I'm planning for this release cycle.
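The two-filter scheme described above can be sketched in miniature. This only illustrates the set arithmetic (index &= ~heap, then inspect the leftovers); it is not amcheck code, and the filter size, hash count, and toy hash function are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILTER_BYTES 128		/* 1024 bits; real filters are sized to the data */
#define NUM_HASHES 3

typedef struct
{
	uint8_t		bits[FILTER_BYTES];
} BloomFilter;

/* Toy hash; a real implementation needs k independent, well-mixed hashes. */
static uint32_t
toy_hash(uint64_t key, int seed)
{
	uint64_t	h = key * 0x9e3779b97f4a7c15ULL +
		(uint64_t) seed * 0xc2b2ae3d27d4eb4fULL;

	h ^= h >> 29;
	return (uint32_t) (h % (FILTER_BYTES * 8));
}

static void
bloom_add(BloomFilter *f, uint64_t key)
{
	for (int i = 0; i < NUM_HASHES; i++)
	{
		uint32_t	bit = toy_hash(key, i);

		f->bits[bit / 8] |= (uint8_t) (1 << (bit % 8));
	}
}

/*
 * Clear from 'indexf' every bit also set in 'heapf' (indexf &= ~heapf).
 * Bits left set belong only to index entries with no heap counterpart,
 * so they flag index tuples worth re-finding and double-checking.
 */
static bool
bloom_subtract_leaves_bits(BloomFilter *indexf, const BloomFilter *heapf)
{
	bool		any = false;

	for (int i = 0; i < FILTER_BYTES; i++)
	{
		indexf->bits[i] &= (uint8_t) ~heapf->bits[i];
		if (indexf->bits[i])
			any = true;
	}
	return any;
}
```

When heap and index were built from the same keys, the subtraction leaves nothing; a spurious index-only key leaves bits behind, triggering the second pass Peter describes.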
This to me seems too conservative. The result is that by default we
check only tables, not indexes. I don't think that's going to be what
users want. I don't know whether they want the heapallindexed or
rootdescend behaviors for index checks, but I think they want their
indexes checked. Happy to hear opinions from actual users on what they
want; this is just me guessing that you've guessed wrong. :-)

My thoughts on these two options:
* I don't think that users will ever want rootdescend verification.
That option exists now because I wanted to have something that relied
on the uniqueness property of B-Tree indexes following the Postgres 12
work. I didn't add retail index tuple deletion, so it seemed like a
good idea to have something that makes the same assumptions that it
would have to make. To validate the design.

Another factor is that Alexander Korotkov made the basic
bt_index_parent_check() tests a lot better for Postgres 13. This
undermined the practical argument for using rootdescend verification.
The latest version of the patch has rootdescend off by default, but a switch to turn it on. The documentation for that switch in doc/src/sgml/pgamcheck.sgml summarizes your comments:
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited or even of no
+ use in helping detect the kinds of corruption that occur in practice.
+ In any event, it is known to be a rather expensive check to perform.
For my own self, I don't care if rootdescend is an option in pg_amcheck. You and Robert expressed somewhat different opinions, and I tried to split the difference. I'm happy to go a different direction if that's what the consensus is.
Finally, note that bt_index_parent_check() was always supposed to be
something that was to be used only when you already knew that you had
big problems, and wanted absolutely thorough verification without
regard for the costs. This isn't the common case at all. It would be
reasonable to not expose anything from bt_index_parent_check() at all,
or to give it much less prominence. Not really sure of what the right
balance is here myself, so I'm not insisting on anything. Just telling
you what I know about it.
This still needs work. Currently, there is a switch to turn off index checking, with the checks on by default. But there is no switch controlling which kind of check is performed (bt_index_check vs. bt_index_parent_check). Making matters more complicated, selecting both rootdescend and bt_index_check wouldn't make sense, as there is no rootdescend option on that function. So users would need multiple flags to turn on various options, with some flag combinations drawing an error about the flags not being mutually compatible. That's doable, but people may not like that interface.
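One way to express the kind of flag-compatibility checking discussed above is a small validation routine run after option parsing. The struct, field names, and messages below are hypothetical, not from the patch; they only illustrate rejecting rootdescend when the parent-check form of verification is not selected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical post-parse option state; names are illustrative only. */
typedef struct
{
	bool		no_index_checks;	/* skip index checking entirely */
	bool		parent_check;		/* use bt_index_parent_check() */
	bool		rootdescend;		/* only offered by bt_index_parent_check() */
} IndexCheckOptions;

/*
 * Return NULL if the combination is acceptable, else a message describing
 * the conflict.  rootdescend has no equivalent in plain bt_index_check(),
 * so it requires parent_check and conflicts with skipping indexes.
 */
static const char *
validate_index_options(const IndexCheckOptions *opts)
{
	if (opts->no_index_checks && (opts->parent_check || opts->rootdescend))
		return "index check options conflict with skipping index checks";
	if (opts->rootdescend && !opts->parent_check)
		return "rootdescend requires the parent-check form of verification";
	return NULL;
}
```

The interface cost Mark describes shows up directly: every new check-strength flag adds rows to this compatibility matrix.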
* heapallindexed is kind of expensive, but valuable. But the extra
check is probably less likely to help on the second or subsequent
index on a table.
There is a switch for enabling this. It is off by default.
It might be worth considering an option that only uses it with only
one index: Preferably the primary key index, failing that some unique
index, and failing that some other index.
It might make sense for somebody to submit this for a later release. I don't have any plans to work on this during this release cycle.
I'm not very convinced by the decision to
override the user's decision about heapallindexed either.

I strongly agree.
I have removed the override.
Maybe I lack
imagination, but that seems pretty arbitrary. Suppose there's a giant
index which is missing entries for 5 million heap tuples and also
there's 1 entry in the table which has an xmin that is less than the
pg_class.relfrozenxid value by 1. You are proposing that because I have
the latter problem I don't want you to check for the former one. But
I, John Q. Smartuser, do not want you to second-guess what I told you
on the command line that I wanted. :-)

Even if your user is just average, they still have one major advantage
over the architects of pg_amcheck: actual knowledge of the problem in
front of them.
There is a switch for skipping index checks on corrupt tables. By default, the indexes will be checked.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jan 8, 2021 at 6:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The attached patches, v31, are mostly the same, but with "getopt_long.h" included from pg_amcheck.c per Thomas's review, and a .gitignore file added in contrib/pg_amcheck/
A couple more little things from Windows CI:
C:\projects\postgresql\src\include\fe_utils/option_utils.h(19):
fatal error C1083: Cannot open include file: 'libpq-fe.h': No such
file or directory [C:\projects\postgresql\pg_amcheck.vcxproj]
Does contrib/amcheck/Makefile need to say "SHLIB_PREREQS =
submake-libpq" like other contrib modules that use libpq?
pg_backup_utils.obj : error LNK2001: unresolved external symbol
exit_nicely [C:\projects\postgresql\pg_dump.vcxproj]
I think this is probably because additions to src/fe_utils/Makefile's
OBJS list need to be manually replicated in
src/tools/msvc/Mkvcbuild.pm's @pgfeutilsfiles list. (If I'm right
about that, perhaps it needs a comment to remind us Unix hackers of
that, or perhaps it should be automated...)
On Jan 10, 2021, at 12:41 PM, Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Jan 8, 2021 at 6:33 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
The attached patches, v31, are mostly the same, but with "getopt_long.h" included from pg_amcheck.c per Thomas's review, and a .gitignore file added in contrib/pg_amcheck/
A couple more little things from Windows CI:
C:\projects\postgresql\src\include\fe_utils/option_utils.h(19):
fatal error C1083: Cannot open include file: 'libpq-fe.h': No such
file or directory [C:\projects\postgresql\pg_amcheck.vcxproj]

Does contrib/amcheck/Makefile need to say "SHLIB_PREREQS =
submake-libpq" like other contrib modules that use libpq?
Added in v32.
pg_backup_utils.obj : error LNK2001: unresolved external symbol
exit_nicely [C:\projects\postgresql\pg_dump.vcxproj]

I think this is probably because additions to src/fe_utils/Makefile's
OBJS list need to be manually replicated in
src/tools/msvc/Mkvcbuild.pm's @pgfeutilsfiles list. (If I'm right
about that, perhaps it needs a comment to remind us Unix hackers of
that, or perhaps it should be automated...)
Added in v32, along with adding pg_amcheck to @contrib_uselibpq, @contrib_uselibpgport, and @contrib_uselibpgcommon
There are also a few additions in v32 to typedefs.list, and some whitespace changes due to running pgindent.
Attachments:
v32-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patchapplication/octet-stream; name=v32-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch; x-unix-mode=0644Download
From b4a3769891e3669c2b6e5c4b0db757f70fff6d84 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 30 Dec 2020 12:50:41 -0800
Subject: [PATCH v32 1/9] Moving exit_nicely and fatal into fe_utils
In preparation for moving other pg_dump functionality into fe_utils,
moving the functions "on_exit_nicely" and "exit_nicely", and the
macro "fatal" from pg_dump into fe_utils.
Various frontend executables in src/bin, src/bin/scripts, and
contrib/ have logic for logging and exiting under error conditions.
The logging code itself is already under common/, but executables
differ in their calls to exit() vs. exit_nicely(), with
exit_nicely() not uniformly defined, and sometimes all of this
wrapped up under a macro named fatal(), the definition of that macro
also not uniformly defined. This makes it harder to move code out
of these executables into a shared library under fe_utils/.
Standardizing all executables to define these things the same way or
to use a single fe_utils/ library is beyond the scope of this patch,
but this patch should get the ball rolling in that direction.
---
src/bin/pg_dump/pg_backup_archiver.h | 1 +
src/bin/pg_dump/pg_backup_utils.c | 59 -----------------------
src/bin/pg_dump/pg_backup_utils.h | 8 ----
src/fe_utils/Makefile | 1 +
src/fe_utils/exit_utils.c | 71 ++++++++++++++++++++++++++++
src/include/fe_utils/exit_utils.h | 25 ++++++++++
6 files changed, 98 insertions(+), 67 deletions(-)
create mode 100644 src/fe_utils/exit_utils.c
create mode 100644 src/include/fe_utils/exit_utils.h
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a8ea5c7eae..37d157b7ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -26,6 +26,7 @@
#include <time.h>
+#include "fe_utils/exit_utils.h"
#include "libpq-fe.h"
#include "pg_backup.h"
#include "pqexpbuffer.h"
diff --git a/src/bin/pg_dump/pg_backup_utils.c b/src/bin/pg_dump/pg_backup_utils.c
index c709a40e06..631e88f7db 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -19,16 +19,6 @@
/* Globals exported by this file */
const char *progname = NULL;
-#define MAX_ON_EXIT_NICELY 20
-
-static struct
-{
- on_exit_nicely_callback function;
- void *arg;
-} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
-static int on_exit_nicely_index;
-
/*
* Parse a --section=foo command line argument.
*
@@ -57,52 +47,3 @@ set_dump_section(const char *arg, int *dumpSections)
exit_nicely(1);
}
}
-
-
-/* Register a callback to be run when exit_nicely is invoked. */
-void
-on_exit_nicely(on_exit_nicely_callback function, void *arg)
-{
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- {
- pg_log_fatal("out of on_exit_nicely slots");
- exit_nicely(1);
- }
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
-}
-
-/*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * without printing any message.
- *
- * If running in a parallel worker thread on Windows, we only exit the thread,
- * not the whole process.
- *
- * Note that in parallel operation on Windows, the callback(s) will be run
- * by each thread since the list state is necessarily shared by all threads;
- * each callback must contain logic to ensure it does only what's appropriate
- * for its thread. On Unix, callbacks are also run by each process, but only
- * for callbacks established before we fork off the child processes. (It'd
- * be cleaner to reset the list after fork(), and let each child establish
- * its own callbacks; but then the behavior would be completely inconsistent
- * between Windows and Unix. For now, just be sure to establish callbacks
- * before forking to avoid inconsistency.)
- */
-void
-exit_nicely(int code)
-{
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- on_exit_nicely_list[i].function(code,
- on_exit_nicely_list[i].arg);
-
-#ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- _endthreadex(code);
-#endif
-
- exit(code);
-}
diff --git a/src/bin/pg_dump/pg_backup_utils.h b/src/bin/pg_dump/pg_backup_utils.h
index 306798f9ac..ee4409c274 100644
--- a/src/bin/pg_dump/pg_backup_utils.h
+++ b/src/bin/pg_dump/pg_backup_utils.h
@@ -15,22 +15,14 @@
#ifndef PG_BACKUP_UTILS_H
#define PG_BACKUP_UTILS_H
-#include "common/logging.h"
-
/* bits returned by set_dump_section */
#define DUMP_PRE_DATA 0x01
#define DUMP_DATA 0x02
#define DUMP_POST_DATA 0x04
#define DUMP_UNSECTIONED 0xff
-typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
extern const char *progname;
extern void set_dump_section(const char *arg, int *dumpSections);
-extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
-extern void exit_nicely(int code) pg_attribute_noreturn();
-
-#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
#endif /* PG_BACKUP_UTILS_H */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..d6c328faf1 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,6 +23,7 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ exit_utils.o \
mbprint.o \
print.o \
psqlscan.o \
diff --git a/src/fe_utils/exit_utils.c b/src/fe_utils/exit_utils.c
new file mode 100644
index 0000000000..e61bd438fc
--- /dev/null
+++ b/src/fe_utils/exit_utils.c
@@ -0,0 +1,71 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/exit_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "fe_utils/exit_utils.h"
+
+#define MAX_ON_EXIT_NICELY 20
+
+static struct
+{
+ on_exit_nicely_callback function;
+ void *arg;
+} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+
+static int on_exit_nicely_index;
+
+/* Register a callback to be run when exit_nicely is invoked. */
+void
+on_exit_nicely(on_exit_nicely_callback function, void *arg)
+{
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ {
+ pg_log_fatal("out of on_exit_nicely slots");
+ exit_nicely(1);
+ }
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+}
+
+/*
+ * Run accumulated on_exit_nicely callbacks in reverse order and then exit
+ * without printing any message.
+ *
+ * If running in a parallel worker thread on Windows, we only exit the thread,
+ * not the whole process.
+ *
+ * Note that in parallel operation on Windows, the callback(s) will be run
+ * by each thread since the list state is necessarily shared by all threads;
+ * each callback must contain logic to ensure it does only what's appropriate
+ * for its thread. On Unix, callbacks are also run by each process, but only
+ * for callbacks established before we fork off the child processes. (It'd
+ * be cleaner to reset the list after fork(), and let each child establish
+ * its own callbacks; but then the behavior would be completely inconsistent
+ * between Windows and Unix. For now, just be sure to establish callbacks
+ * before forking to avoid inconsistency.)
+ */
+void
+exit_nicely(int code)
+{
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ on_exit_nicely_list[i].function(code,
+ on_exit_nicely_list[i].arg);
+
+#ifdef WIN32
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ _endthreadex(code);
+#endif
+
+ exit(code);
+}
diff --git a/src/include/fe_utils/exit_utils.h b/src/include/fe_utils/exit_utils.h
new file mode 100644
index 0000000000..948d2fdb51
--- /dev/null
+++ b/src/include/fe_utils/exit_utils.h
@@ -0,0 +1,25 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/exit_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXIT_UTILS_H
+#define EXIT_UTILS_H
+
+#include "postgres_fe.h"
+#include "common/logging.h"
+
+typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+extern void exit_nicely(int code) pg_attribute_noreturn();
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
+
+#endif /* EXIT_UTILS_H */
--
2.21.1 (Apple Git-122.3)
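The reverse-order (LIFO) callback semantics of the exit_nicely machinery moved above can be demonstrated with a trimmed, standalone variant. As an intentional departure for demonstration purposes, exit() and the Windows thread handling are replaced by a plain return so the ordering is observable.

```c
#include <assert.h>

/* Trimmed stand-in for the fe_utils interface added by the patch. */
typedef void (*on_exit_nicely_callback) (int code, void *arg);

#define MAX_ON_EXIT_NICELY 20

static struct
{
	on_exit_nicely_callback function;
	void	   *arg;
}			on_exit_nicely_list[MAX_ON_EXIT_NICELY];

static int	on_exit_nicely_index;

/* Register a callback, just as on_exit_nicely() does in the patch. */
static void
on_exit_nicely(on_exit_nicely_callback function, void *arg)
{
	assert(on_exit_nicely_index < MAX_ON_EXIT_NICELY);
	on_exit_nicely_list[on_exit_nicely_index].function = function;
	on_exit_nicely_list[on_exit_nicely_index].arg = arg;
	on_exit_nicely_index++;
}

/* Like exit_nicely(), but returns instead of exiting. */
static void
run_exit_callbacks(int code)
{
	for (int i = on_exit_nicely_index - 1; i >= 0; i--)
		on_exit_nicely_list[i].function(code, on_exit_nicely_list[i].arg);
}

/* Helper callback recording the order in which callbacks fire. */
static int	run_order[MAX_ON_EXIT_NICELY];
static int	n_run;

static void
record_cb(int code, void *arg)
{
	(void) code;
	run_order[n_run++] = *(int *) arg;
}
```

Registering callbacks 1, 2, 3 and then triggering the exit path runs them as 3, 2, 1, which is why the patch's comment warns that cleanup established last runs first.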
v32-0002-Refactoring-ExecuteSqlQuery-and-related-function.patchapplication/octet-stream; name=v32-0002-Refactoring-ExecuteSqlQuery-and-related-function.patch; x-unix-mode=0644Download
From 19a006bf7841fa4405e42be790dd5fe6f24b14a8 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 4 Jan 2021 12:44:20 -0800
Subject: [PATCH v32 2/9] Refactoring ExecuteSqlQuery and related functions.
ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement in the pg_dump project were defined to take a
pointer to struct Archive, which is a struct unused outside pg_dump.
In preparation for moving these functions to fe_utils, refactoring
these functions to take a PGconn pointer. These functions also
embedded pg_dump assumptions about the correct error handling
behavior, specifically to do with logging error messages before
calling exit_nicely(). Refactoring the error handling logic into a
handler function. The full design of the handler is not yet
present, as it will be developed further after moving to fe_utils,
but the idea is that callers will ultimately be able to override the
error handling behavior by defining alternate handlers.
To minimize changes to pg_dump and friends, creating thin wrappers
around these functions that take an Archive pointer. It might be
marginally cleaner in the long run to refactor pg_dump.c to call
with a PGconn pointer in all relevant call sites, but that would
result in a nontrivially larger patch and more code churn, so not
doing that here. Another option might be to define the thin
wrappers as static inline functions, but that seems inconsistent
with the rest of the pg_dump project style, so not doing that
either. Should we?
---
src/bin/pg_dump/pg_backup_db.c | 144 ++++++++++++++-----
src/bin/pg_dump/pg_backup_db.h | 26 +++-
src/bin/pg_dump/pg_dump.c | 248 ++++++++++++++++-----------------
3 files changed, 253 insertions(+), 165 deletions(-)
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index 5ba43441f5..b55a968da2 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -61,7 +61,7 @@ _check_database_version(ArchiveHandle *AH)
*/
if (remoteversion >= 90000)
{
- res = ExecuteSqlQueryForSingleRow((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
+ res = ExecuteSqlQueryForSingleRowAH((Archive *) AH, "SELECT pg_catalog.pg_is_in_recovery()");
AH->public.isStandby = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);
PQclear(res);
@@ -198,8 +198,8 @@ ConnectDatabase(Archive *AHX,
}
/* Start strict; later phases may override this. */
- PQclear(ExecuteSqlQueryForSingleRow((Archive *) AH,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH((Archive *) AH,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
if (password && password != AH->savedPassword)
free(password);
@@ -271,59 +271,129 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/* Like fatal(), but with a complaint about a particular query. */
-static void
-die_on_query_failure(ArchiveHandle *AH, const char *query)
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disables ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
{
- pg_log_error("query failed: %s",
- PQerrorMessage(AH->connection));
- fatal("query was: %s", query);
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve historical message behavior of spelling "one" as the
+ * expected row count.
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
}
+/*
+ * Executes the given SQL query statement.
+ *
+ * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ */
void
-ExecuteSqlStatement(Archive *AHX, const char *query)
+ExecuteSqlStatement(PGconn *conn, const char *query)
{
- ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != PGRES_COMMAND_OK)
- die_on_query_failure(AH, query);
- PQclear(res);
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the result has the given 'status'.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
+ * requires that exactly one row be returned.
+ */
PGresult *
-ExecuteSqlQuery(Archive *AHX, const char *query, ExecStatusType status)
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
+
+void
+ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
ArchiveHandle *AH = (ArchiveHandle *) AHX;
- PGresult *res;
- res = PQexec(AH->connection, query);
- if (PQresultStatus(res) != status)
- die_on_query_failure(AH, query);
- return res;
+ ExecuteSqlStatement(AH->connection, query);
}
-/*
- * Execute an SQL query and verify that we got exactly one row back.
- */
PGresult *
-ExecuteSqlQueryForSingleRow(Archive *fout, const char *query)
+ExecuteSqlQueryAH(Archive *AHX, const char *query, ExecStatusType status)
{
- PGresult *res;
- int ntups;
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- res = ExecuteSqlQuery(fout, query, PGRES_TUPLES_OK);
+ return ExecuteSqlQuery(AH->connection, query, status);
+}
- /* Expecting a single result only */
- ntups = PQntuples(res);
- if (ntups != 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
+PGresult *
+ExecuteSqlQueryForSingleRowAH(Archive *AHX, const char *query)
+{
+ ArchiveHandle *AH = (ArchiveHandle *) AHX;
- return res;
+ return ExecuteSqlQueryForSingleRow(AH->connection, query);
}
/*
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 8888dd34b9..1aac600ece 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,10 +13,28 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-extern void ExecuteSqlStatement(Archive *AHX, const char *query);
-extern PGresult *ExecuteSqlQuery(Archive *AHX, const char *query,
- ExecStatusType status);
-extern PGresult *ExecuteSqlQueryForSingleRow(Archive *fout, const char *query);
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
+extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
+ ExecStatusType status);
+extern PGresult *ExecuteSqlQueryForSingleRowAH(Archive *fout,
+ const char *query);
extern void EndDBCopyMode(Archive *AHX, const char *tocEntryTag);
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 1f70653c02..e8985a834f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1084,7 +1084,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PGconn *conn = GetConnection(AH);
const char *std_strings;
- PQclear(ExecuteSqlQueryForSingleRow(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
+ PQclear(ExecuteSqlQueryForSingleRowAH(AH, ALWAYS_SECURE_SEARCH_PATH_SQL));
/*
* Set the client encoding if requested.
@@ -1119,7 +1119,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
PQExpBuffer query = createPQExpBuffer();
appendPQExpBuffer(query, "SET ROLE %s", fmtId(use_role));
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
/* save it for possible later use by parallel workers */
@@ -1128,11 +1128,11 @@ setup_connection(Archive *AH, const char *dumpencoding,
}
/* Set the datestyle to ISO to ensure the dump's portability */
- ExecuteSqlStatement(AH, "SET DATESTYLE = ISO");
+ ExecuteSqlStatementAH(AH, "SET DATESTYLE = ISO");
/* Likewise, avoid using sql_standard intervalstyle */
if (AH->remoteVersion >= 80400)
- ExecuteSqlStatement(AH, "SET INTERVALSTYLE = POSTGRES");
+ ExecuteSqlStatementAH(AH, "SET INTERVALSTYLE = POSTGRES");
/*
* Use an explicitly specified extra_float_digits if it has been provided.
@@ -1145,35 +1145,35 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBuffer(q, "SET extra_float_digits TO %d",
extra_float_digits);
- ExecuteSqlStatement(AH, q->data);
+ ExecuteSqlStatementAH(AH, q->data);
destroyPQExpBuffer(q);
}
else if (AH->remoteVersion >= 90000)
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 3");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 3");
else
- ExecuteSqlStatement(AH, "SET extra_float_digits TO 2");
+ ExecuteSqlStatementAH(AH, "SET extra_float_digits TO 2");
/*
* If synchronized scanning is supported, disable it, to prevent
* unpredictable changes in row ordering across a dump and reload.
*/
if (AH->remoteVersion >= 80300)
- ExecuteSqlStatement(AH, "SET synchronize_seqscans TO off");
+ ExecuteSqlStatementAH(AH, "SET synchronize_seqscans TO off");
/*
* Disable timeouts if supported.
*/
- ExecuteSqlStatement(AH, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET statement_timeout = 0");
if (AH->remoteVersion >= 90300)
- ExecuteSqlStatement(AH, "SET lock_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET lock_timeout = 0");
if (AH->remoteVersion >= 90600)
- ExecuteSqlStatement(AH, "SET idle_in_transaction_session_timeout = 0");
+ ExecuteSqlStatementAH(AH, "SET idle_in_transaction_session_timeout = 0");
/*
* Quote all identifiers, if requested.
*/
if (quote_all_identifiers && AH->remoteVersion >= 90100)
- ExecuteSqlStatement(AH, "SET quote_all_identifiers = true");
+ ExecuteSqlStatementAH(AH, "SET quote_all_identifiers = true");
/*
* Adjust row-security mode, if supported.
@@ -1181,15 +1181,15 @@ setup_connection(Archive *AH, const char *dumpencoding,
if (AH->remoteVersion >= 90500)
{
if (dopt->enable_row_security)
- ExecuteSqlStatement(AH, "SET row_security = on");
+ ExecuteSqlStatementAH(AH, "SET row_security = on");
else
- ExecuteSqlStatement(AH, "SET row_security = off");
+ ExecuteSqlStatementAH(AH, "SET row_security = off");
}
/*
* Start transaction-snapshot mode transaction to dump consistent data.
*/
- ExecuteSqlStatement(AH, "BEGIN");
+ ExecuteSqlStatementAH(AH, "BEGIN");
if (AH->remoteVersion >= 90100)
{
/*
@@ -1201,17 +1201,17 @@ setup_connection(Archive *AH, const char *dumpencoding,
* guarantees. This is a kluge, but safe for back-patching.
*/
if (dopt->serializable_deferrable && AH->sync_snapshot_id == NULL)
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY, DEFERRABLE");
else
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"REPEATABLE READ, READ ONLY");
}
else
{
- ExecuteSqlStatement(AH,
+ ExecuteSqlStatementAH(AH,
"SET TRANSACTION ISOLATION LEVEL "
"SERIALIZABLE, READ ONLY");
}
@@ -1230,7 +1230,7 @@ setup_connection(Archive *AH, const char *dumpencoding,
appendPQExpBufferStr(query, "SET TRANSACTION SNAPSHOT ");
appendStringLiteralConn(query, AH->sync_snapshot_id, conn);
- ExecuteSqlStatement(AH, query->data);
+ ExecuteSqlStatementAH(AH, query->data);
destroyPQExpBuffer(query);
}
else if (AH->numWorkers > 1 &&
@@ -1270,7 +1270,7 @@ get_synchronized_snapshot(Archive *fout)
char *result;
PGresult *res;
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
PQclear(res);
@@ -1343,7 +1343,7 @@ expand_schema_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1390,7 +1390,7 @@ expand_foreign_server_name_patterns(Archive *fout,
processSQLNamePattern(GetConnection(fout), query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1450,9 +1450,9 @@ expand_table_name_patterns(Archive *fout,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatement(fout, "RESET search_path");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(fout,
+ ExecuteSqlStatementAH(fout, "RESET search_path");
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRowAH(fout,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
@@ -1907,7 +1907,7 @@ dumpTableData_copy(Archive *fout, void *dcontext)
fmtQualifiedDumpable(tbinfo),
column_list);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_COPY_OUT);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_COPY_OUT);
PQclear(res);
destroyPQExpBuffer(clistBuf);
@@ -2028,11 +2028,11 @@ dumpTableData_insert(Archive *fout, void *dcontext)
if (tdinfo->filtercond)
appendPQExpBuffer(q, " %s", tdinfo->filtercond);
- ExecuteSqlStatement(fout, q->data);
+ ExecuteSqlStatementAH(fout, q->data);
while (1)
{
- res = ExecuteSqlQuery(fout, "FETCH 100 FROM _pg_dump_cursor",
+ res = ExecuteSqlQueryAH(fout, "FETCH 100 FROM _pg_dump_cursor",
PGRES_TUPLES_OK);
nfields = PQnfields(res);
@@ -2220,7 +2220,7 @@ dumpTableData_insert(Archive *fout, void *dcontext)
archputs("\n\n", fout);
- ExecuteSqlStatement(fout, "CLOSE _pg_dump_cursor");
+ ExecuteSqlStatementAH(fout, "CLOSE _pg_dump_cursor");
destroyPQExpBuffer(q);
if (insertStmt != NULL)
@@ -2520,7 +2520,7 @@ buildMatViewRefreshDependencies(Archive *fout)
"FROM w "
"WHERE refrelkind = " CppAsString2(RELKIND_MATVIEW));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -2847,7 +2847,7 @@ dumpDatabase(Archive *fout)
username_subquery);
}
- res = ExecuteSqlQueryForSingleRow(fout, dbQry->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, dbQry->data);
i_tableoid = PQfnumber(res, "tableoid");
i_oid = PQfnumber(res, "oid");
@@ -2992,7 +2992,7 @@ dumpDatabase(Archive *fout)
seclabelQry = createPQExpBuffer();
buildShSecLabelQuery("pg_database", dbCatId.oid, seclabelQry);
- shres = ExecuteSqlQuery(fout, seclabelQry->data, PGRES_TUPLES_OK);
+ shres = ExecuteSqlQueryAH(fout, seclabelQry->data, PGRES_TUPLES_OK);
resetPQExpBuffer(seclabelQry);
emitShSecLabels(conn, shres, seclabelQry, "DATABASE", datname);
if (seclabelQry->len > 0)
@@ -3103,7 +3103,7 @@ dumpDatabase(Archive *fout)
"WHERE oid = %u;\n",
LargeObjectRelationId);
- lo_res = ExecuteSqlQueryForSingleRow(fout, loFrozenQry->data);
+ lo_res = ExecuteSqlQueryForSingleRowAH(fout, loFrozenQry->data);
i_relfrozenxid = PQfnumber(lo_res, "relfrozenxid");
i_relminmxid = PQfnumber(lo_res, "relminmxid");
@@ -3162,7 +3162,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
else
printfPQExpBuffer(buf, "SELECT datconfig[%d] FROM pg_database WHERE oid = '%u'::oid", count, dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 1 &&
!PQgetisnull(res, 0, 0))
@@ -3189,7 +3189,7 @@ dumpDatabaseConfig(Archive *AH, PQExpBuffer outbuf,
"WHERE setrole = r.oid AND setdatabase = '%u'::oid",
dboid);
- res = ExecuteSqlQuery(AH, buf->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(AH, buf->data, PGRES_TUPLES_OK);
if (PQntuples(res) > 0)
{
@@ -3277,7 +3277,7 @@ dumpSearchPath(Archive *AH)
* listing schemas that may appear in search_path but not actually exist,
* which seems like a prudent exclusion.
*/
- res = ExecuteSqlQueryForSingleRow(AH,
+ res = ExecuteSqlQueryForSingleRowAH(AH,
"SELECT pg_catalog.current_schemas(false)");
if (!parsePGArray(PQgetvalue(res, 0, 0), &schemanames, &nschemanames))
@@ -3391,7 +3391,7 @@ getBlobs(Archive *fout)
"NULL::oid AS initrlomacl "
" FROM pg_largeobject");
- res = ExecuteSqlQuery(fout, blobQry->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobQry->data, PGRES_TUPLES_OK);
i_oid = PQfnumber(res, "oid");
i_lomowner = PQfnumber(res, "rolname");
@@ -3537,7 +3537,7 @@ dumpBlobs(Archive *fout, void *arg)
"DECLARE bloboid CURSOR FOR "
"SELECT DISTINCT loid FROM pg_largeobject ORDER BY 1";
- ExecuteSqlStatement(fout, blobQry);
+ ExecuteSqlStatementAH(fout, blobQry);
/* Command to fetch from cursor */
blobFetchQry = "FETCH 1000 IN bloboid";
@@ -3545,7 +3545,7 @@ dumpBlobs(Archive *fout, void *arg)
do
{
/* Do a fetch */
- res = ExecuteSqlQuery(fout, blobFetchQry, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, blobFetchQry, PGRES_TUPLES_OK);
/* Process the tuples, if any */
ntups = PQntuples(res);
@@ -3678,7 +3678,7 @@ getPolicies(Archive *fout, TableInfo tblinfo[], int numTables)
"FROM pg_catalog.pg_policy pol "
"WHERE polrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -3914,7 +3914,7 @@ getPublications(Archive *fout)
"FROM pg_publication p",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4112,7 +4112,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE pr.prrelid = '%u'"
" AND p.oid = pr.prpubid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4237,7 +4237,7 @@ getSubscriptions(Archive *fout)
{
int n;
- res = ExecuteSqlQuery(fout,
+ res = ExecuteSqlQueryAH(fout,
"SELECT count(*) FROM pg_subscription "
"WHERE subdbid = (SELECT oid FROM pg_database"
" WHERE datname = current_database())",
@@ -4274,7 +4274,7 @@ getSubscriptions(Archive *fout)
"WHERE s.subdbid = (SELECT oid FROM pg_database\n"
" WHERE datname = current_database())");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4446,7 +4446,7 @@ append_depends_on_extension(Archive *fout,
"AND refclassid = 'pg_catalog.pg_extension'::pg_catalog.regclass",
catalog,
dobj->catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_extname = PQfnumber(res, "extname");
for (i = 0; i < ntups; i++)
@@ -4485,7 +4485,7 @@ get_next_possible_free_pg_type_oid(Archive *fout, PQExpBuffer upgrade_query)
"FROM pg_catalog.pg_type "
"WHERE oid = '%u'::pg_catalog.oid);",
next_possible_free_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
is_dup = (PQgetvalue(res, 0, 0)[0] == 't');
PQclear(res);
} while (is_dup);
@@ -4518,7 +4518,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE oid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4551,7 +4551,7 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
"WHERE r.rngtypid = '%u'::pg_catalog.oid;",
pg_type_oid);
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_multirange_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "oid")));
pg_type_multirange_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
@@ -4594,7 +4594,7 @@ binary_upgrade_set_type_oids_by_rel_oid(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_rel_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_type_oid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "crel")));
@@ -4645,7 +4645,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
- upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
+ upgrade_res = ExecuteSqlQueryForSingleRowAH(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0,
PQfnumber(upgrade_res, "reltoastrelid")));
@@ -4803,7 +4803,7 @@ getNamespaces(Archive *fout, int *numNamespaces)
"FROM pg_namespace",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -4916,7 +4916,7 @@ getExtensions(Archive *fout, int *numExtensions)
"FROM pg_extension x "
"JOIN pg_namespace n ON n.oid = x.extnamespace");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5093,7 +5093,7 @@ getTypes(Archive *fout, int *numTypes)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -5249,7 +5249,7 @@ getOperators(Archive *fout, int *numOprs)
"FROM pg_operator",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOprs = ntups;
@@ -5336,7 +5336,7 @@ getCollations(Archive *fout, int *numCollations)
"FROM pg_collation",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numCollations = ntups;
@@ -5408,7 +5408,7 @@ getConversions(Archive *fout, int *numConversions)
"FROM pg_conversion",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numConversions = ntups;
@@ -5481,7 +5481,7 @@ getAccessMethods(Archive *fout, int *numAccessMethods)
"amhandler::pg_catalog.regproc AS amhandler "
"FROM pg_am");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAccessMethods = ntups;
@@ -5552,7 +5552,7 @@ getOpclasses(Archive *fout, int *numOpclasses)
"FROM pg_opclass",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpclasses = ntups;
@@ -5635,7 +5635,7 @@ getOpfamilies(Archive *fout, int *numOpfamilies)
"FROM pg_opfamily",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numOpfamilies = ntups;
@@ -5804,7 +5804,7 @@ getAggregates(Archive *fout, int *numAggs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numAggs = ntups;
@@ -6035,7 +6035,7 @@ getFuncs(Archive *fout, int *numFuncs)
appendPQExpBufferChar(query, ')');
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6721,7 +6721,7 @@ getTables(Archive *fout, int *numTables)
RELKIND_VIEW, RELKIND_COMPOSITE_TYPE);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -6791,7 +6791,7 @@ getTables(Archive *fout, int *numTables)
resetPQExpBuffer(query);
appendPQExpBufferStr(query, "SET statement_timeout = ");
appendStringLiteralConn(query, dopt->lockWaitTimeout, GetConnection(fout));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
for (i = 0; i < ntups; i++)
@@ -6915,7 +6915,7 @@ getTables(Archive *fout, int *numTables)
appendPQExpBuffer(query,
"LOCK TABLE %s IN ACCESS SHARE MODE",
fmtQualifiedDumpable(&tblinfo[i]));
- ExecuteSqlStatement(fout, query->data);
+ ExecuteSqlStatementAH(fout, query->data);
}
/* Emit notice if join for owner failed */
@@ -6926,7 +6926,7 @@ getTables(Archive *fout, int *numTables)
if (dopt->lockWaitTimeout)
{
- ExecuteSqlStatement(fout, "SET statement_timeout = 0");
+ ExecuteSqlStatementAH(fout, "SET statement_timeout = 0");
}
PQclear(res);
@@ -7021,7 +7021,7 @@ getInherits(Archive *fout, int *numInherits)
/* find all the inheritance information */
appendPQExpBufferStr(query, "SELECT inhrelid, inhparent FROM pg_inherits");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7365,7 +7365,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7512,7 +7512,7 @@ getExtendedStatistics(Archive *fout)
"FROM pg_catalog.pg_statistic_ext",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7610,7 +7610,7 @@ getConstraints(Archive *fout, TableInfo tblinfo[], int numTables)
"WHERE conrelid = '%u'::pg_catalog.oid "
"AND contype = 'f'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7746,7 +7746,7 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
"ORDER BY conname",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -7839,7 +7839,7 @@ getRules(Archive *fout, int *numRules)
"ORDER BY oid");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8013,7 +8013,7 @@ getTriggers(Archive *fout, TableInfo tblinfo[], int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8148,7 +8148,7 @@ getEventTriggers(Archive *fout, int *numEventTriggers)
"ORDER BY e.oid",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8318,7 +8318,7 @@ getProcLangs(Archive *fout, int *numProcLangs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8427,7 +8427,7 @@ getCasts(Archive *fout, int *numCasts)
"FROM pg_cast ORDER BY 3,4");
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8495,7 +8495,7 @@ get_language_name(Archive *fout, Oid langid)
query = createPQExpBuffer();
appendPQExpBuffer(query, "SELECT lanname FROM pg_language WHERE oid = %u", langid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
lanname = pg_strdup(fmtId(PQgetvalue(res, 0, 0)));
destroyPQExpBuffer(query);
PQclear(res);
@@ -8538,7 +8538,7 @@ getTransforms(Archive *fout, int *numTransforms)
"FROM pg_transform "
"ORDER BY 3,4");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8720,7 +8720,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"ORDER BY a.attnum",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -8797,7 +8797,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
"WHERE adrelid = '%u'::pg_catalog.oid",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numDefaults = PQntuples(res);
attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * sizeof(AttrDefInfo));
@@ -8919,7 +8919,7 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int numTables)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, q->data, PGRES_TUPLES_OK);
numConstrs = PQntuples(res);
if (numConstrs != tbinfo->ncheck)
@@ -9062,7 +9062,7 @@ getTSParsers(Archive *fout, int *numTSParsers)
"prsend::oid, prsheadline::oid, prslextype::oid "
"FROM pg_ts_parser");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSParsers = ntups;
@@ -9146,7 +9146,7 @@ getTSDictionaries(Archive *fout, int *numTSDicts)
"FROM pg_ts_dict",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSDicts = ntups;
@@ -9226,7 +9226,7 @@ getTSTemplates(Archive *fout, int *numTSTemplates)
"tmplnamespace, tmplinit::oid, tmpllexize::oid "
"FROM pg_ts_template");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSTemplates = ntups;
@@ -9302,7 +9302,7 @@ getTSConfigurations(Archive *fout, int *numTSConfigs)
"FROM pg_ts_config",
username_subquery);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numTSConfigs = ntups;
@@ -9455,7 +9455,7 @@ getForeignDataWrappers(Archive *fout, int *numForeignDataWrappers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignDataWrappers = ntups;
@@ -9603,7 +9603,7 @@ getForeignServers(Archive *fout, int *numForeignServers)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numForeignServers = ntups;
@@ -9742,7 +9742,7 @@ getDefaultACLs(Archive *fout, int *numDefaultACLs)
username_subquery);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
*numDefaultACLs = ntups;
@@ -10098,7 +10098,7 @@ collectComments(Archive *fout, CommentItem **items)
"FROM pg_catalog.pg_description "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
@@ -10546,7 +10546,7 @@ dumpEnumType(Archive *fout, TypeInfo *tyinfo)
"ORDER BY oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
num = PQntuples(res);
@@ -10684,7 +10684,7 @@ dumpRangeType(Archive *fout, TypeInfo *tyinfo)
"rngtypid = '%u'",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
qualtypname = pg_strdup(fmtQualifiedDumpable(tyinfo));
@@ -10942,7 +10942,7 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo)
"WHERE oid = '%u'::pg_catalog.oid",
tyinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typlen = PQgetvalue(res, 0, PQfnumber(res, "typlen"));
typinput = PQgetvalue(res, 0, PQfnumber(res, "typinput"));
@@ -11165,7 +11165,7 @@ dumpDomain(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
typnotnull = PQgetvalue(res, 0, PQfnumber(res, "typnotnull"));
typdefn = PQgetvalue(res, 0, PQfnumber(res, "typdefn"));
@@ -11357,7 +11357,7 @@ dumpCompositeType(Archive *fout, TypeInfo *tyinfo)
tyinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -11536,7 +11536,7 @@ dumpCompositeTypeColComments(Archive *fout, TypeInfo *tyinfo)
tyinfo->typrelid);
/* Fetch column attnames */
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
if (ntups < 1)
@@ -12064,7 +12064,7 @@ dumpFunc(Archive *fout, FuncInfo *finfo)
"WHERE oid = '%u'::pg_catalog.oid",
finfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
proretset = PQgetvalue(res, 0, PQfnumber(res, "proretset"));
prosrc = PQgetvalue(res, 0, PQfnumber(res, "prosrc"));
@@ -12754,7 +12754,7 @@ dumpOpr(Archive *fout, OprInfo *oprinfo)
oprinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_oprkind = PQfnumber(res, "oprkind");
i_oprcode = PQfnumber(res, "oprcode");
@@ -12977,7 +12977,7 @@ convertTSFunction(Archive *fout, Oid funcOid)
snprintf(query, sizeof(query),
"SELECT '%u'::pg_catalog.regproc", funcOid);
- res = ExecuteSqlQueryForSingleRow(fout, query);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query);
result = pg_strdup(PQgetvalue(res, 0, 0));
@@ -13140,7 +13140,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_opcintype = PQfnumber(res, "opcintype");
i_opckeytype = PQfnumber(res, "opckeytype");
@@ -13268,7 +13268,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13347,7 +13347,7 @@ dumpOpclass(Archive *fout, OpclassInfo *opcinfo)
opcinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -13530,7 +13530,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
opfinfo->dobj.catId.oid);
}
- res_ops = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_ops = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
resetPQExpBuffer(query);
@@ -13546,7 +13546,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"ORDER BY amprocnum",
opfinfo->dobj.catId.oid);
- res_procs = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res_procs = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Get additional fields from the pg_opfamily row */
resetPQExpBuffer(query);
@@ -13557,7 +13557,7 @@ dumpOpfamily(Archive *fout, OpfamilyInfo *opfinfo)
"WHERE oid = '%u'::pg_catalog.oid",
opfinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_amname = PQfnumber(res, "amname");
@@ -13744,7 +13744,7 @@ dumpCollation(Archive *fout, CollInfo *collinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
collinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_collprovider = PQfnumber(res, "collprovider");
i_collisdeterministic = PQfnumber(res, "collisdeterministic");
@@ -13861,7 +13861,7 @@ dumpConversion(Archive *fout, ConvInfo *convinfo)
"WHERE c.oid = '%u'::pg_catalog.oid",
convinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_conforencoding = PQfnumber(res, "conforencoding");
i_contoencoding = PQfnumber(res, "contoencoding");
@@ -14078,7 +14078,7 @@ dumpAgg(Archive *fout, AggInfo *agginfo)
"AND p.oid = '%u'::pg_catalog.oid",
agginfo->aggfn.dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_agginitval = PQfnumber(res, "agginitval");
i_aggminitval = PQfnumber(res, "aggminitval");
@@ -14413,7 +14413,7 @@ dumpTSDictionary(Archive *fout, TSDictInfo *dictinfo)
"FROM pg_ts_template p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = tmplnamespace",
dictinfo->dicttemplate);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
tmplname = PQgetvalue(res, 0, 1);
@@ -14555,7 +14555,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"FROM pg_ts_parser p, pg_namespace n "
"WHERE p.oid = '%u' AND n.oid = prsnamespace",
cfginfo->cfgparser);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
nspname = PQgetvalue(res, 0, 0);
prsname = PQgetvalue(res, 0, 1);
@@ -14578,7 +14578,7 @@ dumpTSConfig(Archive *fout, TSConfigInfo *cfginfo)
"ORDER BY m.mapcfg, m.maptokentype, m.mapseqno",
cfginfo->cfgparser, cfginfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_tokenname = PQfnumber(res, "tokenname");
@@ -14742,7 +14742,7 @@ dumpForeignServer(Archive *fout, ForeignServerInfo *srvinfo)
"FROM pg_foreign_data_wrapper w "
"WHERE w.oid = '%u'",
srvinfo->srvfdw);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
fdwname = PQgetvalue(res, 0, 0);
appendPQExpBuffer(q, "CREATE SERVER %s", qsrvname);
@@ -14858,7 +14858,7 @@ dumpUserMappings(Archive *fout,
"ORDER BY usename",
catalogId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_usename = PQfnumber(res, "usename");
@@ -15379,7 +15379,7 @@ collectSecLabels(Archive *fout, SecLabelItem **items)
"FROM pg_catalog.pg_seclabel "
"ORDER BY classoid, objoid, objsubid");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
/* Construct lookup table containing OIDs in numeric form */
i_label = PQfnumber(res, "label");
@@ -15515,7 +15515,7 @@ dumpTable(Archive *fout, TableInfo *tbinfo)
tbinfo->dobj.catId.oid);
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
for (i = 0; i < PQntuples(res); i++)
{
@@ -15565,7 +15565,7 @@ createViewAsClause(Archive *fout, TableInfo *tbinfo)
"SELECT pg_catalog.pg_get_viewdef('%u'::pg_catalog.oid) AS viewdef",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -15740,7 +15740,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
"ON (fs.oid = ft.ftserver) "
"WHERE ft.ftrelid = '%u'",
tbinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
i_srvname = PQfnumber(res, "srvname");
i_ftoptions = PQfnumber(res, "ftoptions");
srvname = pg_strdup(PQgetvalue(res, 0, i_srvname));
@@ -16693,7 +16693,7 @@ dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo)
"pg_catalog.pg_get_statisticsobjdef('%u'::pg_catalog.oid)",
statsextinfo->dobj.catId.oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
stxdef = PQgetvalue(res, 0, 0);
@@ -17061,7 +17061,7 @@ findLastBuiltinOid_V71(Archive *fout)
PGresult *res;
Oid last_oid;
- res = ExecuteSqlQueryForSingleRow(fout,
+ res = ExecuteSqlQueryForSingleRowAH(fout,
"SELECT datlastsysoid FROM pg_database WHERE datname = current_database()");
last_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "datlastsysoid")));
PQclear(res);
@@ -17130,7 +17130,7 @@ dumpSequence(Archive *fout, TableInfo *tbinfo)
fmtQualifiedDumpable(tbinfo));
}
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17353,7 +17353,7 @@ dumpSequenceData(Archive *fout, TableDataInfo *tdinfo)
"SELECT last_value, is_called FROM %s",
fmtQualifiedDumpable(tbinfo));
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17761,7 +17761,7 @@ dumpRule(Archive *fout, RuleInfo *rinfo)
"SELECT pg_catalog.pg_get_ruledef('%u'::pg_catalog.oid)",
rinfo->dobj.catId.oid);
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) != 1)
{
@@ -17891,7 +17891,7 @@ getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
"AND deptype = 'e' "
"ORDER BY 3");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18090,7 +18090,7 @@ processExtensionTables(Archive *fout, ExtensionInfo extinfo[],
"AND refclassid = 'pg_extension'::regclass "
"AND classid = 'pg_class'::regclass;");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
i_conrelid = PQfnumber(res, "conrelid");
@@ -18196,7 +18196,7 @@ getDependencies(Archive *fout)
/* Sort the output for efficiency below */
appendPQExpBufferStr(query, "ORDER BY 1,2");
- res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
ntups = PQntuples(res);
@@ -18549,7 +18549,7 @@ getFormattedTypeName(Archive *fout, Oid oid, OidOptions opts)
appendPQExpBuffer(query, "SELECT pg_catalog.format_type('%u'::pg_catalog.oid, NULL)",
oid);
- res = ExecuteSqlQueryForSingleRow(fout, query->data);
+ res = ExecuteSqlQueryForSingleRowAH(fout, query->data);
/* result of format_type is already quoted */
result = pg_strdup(PQgetvalue(res, 0, 0));
--
2.21.1 (Apple Git-122.3)
v32-0003-Creating-query_utils-frontend-utility.patch (application/octet-stream)
From a5c25a47a6ef095b190a6049b82569b35e126e78 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:01:02 -0800
Subject: [PATCH v32 3/9] Creating query_utils frontend utility
Moving the ExecuteSqlQuery, ExecuteSqlQueryForSingleRow, and
ExecuteSqlStatement functions out of the pg_dump project into a new
shared location.
---
src/bin/pg_dump/pg_backup_db.c | 102 +-------------------------
src/bin/pg_dump/pg_backup_db.h | 17 -----
src/fe_utils/Makefile | 1 +
src/fe_utils/query_utils.c | 114 +++++++++++++++++++++++++++++
src/include/fe_utils/query_utils.h | 34 +++++++++
5 files changed, 150 insertions(+), 118 deletions(-)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/pg_dump/pg_backup_db.c b/src/bin/pg_dump/pg_backup_db.c
index b55a968da2..38402d0831 100644
--- a/src/bin/pg_dump/pg_backup_db.c
+++ b/src/bin/pg_dump/pg_backup_db.c
@@ -20,6 +20,7 @@
#include "common/connect.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "parallel.h"
#include "pg_backup_archiver.h"
@@ -271,107 +272,6 @@ notice_processor(void *arg, const char *message)
pg_log_generic(PG_LOG_INFO, "%s", message);
}
-/*
- * The exiting query result handler embeds the historical pg_dump behavior
- * under query error conditions, including exiting nicely. The 'conn' object
- * is unused here, but is included in the interface for alternate query result
- * handler implementations.
- *
- * Whether the query was successful is determined by comparing the returned
- * status code against the expected status code, and by comparing the number of
- * tuples returned from the query against expected_ntups. Special negative
- * values of expected_ntups can be used to require at least one row or to
- * disables ntup checking.
- *
- * Exits on failure. On successful query completion, returns the 'res'
- * argument as a notational convenience.
- */
-PGresult *
-exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
-{
- if (PQresultStatus(res) != expected_status)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
- if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
- {
- int ntups = PQntuples(res);
-
- if (expected_ntups == POSITIVE_NTUPS)
- {
- if (ntups == 0)
- fatal("query returned no rows: %s", query);
- }
- else if (ntups != expected_ntups)
- {
- /*
- * Preserve historical message behavior of spelling "one" as the
- * expected row count.
- */
- if (expected_ntups == 1)
- fatal(ngettext("query returned %d row instead of one: %s",
- "query returned %d rows instead of one: %s",
- ntups),
- ntups, query);
- fatal(ngettext("query returned %d row instead of %d: %s",
- "query returned %d rows instead of %d: %s",
- ntups),
- ntups, expected_ntups, query);
- }
- }
- return res;
-}
-
-/*
- * Executes the given SQL query statement.
- *
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
- */
-void
-ExecuteSqlStatement(PGconn *conn, const char *query)
-{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
-}
-
-/*
- * Executes the given SQL query.
- *
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
- */
-PGresult *
-ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
-}
-
-/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
- */
-PGresult *
-ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
-{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
-}
-
void
ExecuteSqlStatementAH(Archive *AHX, const char *query)
{
diff --git a/src/bin/pg_dump/pg_backup_db.h b/src/bin/pg_dump/pg_backup_db.h
index 1aac600ece..018a28908e 100644
--- a/src/bin/pg_dump/pg_backup_db.h
+++ b/src/bin/pg_dump/pg_backup_db.h
@@ -13,23 +13,6 @@
extern int ExecuteSqlCommandBuf(Archive *AHX, const char *buf, size_t bufLen);
-#define POSITIVE_NTUPS (-1)
-#define ANY_NTUPS (-2)
-typedef PGresult *(*PGresultHandler) (PGresult *res,
- PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups,
- const char *query);
-
-extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
- ExecStatusType expected_status,
- int expected_ntups, const char *query);
-
-extern void ExecuteSqlStatement(PGconn *conn, const char *query);
-extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
- ExecStatusType expected_status);
-extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
-
extern void ExecuteSqlStatementAH(Archive *AHX, const char *query);
extern PGresult *ExecuteSqlQueryAH(Archive *AHX, const char *query,
ExecStatusType status);
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index d6c328faf1..7fdbe08e11 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -27,6 +27,7 @@ OBJS = \
mbprint.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..b28750f4b2
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,114 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * The exiting query result handler embeds the historical pg_dump behavior
+ * under query error conditions, including exiting nicely. The 'conn' object
+ * is unused here, but is included in the interface for alternate query result
+ * handler implementations.
+ *
+ * Whether the query was successful is determined by comparing the returned
+ * status code against the expected status code, and by comparing the number of
+ * tuples returned from the query against expected_ntups. Special negative
+ * values of expected_ntups can be used to require at least one row or to
+ * disable ntup checking.
+ *
+ * Exits on failure. On successful query completion, returns the 'res'
+ * argument as a notational convenience.
+ */
+PGresult *
+exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (expected_ntups == POSITIVE_NTUPS)
+ {
+ if (ntups == 0)
+ fatal("query returned no rows: %s", query);
+ }
+ else if (ntups != expected_ntups)
+ {
+ /*
+ * Preserve historical message behavior of spelling "one" as the
+ * expected row count.
+ */
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
+}
+
+/*
+ * Executes the given SQL query statement.
+ *
+ * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ */
+void
+ExecuteSqlStatement(PGconn *conn, const char *query)
+{
+ PQclear(exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
+}
+
+/*
+ * Executes the given SQL query.
+ *
+ * Invokes the exiting handler unless the given 'status' results.
+ *
+ * If successful, returns the query result.
+ */
+PGresult *
+ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
+}
+
+/*
+ * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
+ * requires that exactly one row be returned.
+ */
+PGresult *
+ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
+{
+ return exiting_handler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
+}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..f03d17b1ed
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,34 @@
+/*-------------------------------------------------------------------------
+ *
+ * Query executing routines with facilities for modular error handling.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "libpq-fe.h"
+
+#define POSITIVE_NTUPS (-1)
+#define ANY_NTUPS (-2)
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern void ExecuteSqlStatement(PGconn *conn, const char *query);
+extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
+ ExecStatusType expected_status);
+extern PGresult *ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query);
+
+#endif /* QUERY_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v32-0004-Adding-CurrentQueryHandler-logic.patch (application/octet-stream)
From 86a8078d2750a9ee42db5a341838fad6072252f5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:25:29 -0800
Subject: [PATCH v32 4/9] Adding CurrentQueryHandler logic.
Extending the default set of PGresultHandlers and creating a
mechanism to switch between them using a new function
ResultHandlerSwitchTo, analogous to MemoryContextSwitchTo. In
addition to the exiting_handler already created in a prior commit
(which embeds the historical behavior from pg_dump), adding a
quiet_handler which cleans up and exits without logging anything,
and a noop_handler which does nothing, leaving the responsibility
for cleanup handling to the caller.
---
src/fe_utils/query_utils.c | 81 ++++++++++++++++++++++--------
src/include/fe_utils/query_utils.h | 17 +++++++
2 files changed, 76 insertions(+), 22 deletions(-)
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
index b28750f4b2..355da6edaf 100644
--- a/src/fe_utils/query_utils.c
+++ b/src/fe_utils/query_utils.c
@@ -12,6 +12,12 @@
#include "fe_utils/exit_utils.h"
#include "fe_utils/query_utils.h"
+/*
+ * Global memory.
+ */
+
+PGresultHandler CurrentResultHandler = exiting_handler;
+
/*
* The exiting query result handler embeds the historical pg_dump behavior
* under query error conditions, including exiting nicely. The 'conn' object
@@ -29,7 +35,7 @@
*/
PGresult *
exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
- int expected_ntups, const char *query)
+ int expected_ntups, const char *query)
{
if (PQresultStatus(res) != expected_status)
{
@@ -67,48 +73,79 @@ exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
return res;
}
+/*
+ * Quietly cleans up and exits nicely unless the expected conditions were met.
+ */
+PGresult *
+quiet_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ int ntups = PQntuples(res);
+
+ if ((PQresultStatus(res) != expected_status) ||
+ (expected_ntups == POSITIVE_NTUPS && ntups == 0) ||
+ (expected_ntups >= 0 && ntups != expected_ntups))
+ {
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+
+ return res;
+}
+
+/*
+ * Does nothing other than returning the 'res' argument back to the caller.
+ * This handler is intended for callers who prefer to perform the error
+ * handling themselves.
+ */
+PGresult *
+noop_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ return res;
+}
+
/*
* Executes the given SQL query statement.
*
- * Invokes the exiting handler for any but PGRES_COMMAND_OK status.
+ * Expects a PGRES_COMMAND_OK status.
*/
void
ExecuteSqlStatement(PGconn *conn, const char *query)
{
- PQclear(exiting_handler(PQexec(conn, query),
- conn,
- PGRES_COMMAND_OK,
- ANY_NTUPS,
- query));
+ PQclear(CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_COMMAND_OK,
+ ANY_NTUPS,
+ query));
}
/*
* Executes the given SQL query.
*
- * Invokes the exiting handler unless the given 'status' results.
- *
- * If successful, returns the query result.
+ * Expects the given status.
*/
PGresult *
ExecuteSqlQuery(PGconn *conn, const char *query, ExecStatusType status)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- status,
- ANY_NTUPS,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ status,
+ ANY_NTUPS,
+ query);
}
/*
- * Like ExecuteSqlQuery, but requires PGRES_TUPLES_OK status and
- * requires that exactly one row be returned.
+ * Executes the given SQL query.
+ *
+ * Expects a PGRES_TUPLES_OK status and precisely one row.
*/
PGresult *
ExecuteSqlQueryForSingleRow(PGconn *conn, const char *query)
{
- return exiting_handler(PQexec(conn, query),
- conn,
- PGRES_TUPLES_OK,
- 1,
- query);
+ return CurrentResultHandler(PQexec(conn, query),
+ conn,
+ PGRES_TUPLES_OK,
+ 1,
+ query);
}
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
index f03d17b1ed..80958e94fb 100644
--- a/src/include/fe_utils/query_utils.h
+++ b/src/include/fe_utils/query_utils.h
@@ -22,9 +22,26 @@ typedef PGresult *(*PGresultHandler) (PGresult *res,
int expected_ntups,
const char *query);
+extern PGresultHandler CurrentResultHandler;
+
+static inline PGresultHandler
+ResultHandlerSwitchTo(PGresultHandler handler)
+{
+ PGresultHandler old = CurrentResultHandler;
+
+ CurrentResultHandler = handler;
+ return old;
+}
+
extern PGresult *exiting_handler(PGresult *res, PGconn *conn,
ExecStatusType expected_status,
int expected_ntups, const char *query);
+extern PGresult *quiet_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+extern PGresult *noop_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
extern void ExecuteSqlStatement(PGconn *conn, const char *query);
extern PGresult *ExecuteSqlQuery(PGconn *conn, const char *query,
--
2.21.1 (Apple Git-122.3)
v32-0005-Refactoring-pg_dumpall-functions.patch (application/octet-stream)
From 34718483db460517416a84d505ce9a166abd5ebb Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:37:42 -0800
Subject: [PATCH v32 5/9] Refactoring pg_dumpall functions.
The functions executeQuery and executeCommand in pg_dumpall.c were
not refactored in prior commits along with functions from
pg_backup_db.c because they were in a separate file, but now that
the infrastructure has been moved to fe_utils/query_utils,
refactoring these two functions to use it.
---
src/bin/pg_dump/pg_dumpall.c | 31 +++----------------------------
1 file changed, 3 insertions(+), 28 deletions(-)
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 85d08ad660..807226537a 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "pg_backup.h"
@@ -1874,21 +1875,8 @@ constructConnStr(const char **keywords, const char **values)
static PGresult *
executeQuery(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- return res;
+ return ExecuteSqlQuery(conn, query, PGRES_TUPLES_OK);
}
/*
@@ -1897,21 +1885,8 @@ executeQuery(PGconn *conn, const char *query)
static void
executeCommand(PGconn *conn, const char *query)
{
- PGresult *res;
-
pg_log_info("executing %s", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_error("query was: %s", query);
- PQfinish(conn);
- exit_nicely(1);
- }
-
- PQclear(res);
+ PQclear(ExecuteSqlQuery(conn, query, PGRES_COMMAND_OK));
}
--
2.21.1 (Apple Git-122.3)
v32-0006-Refactoring-expand_schema_name_patterns-and-frie.patch (application/octet-stream)
From 6d91a3031f28387fbc2a6d776278abfd2b04bb1e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 13:51:02 -0800
Subject: [PATCH v32 6/9] Refactoring expand_schema_name_patterns and friends.
Refactoring these functions to take a PGconn pointer rather than an
Archive pointer in preparation for moving these functions to
fe_utils. This is much like what was previously done for
ExecuteSqlQuery and friends, and for the same reasons.
---
src/bin/pg_dump/pg_dump.c | 47 ++++++++++++++++++++++-----------------
1 file changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e8985a834f..41ce4b7866 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
#include "libpq/libpq-fs.h"
@@ -147,14 +148,14 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(Archive *fout,
+static void expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
-static void expand_foreign_server_name_patterns(Archive *fout,
+static void expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids);
-static void expand_table_name_patterns(Archive *fout,
+static void expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names);
@@ -798,13 +799,15 @@ main(int argc, char **argv)
/* Expand schema selection patterns into OID lists */
if (schema_include_patterns.head != NULL)
{
- expand_schema_name_patterns(fout, &schema_include_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_include_patterns,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
fatal("no matching schemas were found");
}
- expand_schema_name_patterns(fout, &schema_exclude_patterns,
+ expand_schema_name_patterns(GetConnection(fout),
+ &schema_exclude_patterns,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -812,21 +815,25 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(fout, &table_include_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_include_patterns,
&table_include_oids,
strict_names);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(fout, &table_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
&table_exclude_oids,
false);
- expand_table_name_patterns(fout, &tabledata_exclude_patterns,
+ expand_table_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
&tabledata_exclude_oids,
false);
- expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
+ expand_foreign_server_name_patterns(GetConnection(fout),
+ &foreign_servers_include_patterns,
&foreign_servers_include_oids);
/* non-matching exclusion patterns aren't an error */
@@ -1316,7 +1323,7 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
* and append them to the given OID list.
*/
static void
-expand_schema_name_patterns(Archive *fout,
+expand_schema_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids,
bool strict_names)
@@ -1340,10 +1347,10 @@ expand_schema_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
fatal("no matching schemas were found for pattern \"%s\"", cell->val);
@@ -1364,7 +1371,7 @@ expand_schema_name_patterns(Archive *fout,
* and append them to the given OID list.
*/
static void
-expand_foreign_server_name_patterns(Archive *fout,
+expand_foreign_server_name_patterns(PGconn *conn,
SimpleStringList *patterns,
SimpleOidList *oids)
{
@@ -1387,10 +1394,10 @@ expand_foreign_server_name_patterns(Archive *fout,
{
appendPQExpBufferStr(query,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(GetConnection(fout), query, cell->val, false,
+ processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
@@ -1410,7 +1417,7 @@ expand_foreign_server_name_patterns(Archive *fout,
* in pg_dumpall.c
*/
static void
-expand_table_name_patterns(Archive *fout,
+expand_table_name_patterns(PGconn *conn,
SimpleStringList *patterns, SimpleOidList *oids,
bool strict_names)
{
@@ -1446,13 +1453,13 @@ expand_table_name_patterns(Archive *fout,
RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(GetConnection(fout), query, cell->val, true,
+ processSQLNamePattern(conn, query, cell->val, true,
false, "n.nspname", "c.relname", NULL,
"pg_catalog.pg_table_is_visible(c.oid)");
- ExecuteSqlStatementAH(fout, "RESET search_path");
- res = ExecuteSqlQueryAH(fout, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRowAH(fout,
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
fatal("no matching tables were found for pattern \"%s\"", cell->val);
--
2.21.1 (Apple Git-122.3)
Attachment: v32-0007-Moving-pg_dump-functions-to-new-file-option_util.patch
From 0ceafd8608c542ca21ce8b8ea111a1f5115bbc0a Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:07:58 -0800
Subject: [PATCH v32 7/9] Moving pg_dump functions to new file option_utils
Moving the recently refactored functions
expand_schema_name_patterns, expand_foreign_server_name_patterns,
and expand_table_name_patterns from pg_dump.c, along with the
function expand_dbname_patterns from pg_dumpall.c, into the new file
fe_utils/option_utils.c
---
src/bin/pg_dump/pg_dump.c | 170 +--------------------
src/bin/pg_dump/pg_dumpall.c | 46 +-----
src/fe_utils/Makefile | 1 +
src/fe_utils/option_utils.c | 225 ++++++++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 35 +++++
5 files changed, 263 insertions(+), 214 deletions(-)
create mode 100644 src/fe_utils/option_utils.c
create mode 100644 src/include/fe_utils/option_utils.h
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41ce4b7866..c334b9e829 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
#include "catalog/pg_type_d.h"
#include "common/connect.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -148,17 +149,6 @@ static void setup_connection(Archive *AH,
const char *dumpencoding, const char *dumpsnapshot,
char *use_role);
static ArchiveFormat parseArchiveFormat(const char *format, ArchiveMode *mode);
-static void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-static void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-static void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
static NamespaceInfo *findNamespace(Oid nsoid);
static void dumpTableData(Archive *fout, TableDataInfo *tdinfo);
static void refreshMatViewData(Archive *fout, TableDataInfo *tdinfo);
@@ -1318,164 +1308,6 @@ parseArchiveFormat(const char *format, ArchiveMode *mode)
return archiveFormat;
}
-/*
- * Find the OIDs of all schemas matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_namespace n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "n.nspname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching schemas were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all foreign servers matching the given list of patterns,
- * and append them to the given OID list.
- */
-static void
-expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs might sometimes result in
- * duplicate entries in the OID list, but we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "s.srvname", NULL, NULL);
-
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
- fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
-/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list. See also expand_dbname_patterns()
- * in pg_dumpall.c
- */
-static void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
-{
- PQExpBuffer query;
- PGresult *res;
- SimpleStringListCell *cell;
- int i;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * this might sometimes result in duplicate entries in the OID list, but
- * we don't care.
- */
-
- for (cell = patterns->head; cell; cell = cell->next)
- {
- /*
- * Query must remain ABSOLUTELY devoid of unqualified names. This
- * would be unnecessary given a pg_table_is_visible() variant taking a
- * search_path argument.
- */
- appendPQExpBuffer(query,
- "SELECT c.oid"
- "\nFROM pg_catalog.pg_class c"
- "\n LEFT JOIN pg_catalog.pg_namespace n"
- "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
- "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
- processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
-
- ExecuteSqlStatement(conn, "RESET search_path");
- res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- PQclear(ExecuteSqlQueryForSingleRow(conn,
- ALWAYS_SECURE_SEARCH_PATH_SQL));
- if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
-
- for (i = 0; i < PQntuples(res); i++)
- {
- simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* checkExtensionMembership
* Determine whether object is an extension member, and if so,
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 807226537a..01db15dfda 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -23,6 +23,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "dumputils.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#include "getopt_long.h"
@@ -54,8 +55,6 @@ static PGconn *connectDatabase(const char *dbname, const char *connstr, const ch
static char *constructConnStr(const char **keywords, const char **values);
static PGresult *executeQuery(PGconn *conn, const char *query);
static void executeCommand(PGconn *conn, const char *query);
-static void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
static char pg_dump_bin[MAXPGPATH];
static const char *progname;
@@ -1409,49 +1408,6 @@ dumpUserConfig(PGconn *conn, const char *username)
destroyPQExpBuffer(buf);
}
-/*
- * Find a list of database names that match the given patterns.
- * See also expand_table_name_patterns() in pg_dump.c
- */
-static void
-expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
-{
- PQExpBuffer query;
- PGresult *res;
-
- if (patterns->head == NULL)
- return; /* nothing to do */
-
- query = createPQExpBuffer();
-
- /*
- * The loop below runs multiple SELECTs, which might sometimes result in
- * duplicate entries in the name list, but we don't care, since all we're
- * going to do is test membership of the list.
- */
-
- for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
- {
- appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
- processSQLNamePattern(conn, query, cell->val, false,
- false, NULL, "datname", NULL, NULL);
-
- res = executeQuery(conn, query->data);
- for (int i = 0; i < PQntuples(res); i++)
- {
- simple_string_list_append(names, PQgetvalue(res, i, 0));
- }
-
- PQclear(res);
- resetPQExpBuffer(query);
- }
-
- destroyPQExpBuffer(query);
-}
-
/*
* Dump contents of databases.
*/
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 7fdbe08e11..eb937e4648 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -25,6 +25,7 @@ OBJS = \
conditional.o \
exit_utils.o \
mbprint.o \
+ option_utils.o \
print.o \
psqlscan.o \
query_utils.o \
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..7893df77aa
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,225 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h"
+
+/*
+ * Find the OIDs of all schemas matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_namespace n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "n.nspname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching schemas were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all foreign servers matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the OID list, but we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT oid FROM pg_catalog.pg_foreign_server s\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "s.srvname", NULL, NULL);
+
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (PQntuples(res) == 0)
+ fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find the OIDs of all tables matching the given list of patterns,
+ * and append them to the given OID list.
+ */
+void
+expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns, SimpleOidList *oids,
+ bool strict_names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+ SimpleStringListCell *cell;
+ int i;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * This might sometimes result in duplicate entries in the OID list, but
+ * we don't care.
+ */
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /*
+ * Query must remain ABSOLUTELY devoid of unqualified names. This
+ * would be unnecessary given a pg_table_is_visible() variant taking a
+ * search_path argument.
+ */
+ appendPQExpBuffer(query,
+ "SELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\n LEFT JOIN pg_catalog.pg_namespace n"
+ "\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
+ "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
+ RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
+ RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
+ RELKIND_PARTITIONED_TABLE);
+ processSQLNamePattern(conn, query, cell->val, true,
+ false, "n.nspname", "c.relname", NULL,
+ "pg_catalog.pg_table_is_visible(c.oid)");
+
+ ExecuteSqlStatement(conn, "RESET search_path");
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ PQclear(ExecuteSqlQueryForSingleRow(conn,
+ ALWAYS_SECURE_SEARCH_PATH_SQL));
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(oids, atooid(PQgetvalue(res, i, 0)));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
+
+/*
+ * Find a list of database names that match the given patterns.
+ */
+void
+expand_dbname_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleStringList *names)
+{
+ PQExpBuffer query;
+ PGresult *res;
+
+ if (patterns->head == NULL)
+ return; /* nothing to do */
+
+ query = createPQExpBuffer();
+
+ /*
+ * The loop below runs multiple SELECTs, which might sometimes result in
+ * duplicate entries in the name list, but we don't care, since all we're
+ * going to do is test membership of the list.
+ */
+
+ for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
+ {
+ appendPQExpBufferStr(query,
+ "SELECT datname FROM pg_catalog.pg_database n\n");
+ processSQLNamePattern(conn, query, cell->val, false,
+ false, NULL, "datname", NULL, NULL);
+
+ pg_log_info("executing %s", query->data);
+ res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ for (int i = 0; i < PQntuples(res); i++)
+ {
+ simple_string_list_append(names, PQgetvalue(res, i, 0));
+ }
+
+ PQclear(res);
+ resetPQExpBuffer(query);
+ }
+
+ destroyPQExpBuffer(query);
+}
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..d626a0bbc9
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,35 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command-line option facilities for frontend code
+ *
+ * Functions for converting shell-style patterns into simple lists of Oids for
+ * database objects that match the patterns.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "fe_utils/simple_list.h"
+#include "libpq-fe.h"
+
+extern void expand_schema_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_foreign_server_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids);
+extern void expand_table_name_patterns(PGconn *conn,
+ SimpleStringList *patterns,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
+ SimpleStringList *names);
+
+#endif /* OPTION_UTILS_H */
--
2.21.1 (Apple Git-122.3)
Attachment: v32-0008-Normalizing-option_utils-interface.patch
From 521ccf1daff979b9beb5bd08f395217175d5f948 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 14:37:44 -0800
Subject: [PATCH v32 8/9] Normalizing option_utils interface.
The functions in option_utils were copied from pg_dump, mostly preserving
the function signatures. But the signatures and corresponding functionality
were originally written based solely on pg_dump's needs, not with the goal
of creating a consistent interface. Fixing that.
---
src/bin/pg_dump/pg_dump.c | 58 +++++++++++++++++++------
src/bin/pg_dump/pg_dumpall.c | 4 +-
src/fe_utils/option_utils.c | 66 ++++++++++++++++++++---------
src/fe_utils/string_utils.c | 50 ++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 29 +++++++++----
src/include/fe_utils/string_utils.h | 6 +++
6 files changed, 169 insertions(+), 44 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c334b9e829..5c446c0f24 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -125,6 +125,17 @@ static SimpleOidList tabledata_exclude_oids = {NULL, NULL};
static SimpleStringList foreign_servers_include_patterns = {NULL, NULL};
static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
+/*
+ * Cstring list of relkinds which qualify as tables for our purposes when
+ * processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_SEQUENCE) ", " \
+ CppAsString2(RELKIND_VIEW) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_FOREIGN_TABLE) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
static const CatalogId nilCatalogId = {0, 0};
/* override for standard extra_float_digits setting */
@@ -791,6 +802,7 @@ main(int argc, char **argv)
{
expand_schema_name_patterns(GetConnection(fout),
&schema_include_patterns,
+ NULL,
&schema_include_oids,
strict_names);
if (schema_include_oids.head == NULL)
@@ -798,6 +810,7 @@ main(int argc, char **argv)
}
expand_schema_name_patterns(GetConnection(fout),
&schema_exclude_patterns,
+ NULL,
&schema_exclude_oids,
false);
/* non-matching exclusion patterns aren't an error */
@@ -805,26 +818,43 @@ main(int argc, char **argv)
/* Expand table selection patterns into OID lists */
if (table_include_patterns.head != NULL)
{
- expand_table_name_patterns(GetConnection(fout),
- &table_include_patterns,
- &table_include_oids,
- strict_names);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_include_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ strict_names,
+ true);
if (table_include_oids.head == NULL)
fatal("no matching tables were found");
}
- expand_table_name_patterns(GetConnection(fout),
- &table_exclude_patterns,
- &table_exclude_oids,
- false);
-
- expand_table_name_patterns(GetConnection(fout),
- &tabledata_exclude_patterns,
- &tabledata_exclude_oids,
- false);
+ expand_rel_name_patterns(GetConnection(fout),
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ true);
+
+ expand_rel_name_patterns(GetConnection(fout),
+ &tabledata_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &tabledata_exclude_oids,
+ false,
+ true);
expand_foreign_server_name_patterns(GetConnection(fout),
&foreign_servers_include_patterns,
- &foreign_servers_include_oids);
+ NULL,
+ &foreign_servers_include_oids,
+ true);
/* non-matching exclusion patterns aren't an error */
diff --git a/src/bin/pg_dump/pg_dumpall.c b/src/bin/pg_dump/pg_dumpall.c
index 01db15dfda..2b3a4e3349 100644
--- a/src/bin/pg_dump/pg_dumpall.c
+++ b/src/bin/pg_dump/pg_dumpall.c
@@ -471,8 +471,8 @@ main(int argc, char *argv[])
/*
* Get a list of database names that match the exclude patterns
*/
- expand_dbname_patterns(conn, &database_exclude_patterns,
- &database_exclude_names);
+ expand_dbname_patterns(conn, &database_exclude_patterns, NULL,
+ &database_exclude_names, false);
/*
* Open the output file if required, otherwise use stdout
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
index 7893df77aa..76ca456784 100644
--- a/src/fe_utils/option_utils.c
+++ b/src/fe_utils/option_utils.c
@@ -14,6 +14,7 @@
*/
#include "postgres_fe.h"
+#include "catalog/pg_am.h"
#include "catalog/pg_class.h"
#include "common/connect.h"
#include "fe_utils/exit_utils.h"
@@ -30,7 +31,8 @@
*/
void
expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names)
{
@@ -55,6 +57,7 @@ expand_schema_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_namespace n\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "n.nspname", NULL, NULL);
+ exclude_filter(query, "n.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
if (strict_names && PQntuples(res) == 0)
@@ -78,8 +81,10 @@ expand_schema_name_patterns(PGconn *conn,
*/
void
expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -102,9 +107,10 @@ expand_foreign_server_name_patterns(PGconn *conn,
"SELECT oid FROM pg_catalog.pg_foreign_server s\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "s.srvname", NULL, NULL);
+ exclude_filter(query, "s.oid", exclude_oids);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
- if (PQntuples(res) == 0)
+ if (strict_names && PQntuples(res) == 0)
fatal("no matching foreign servers were found for pattern \"%s\"", cell->val);
for (i = 0; i < PQntuples(res); i++)
@@ -118,18 +124,32 @@ expand_foreign_server_name_patterns(PGconn *conn,
}
/*
- * Find the OIDs of all tables matching the given list of patterns,
- * and append them to the given OID list.
+ * Find the OIDs of all relations matching the given list of patterns
+ * and restrictions, and append them to the given OID list.
*/
void
-expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns, SimpleOidList *oids,
- bool strict_names)
+expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible)
{
PQExpBuffer query;
PGresult *res;
SimpleStringListCell *cell;
int i;
+ const char *visibility_rule;
+
+ Assert(amtype == AMTYPE_TABLE || amtype == AMTYPE_INDEX);
+
+ if (restrict_visible)
+ visibility_rule = "pg_catalog.pg_table_is_visible(c.oid)";
+ else
+ visibility_rule = NULL;
if (patterns->head == NULL)
return; /* nothing to do */
@@ -154,20 +174,22 @@ expand_table_name_patterns(PGconn *conn,
"\n LEFT JOIN pg_catalog.pg_namespace n"
"\n ON n.oid OPERATOR(pg_catalog.=) c.relnamespace"
"\nWHERE c.relkind OPERATOR(pg_catalog.=) ANY"
- "\n (array['%c', '%c', '%c', '%c', '%c', '%c'])\n",
- RELKIND_RELATION, RELKIND_SEQUENCE, RELKIND_VIEW,
- RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
- RELKIND_PARTITIONED_TABLE);
+ "\n (array[%s])\n", relkinds);
processSQLNamePattern(conn, query, cell->val, true,
- false, "n.nspname", "c.relname", NULL,
- "pg_catalog.pg_table_is_visible(c.oid)");
+ false, "n.nspname", "c.relname", NULL, visibility_rule);
+ exclude_filter(query, "n.oid", exclude_nsp_oids);
+ exclude_filter(query, "c.oid", exclude_oids);
ExecuteSqlStatement(conn, "RESET search_path");
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
PQclear(ExecuteSqlQueryForSingleRow(conn,
ALWAYS_SECURE_SEARCH_PATH_SQL));
if (strict_names && PQntuples(res) == 0)
- fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ {
+ if (amtype == AMTYPE_TABLE)
+ fatal("no matching tables were found for pattern \"%s\"", cell->val);
+ fatal("no matching indexes were found for pattern \"%s\"", cell->val);
+ }
for (i = 0; i < PQntuples(res); i++)
{
@@ -186,8 +208,10 @@ expand_table_name_patterns(PGconn *conn,
*/
void
expand_dbname_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleStringList *names)
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names)
{
PQExpBuffer query;
PGresult *res;
@@ -206,12 +230,16 @@ expand_dbname_patterns(PGconn *conn,
for (SimpleStringListCell *cell = patterns->head; cell; cell = cell->next)
{
appendPQExpBufferStr(query,
- "SELECT datname FROM pg_catalog.pg_database n\n");
+ "SELECT datname FROM pg_catalog.pg_database d\n");
processSQLNamePattern(conn, query, cell->val, false,
false, NULL, "datname", NULL, NULL);
+ exclude_filter(query, "d.oid", exclude_oids);
pg_log_info("executing %s", query->data);
res = ExecuteSqlQuery(conn, query->data, PGRES_TUPLES_OK);
+ if (strict_names && PQntuples(res) == 0)
+ fatal("no matching databases were found for pattern \"%s\"", cell->val);
+
for (int i = 0; i < PQntuples(res); i++)
{
simple_string_list_append(names, PQgetvalue(res, i, 0));
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..4e57a6f940 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -797,6 +797,56 @@ appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
return true;
}
+/*
+ * Internal implementation of include_filter and exclude_filter.
+ */
+static void
+apply_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids,
+ bool include)
+{
+ const SimpleOidListCell *cell;
+ const char *comma;
+
+ if (!oids || !oids->head)
+ return;
+
+ if (include)
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.=) ANY(array[", lval);
+ else
+ appendPQExpBuffer(querybuf, "\nAND %s OPERATOR(pg_catalog.!=) ALL(array[", lval);
+
+ for (comma = "", cell = oids->head; cell; comma = ", ", cell = cell->next)
+ appendPQExpBuffer(querybuf, "%s%u", comma, cell->val);
+ appendPQExpBuffer(querybuf, "]::OID[])");
+}
+
+/*
+ * Conditionally add a restriction to a query such that lval must be an Oid in
+ * the given list of Oids, except that for a null or empty oids list argument,
+ * no filtering is done and we return without having modified the query buffer.
+ *
+ * The query argument must already have begun the WHERE clause and must be in a
+ * state where we can append an AND clause. No checking of this requirement is
+ * done here.
+ *
+ * On return, the query buffer will be extended with an AND clause that filters
+ * only those rows where the lval is an Oid present in the given list of oids.
+ */
+void
+include_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, true);
+}
+
+/*
+ * Same as include_filter, above, except that for a non-null, non-empty oids
+ * list, the lval is restricted to not be any of the values in the list.
+ */
+void
+exclude_filter(PQExpBuffer querybuf, const char *lval, const SimpleOidList *oids)
+{
+ apply_filter(querybuf, lval, oids, false);
+}
/*
* processSQLNamePattern
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
index d626a0bbc9..53da30754f 100644
--- a/src/include/fe_utils/option_utils.h
+++ b/src/include/fe_utils/option_utils.h
@@ -19,17 +19,28 @@
#include "libpq-fe.h"
extern void expand_schema_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
SimpleOidList *oids,
bool strict_names);
extern void expand_foreign_server_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids);
-extern void expand_table_name_patterns(PGconn *conn,
- SimpleStringList *patterns,
- SimpleOidList *oids,
- bool strict_names);
-extern void expand_dbname_patterns(PGconn *conn, SimpleStringList *patterns,
- SimpleStringList *names);
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleOidList *oids,
+ bool strict_names);
+extern void expand_rel_name_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_nsp_oids,
+ const SimpleOidList *exclude_oids,
+ const char *relkinds,
+ char amtype,
+ SimpleOidList *oids,
+ bool strict_names,
+ bool restrict_visible);
+extern void expand_dbname_patterns(PGconn *conn,
+ const SimpleStringList *patterns,
+ const SimpleOidList *exclude_oids,
+ SimpleStringList *names,
+ bool strict_names);
#endif /* OPTION_UTILS_H */
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..301a8eef4d 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -16,6 +16,7 @@
#ifndef STRING_UTILS_H
#define STRING_UTILS_H
+#include "fe_utils/simple_list.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"
@@ -50,6 +51,11 @@ extern bool parsePGArray(const char *atext, char ***itemarray, int *nitems);
extern bool appendReloptionsArray(PQExpBuffer buffer, const char *reloptions,
const char *prefix, int encoding, bool std_strings);
+extern void include_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+extern void exclude_filter(PQExpBuffer querybuf, const char *lval,
+ const SimpleOidList *oids);
+
extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *pattern,
bool have_where, bool force_escape,
--
2.21.1 (Apple Git-122.3)
v32-0009-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v32-0009-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 01267b3ca3cf9651cd698342fe383fbe00028fde Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 6 Jan 2021 15:47:08 -0800
Subject: [PATCH v32 9/9] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 886 +++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.control | 5 +
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 60 ++
contrib/pg_amcheck/t/003_check.pl | 248 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 ++++++++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 ++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 493 ++++++++++++
src/fe_utils/query_utils.c | 8 +-
src/tools/msvc/Mkvcbuild.pm | 10 +-
src/tools/pgindent/typedefs.list | 2 +
16 files changed, 2295 insertions(+), 9 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.control
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..97df1d1074
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,886 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am.h"
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "common/username.h"
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/print.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h"
+#include "pg_getopt.h"
+#include "storage/block.h"
+
+typedef struct ConnectOptions
+{
+ char *dbname;
+ char *host;
+ char *port;
+ char *username;
+} ConnectOptions;
+
+typedef enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+} trivalue;
+
+typedef struct
+{
+ bool notty; /* stdin or stdout is not a tty (as determined
+ * on startup) */
+ trivalue getPassword; /* prompt for a password */
+ const char *progname; /* in case you renamed pg_amcheck */
+ bool strict_names; /* The specified names/patterns should
+ * match at least one entity */
+ bool on_error_stop; /* The checking of each table should stop
+ * after the first corrupt page is found. */
+ bool skip_frozen; /* Do not check pages marked all frozen */
+ bool skip_visible; /* Do not check pages marked all visible */
+ bool check_indexes; /* Check btree indexes */
+ bool check_toast; /* Check associated toast tables and indexes */
+ bool check_corrupt; /* Check indexes even if table is corrupt */
+ bool heapallindexed; /* Perform index to table reconciling checks */
+ bool rootdescend; /* Perform index rootdescend checks */
+ bool verbose;
+ long startblock; /* Block number where checking begins */
+ long endblock; /* Block number where checking ends, inclusive */
+} AmCheckSettings;
+
+static AmCheckSettings settings;
+
+/* Connection to backend */
+static PGconn *conn;
+
+/*
+ * Object inclusion/exclusion lists
+ *
+ * The string lists record the patterns given by command-line switches,
+ * which we then convert to lists of Oids of matching objects.
+ */
+static SimpleStringList schema_include_patterns = {NULL, NULL};
+static SimpleOidList schema_include_oids = {NULL, NULL};
+static SimpleStringList schema_exclude_patterns = {NULL, NULL};
+static SimpleOidList schema_exclude_oids = {NULL, NULL};
+
+static SimpleStringList table_include_patterns = {NULL, NULL};
+static SimpleOidList table_include_oids = {NULL, NULL};
+static SimpleStringList table_exclude_patterns = {NULL, NULL};
+static SimpleOidList table_exclude_oids = {NULL, NULL};
+
+static SimpleStringList index_include_patterns = {NULL, NULL};
+static SimpleOidList index_include_oids = {NULL, NULL};
+static SimpleStringList index_exclude_patterns = {NULL, NULL};
+static SimpleOidList index_exclude_oids = {NULL, NULL};
+
+/*
+ * Cstring list of relkinds which qualify as tables for our purposes when
+ * processing table inclusion or exclusion patterns.
+ */
+#define TABLE_RELKIND_LIST CppAsString2(RELKIND_RELATION) ", " \
+ CppAsString2(RELKIND_MATVIEW) ", " \
+ CppAsString2(RELKIND_PARTITIONED_TABLE)
+
+#define INDEX_RELKIND_LIST CppAsString2(RELKIND_INDEX)
+
+/*
+ * List of main tables to be checked, compiled from above lists, and
+ * corresponding list of toast tables. The lists should always be
+ * the same length, with InvalidOid in the toastlist for main relations
+ * without a corresponding toast relation.
+ */
+static SimpleOidList mainlist = {NULL, NULL};
+static SimpleOidList toastlist = {NULL, NULL};
+
+
+/*
+ * Functions for running the various corruption checks.
+ */
+static void check_tables(SimpleOidList *checklist);
+static uint64 check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast);
+static uint64 check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids);
+static uint64 check_index(const char *idxoid, const char *idxname,
+ const char *tblname);
+
+/*
+ * Functions implementing standard command line behaviors.
+ */
+static void parse_cli_options(int argc, char *argv[],
+ ConnectOptions *connOpts);
+static void usage(void);
+static void showVersion(void);
+static void NoticeProcessor(void *arg, const char *message);
+
+static void get_table_check_list(const SimpleOidList *include_nsp,
+ const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl,
+ const SimpleOidList *exclude_tbl);
+
+#define EXIT_BADCONN 2
+
+int
+main(int argc, char **argv)
+{
+ ConnectOptions connOpts;
+ bool have_password = false;
+ char *password = NULL;
+ bool new_pass;
+
+ pg_logging_init(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_amcheck"));
+
+ if (argc > 1)
+ {
+ if ((strcmp(argv[1], "-?") == 0) ||
+ (argc == 2 && (strcmp(argv[1], "--help") == 0)))
+ {
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ showVersion();
+ exit(EXIT_SUCCESS);
+ }
+ }
+
+ memset(&settings, 0, sizeof(settings));
+ settings.progname = get_progname(argv[0]);
+
+ conn = NULL;
+ setDecimalLocale();
+
+ settings.notty = (!isatty(fileno(stdin)) || !isatty(fileno(stdout)));
+
+ settings.getPassword = TRI_DEFAULT;
+
+ settings.startblock = -1;
+ settings.endblock = -1;
+
+ /*
+ * Default behaviors for user-settable options. Note that these default
+ * to doing all the safe checks and none of the unsafe ones, on the theory
+ * that if a user says "pg_amcheck mydb" without specifying any additional
+ * options, we should check everything we know how to check without
+ * risking any backend aborts.
+ */
+
+ settings.on_error_stop = false;
+ settings.skip_frozen = false;
+ settings.skip_visible = false;
+
+ /* Index checking options */
+ settings.check_indexes = true;
+ settings.check_corrupt = true;
+ settings.heapallindexed = false;
+ settings.rootdescend = false;
+
+ /*
+ * Reconciling toasted attributes from the main table against the toast
+ * table can crash the backend if the toast table or index is corrupt. We
+ * can optionally check the toast table and then the toast index prior to
+ * checking the main table, but if the toast table or index becomes
+ * corrupted after we conclude they are valid, the check of the main
+ * table can still crash the backend. The onus is on any caller who
+ * enables this option to make certain the environment is sufficiently
+ * stable that concurrent corruption of the toast relations is not
+ * possible.
+ settings.check_toast = false;
+
+ parse_cli_options(argc, argv, &connOpts);
+
+ if (settings.getPassword == TRI_YES)
+ {
+ /*
+ * We can't be sure yet of the username that will be used, so don't
+ * offer a potentially wrong one. Typical uses of this option are
+ * noninteractive anyway.
+ */
+ password = simple_prompt("Password: ", false);
+ have_password = true;
+ }
+
+ /* loop until we have a password if requested by backend */
+ do
+ {
+#define ARRAY_SIZE 8
+ const char **keywords = pg_malloc(ARRAY_SIZE * sizeof(*keywords));
+ const char **values = pg_malloc(ARRAY_SIZE * sizeof(*values));
+
+ keywords[0] = "host";
+ values[0] = connOpts.host;
+ keywords[1] = "port";
+ values[1] = connOpts.port;
+ keywords[2] = "user";
+ values[2] = connOpts.username;
+ keywords[3] = "password";
+ values[3] = have_password ? password : NULL;
+ keywords[4] = "dbname"; /* see do_connect() */
+ if (connOpts.dbname == NULL)
+ {
+ if (getenv("PGDATABASE"))
+ values[4] = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ values[4] = getenv("PGUSER");
+ else
+ values[4] = "postgres";
+ }
+ else
+ values[4] = connOpts.dbname;
+ keywords[5] = "fallback_application_name";
+ values[5] = settings.progname;
+ keywords[6] = "client_encoding";
+ values[6] = (settings.notty ||
+ getenv("PGCLIENTENCODING")) ? NULL : "auto";
+ keywords[7] = NULL;
+ values[7] = NULL;
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+ if (!conn)
+ fatal("could not connect to database %s: out of memory", values[4]);
+
+ free(keywords);
+ free(values);
+
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ !have_password &&
+ settings.getPassword != TRI_NO)
+ {
+ /*
+ * Before closing the old PGconn, extract the user name that was
+ * actually connected with.
+ */
+ const char *realusername = PQuser(conn);
+ char *password_prompt;
+
+ if (realusername && realusername[0])
+ password_prompt = psprintf("Password for user %s: ",
+ realusername);
+ else
+ password_prompt = pg_strdup("Password: ");
+ PQfinish(conn);
+
+ password = simple_prompt(password_prompt, false);
+ free(password_prompt);
+ have_password = true;
+ new_pass = true;
+ }
+
+ if (!new_pass && PQstatus(conn) == CONNECTION_BAD)
+ {
+ pg_log_error("could not connect to database %s: %s",
+ values[4], PQerrorMessage(conn));
+ PQfinish(conn);
+ exit(1);
+ }
+ } while (new_pass);
+
+ if (settings.verbose)
+ PQsetErrorVerbosity(conn, PQERRORS_VERBOSE);
+
+ /*
+ * Expand schema, table, and index exclusion patterns, if any. Note that
+ * non-matching exclusion patterns are not an error, even when
+ * --strict-names was specified.
+ */
+ expand_schema_name_patterns(conn,
+ &schema_exclude_patterns,
+ NULL,
+ &schema_exclude_oids,
+ false);
+ expand_rel_name_patterns(conn,
+ &table_exclude_patterns,
+ NULL,
+ NULL,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_exclude_oids,
+ false,
+ false);
+ expand_rel_name_patterns(conn,
+ &index_exclude_patterns,
+ NULL,
+ NULL,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_exclude_oids,
+ false,
+ false);
+
+ /* Expand schema selection patterns into Oid lists */
+ if (schema_include_patterns.head != NULL)
+ {
+ expand_schema_name_patterns(conn,
+ &schema_include_patterns,
+ &schema_exclude_oids,
+ &schema_include_oids,
+ settings.strict_names);
+ if (schema_include_oids.head == NULL)
+ fatal("no matching schemas were found");
+ }
+
+ /* Expand table selection patterns into Oid lists */
+ if (table_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &table_include_patterns,
+ &schema_exclude_oids,
+ &table_exclude_oids,
+ TABLE_RELKIND_LIST,
+ AMTYPE_TABLE,
+ &table_include_oids,
+ settings.strict_names,
+ false);
+ if (table_include_oids.head == NULL)
+ fatal("no matching tables were found");
+ }
+
+ /* Expand index selection patterns into Oid lists */
+ if (index_include_patterns.head != NULL)
+ {
+ expand_rel_name_patterns(conn,
+ &index_include_patterns,
+ &schema_exclude_oids,
+ &index_exclude_oids,
+ INDEX_RELKIND_LIST,
+ AMTYPE_INDEX,
+ &index_include_oids,
+ settings.strict_names,
+ false);
+ if (index_include_oids.head == NULL)
+ fatal("no matching indexes were found");
+ }
+
+ /*
+ * Compile list of all tables to be checked based on namespace and table
+ * includes and excludes.
+ */
+ get_table_check_list(&schema_include_oids, &schema_exclude_oids,
+ &table_include_oids, &table_exclude_oids);
+
+ PQsetNoticeProcessor(conn, NoticeProcessor, NULL);
+
+ if (settings.check_toast)
+ check_tables(&toastlist);
+ check_tables(&mainlist);
+
+ return 0;
+}
+
+/*
+ * Check each table from the given checklist per the user-specified options.
+ */
+static void
+check_tables(SimpleOidList *checklist)
+{
+ const SimpleOidListCell *cell;
+
+ for (cell = checklist->head; cell; cell = cell->next)
+ {
+ uint64 corruptions = 0;
+
+ if (!OidIsValid(cell->val))
+ continue;
+
+ corruptions = check_table(cell->val,
+ settings.startblock,
+ settings.endblock,
+ settings.on_error_stop,
+ settings.check_toast);
+
+ if (settings.check_indexes)
+ {
+ /* Optionally skip the index checks for a corrupt table. */
+ if (corruptions && !settings.check_corrupt)
+ continue;
+
+ corruptions += check_indexes(cell->val,
+ &index_include_oids,
+ &index_exclude_oids);
+ }
+ }
+}
+
+/*
+ * Checks the given table for corruption, returning the number of corruptions
+ * detected and printed to the user.
+ */
+static uint64
+check_table(Oid tbloid, long startblock, long endblock,
+ bool on_error_stop, bool check_toast)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ char *skip;
+ char *toast;
+ const char *stop;
+ uint64 corruption_cnt = 0;
+
+ if (settings.skip_frozen)
+ skip = pg_strdup("'all frozen'");
+ else if (settings.skip_visible)
+ skip = pg_strdup("'all visible'");
+ else
+ skip = pg_strdup("'none'");
+ stop = on_error_stop ? "true" : "false";
+ toast = check_toast ? "true" : "false";
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.relname, v.blkno, v.offnum, v.attnum, v.msg "
+ "FROM public.verify_heapam("
+ "relation := %u, "
+ "on_error_stop := %s, "
+ "skip := %s, "
+ "check_toast := %s, ",
+ tbloid, stop, skip, toast);
+ if (startblock < 0)
+ appendPQExpBuffer(querybuf, "startblock := NULL, ");
+ else
+ appendPQExpBuffer(querybuf, "startblock := %ld, ", startblock);
+
+ if (endblock < 0)
+ appendPQExpBuffer(querybuf, "endblock := NULL");
+ else
+ appendPQExpBuffer(querybuf, "endblock := %ld", endblock);
+
+ appendPQExpBuffer(querybuf, ") v, pg_catalog.pg_class c "
+ "WHERE c.oid = %u", tbloid);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) > 0)
+ {
+ corruption_cnt += PQntuples(res);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ if (!PQgetisnull(res, i, 3))
+ printf("relation %s, block %s, offset %s, attribute %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 3), /* attnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s, block %s, offset %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 2), /* offnum */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s, block %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 1), /* blkno */
+ PQgetvalue(res, i, 4)); /* msg */
+ else if (!PQgetisnull(res, i, 0))
+ printf("relation %s\n %s\n",
+ PQgetvalue(res, i, 0), /* relname */
+ PQgetvalue(res, i, 4)); /* msg */
+ else
+ printf("%s\n", PQgetvalue(res, i, 4)); /* msg */
+ }
+ }
+ else if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("relation with OID %u\n %s\n", tbloid, PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+ return corruption_cnt;
+}
+
+static uint64
+check_indexes(Oid tbloid, const SimpleOidList *include_oids,
+ const SimpleOidList *exclude_oids)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT i.indexrelid, ci.relname, ct.relname"
+ "\nFROM pg_catalog.pg_index i, pg_catalog.pg_class ci, "
+ "pg_catalog.pg_class ct"
+ "\nWHERE i.indexrelid = ci.oid"
+ "\n AND i.indrelid = ct.oid"
+ "\n AND ci.relam = %u"
+ "\n AND i.indrelid = %u",
+ BTREE_AM_OID, tbloid);
+ include_filter(querybuf, "i.indexrelid", include_oids);
+ exclude_filter(querybuf, "i.indexrelid", exclude_oids);
+
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ for (i = 0; i < PQntuples(res); i++)
+ corruption_cnt += check_index(PQgetvalue(res, i, 0),
+ PQgetvalue(res, i, 1),
+ PQgetvalue(res, i, 2));
+ }
+ else
+ {
+ corruption_cnt++;
+ printf("%s\n", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static uint64
+check_index(const char *idxoid, const char *idxname, const char *tblname)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ uint64 corruption_cnt = 0;
+
+ querybuf = createPQExpBuffer();
+ appendPQExpBuffer(querybuf,
+ "SELECT public.bt_index_parent_check('%s'::regclass, %s, %s)",
+ idxoid,
+ settings.heapallindexed ? "true" : "false",
+ settings.rootdescend ? "true" : "false");
+ res = PQexec(conn, querybuf->data);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ corruption_cnt++;
+ printf("index check failed for index %s of table %s:\n",
+ idxname, tblname);
+ printf("%s", PQerrorMessage(conn));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+
+ return corruption_cnt;
+}
+
+static void
+parse_cli_options(int argc, char *argv[], ConnectOptions *connOpts)
+{
+ static struct option long_options[] =
+ {
+ {"check-toast", no_argument, NULL, 'z'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"endblock", required_argument, NULL, 'e'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"exclude-schema", required_argument, NULL, 'N'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"heapallindexed", no_argument, NULL, 'a'},
+ {"help", optional_argument, NULL, '?'},
+ {"host", required_argument, NULL, 'h'},
+ {"index", required_argument, NULL, 'i'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"on-error-stop", no_argument, NULL, 'o'},
+ {"password", no_argument, NULL, 'W'},
+ {"port", required_argument, NULL, 'p'},
+ {"rootdescend", no_argument, NULL, 'r'},
+ {"schema", required_argument, NULL, 'n'},
+ {"skip", required_argument, NULL, 'S'},
+ {"skip-corrupt", no_argument, NULL, 'C'},
+ {"skip-indexes", no_argument, NULL, 'X'},
+ {"skip-toast", no_argument, NULL, 'Z'},
+ {"startblock", required_argument, NULL, 'b'},
+ {"strict-names", no_argument, NULL, 's'},
+ {"table", required_argument, NULL, 't'},
+ {"toast-endblock", required_argument, NULL, 'E'},
+ {"toast-startblock", required_argument, NULL, 'B'},
+ {"username", required_argument, NULL, 'U'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"version", no_argument, NULL, 'V'},
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ memset(connOpts, 0, sizeof *connOpts);
+
+ while ((c = getopt_long(argc, argv, "ab:Cd:e:h:i:I:n:N:op:rsS:t:T:U:vVwWXzZ?1",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ settings.heapallindexed = true;
+ break;
+ case 'b':
+ settings.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation starting block argument contains garbage characters");
+ if (settings.startblock > (long) MaxBlockNumber)
+ fatal("relation starting block argument out of bounds");
+ break;
+ case 'C':
+ settings.check_corrupt = false;
+ break;
+ case 'd':
+ connOpts->dbname = pg_strdup(optarg);
+ break;
+ case 'e':
+ settings.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ fatal("relation ending block argument contains garbage characters");
+ if (settings.endblock > (long) MaxBlockNumber)
+ fatal("relation ending block argument out of bounds");
+ break;
+ case 'h':
+ connOpts->host = pg_strdup(optarg);
+ break;
+ case 'i':
+ simple_string_list_append(&index_include_patterns, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&index_exclude_patterns, optarg);
+ break;
+ case 'n': /* include schema(s) */
+ simple_string_list_append(&schema_include_patterns, optarg);
+ break;
+ case 'N': /* exclude schema(s) */
+ simple_string_list_append(&schema_exclude_patterns, optarg);
+ break;
+ case 'o':
+ settings.on_error_stop = true;
+ break;
+ case 'p':
+ connOpts->port = pg_strdup(optarg);
+ break;
+ case 's':
+ settings.strict_names = true;
+ break;
+ case 'S':
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ {
+ settings.skip_visible = true;
+ settings.skip_frozen = false;
+ }
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ {
+ settings.skip_frozen = true;
+ settings.skip_visible = false;
+ }
+ else
+ {
+ pg_log_error("invalid skip option");
+ exit(EXIT_FAILURE);
+ }
+ break;
+ case 'r':
+ settings.rootdescend = true;
+ break;
+ case 't': /* include table(s) */
+ simple_string_list_append(&table_include_patterns, optarg);
+ break;
+ case 'T': /* exclude table(s) */
+ simple_string_list_append(&table_exclude_patterns, optarg);
+ break;
+ case 'U':
+ connOpts->username = pg_strdup(optarg);
+ break;
+ case 'V':
+ showVersion();
+ exit(EXIT_SUCCESS);
+ case 'w':
+ settings.getPassword = TRI_NO;
+ break;
+ case 'W':
+ settings.getPassword = TRI_YES;
+ break;
+ case 'X':
+ settings.check_indexes = false;
+ break;
+ case 'v':
+ settings.verbose = true;
+ break;
+ case 'z':
+ settings.check_toast = true;
+ break;
+ case 'Z':
+ settings.check_toast = false;
+ break;
+ case '?':
+ if (optind <= argc &&
+ strcmp(argv[optind - 1], "-?") == 0)
+ {
+ /* actual help option given */
+ usage();
+ exit(EXIT_SUCCESS);
+ }
+ else
+ {
+ /* getopt error (unknown option or missing argument) */
+ goto unknown_option;
+ }
+ break;
+ case 1:
+ {
+ if (!optarg || strcmp(optarg, "options") == 0)
+ usage();
+ else
+ goto unknown_option;
+
+ exit(EXIT_SUCCESS);
+ }
+ break;
+ default:
+ unknown_option:
+ fprintf(stderr, "Try \"%s --help\" for more information.\n",
+ settings.progname);
+ exit(EXIT_FAILURE);
+ break;
+ }
+ }
+
+ /*
+ * If we still have arguments, use them as the database name and username.
+ */
+ while (argc - optind >= 1)
+ {
+ if (!connOpts->dbname)
+ connOpts->dbname = argv[optind];
+ else if (!connOpts->username)
+ connOpts->username = argv[optind];
+ else
+ pg_log_warning("extra command-line argument \"%s\" ignored",
+ argv[optind]);
+
+ optind++;
+ }
+
+ if (settings.endblock >= 0 && settings.endblock < settings.startblock)
+ fatal("relation ending block argument precedes starting block argument");
+}
+
+/*
+ * usage
+ *
+ * print out command line arguments
+ */
+static void
+usage(void)
+{
+ printf("pg_amcheck is the PostgreSQL command line frontend for the amcheck database corruption checker.\n");
+ printf("\n");
+ printf("Usage:\n");
+ printf(" pg_amcheck [OPTION]... [DBNAME [USERNAME]]\n");
+ printf("\n");
+ printf("General options:\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" -?, --help show this help, then exit\n");
+ printf(" -s, --strict-names require include patterns to match at least one entity each\n");
+ printf(" -o, --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" -v, --verbose output verbose messages\n");
+ printf("\n");
+ printf("Schema checking options:\n");
+ printf(" -n, --schema=PATTERN check relations in the specified schema(s) only\n");
+ printf(" -N, --exclude-schema=PATTERN do NOT check relations in the specified schema(s)\n");
+ printf("\n");
+ printf("Table checking options:\n");
+ printf(" -t, --table=PATTERN check the specified table(s) only\n");
+ printf(" -T, --exclude-table=PATTERN do NOT check the specified table(s)\n");
+ printf(" -b, --startblock begin checking table(s) at the given starting block number\n");
+ printf(" -e, --endblock check table(s) only up to the given ending block number\n");
+ printf(" -S, --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf("\n");
+ printf("TOAST table checking options:\n");
+ printf(" -z, --check-toast check associated toast tables and toast indexes\n");
+ printf(" -Z, --skip-toast do NOT check associated toast tables and toast indexes\n");
+ printf("\n");
+ printf("Index checking options:\n");
+ printf(" -X, --skip-indexes do NOT check any btree indexes\n");
+ printf(" -i, --index=PATTERN check the specified index(es) only\n");
+ printf(" -I, --exclude-index=PATTERN do NOT check the specified index(es)\n");
+ printf(" -C, --skip-corrupt do NOT check indexes if their associated table is corrupt\n");
+ printf(" -a, --heapallindexed check index tuples against the table tuples\n");
+ printf(" -r, --rootdescend search from the root page for each index tuple\n");
+ printf("\n");
+ printf("Connection options:\n");
+ printf(" -d, --dbname=DBNAME database name to connect to\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME database user name\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt (should happen automatically)\n");
+ printf("\n");
+ printf("Report bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+static void
+showVersion(void)
+{
+ puts("pg_amcheck (PostgreSQL) " PG_VERSION);
+}
+
+/*
+ * for backend Notice messages (INFO, WARNING, etc)
+ */
+static void
+NoticeProcessor(void *arg, const char *message)
+{
+ (void) arg; /* not used */
+ pg_log_info("%s", message);
+}
+
+static void
+get_table_check_list(const SimpleOidList *include_nsp, const SimpleOidList *exclude_nsp,
+ const SimpleOidList *include_tbl, const SimpleOidList *exclude_tbl)
+{
+ PQExpBuffer querybuf;
+ PGresult *res;
+ int i;
+
+ querybuf = createPQExpBuffer();
+
+ appendPQExpBuffer(querybuf,
+ "SELECT c.oid, c.reltoastrelid"
+ "\nFROM pg_catalog.pg_class c, pg_catalog.pg_namespace n"
+ "\nWHERE n.oid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\n AND c.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])\n",
+ TABLE_RELKIND_LIST);
+ include_filter(querybuf, "n.oid", include_nsp);
+ exclude_filter(querybuf, "n.oid", exclude_nsp);
+ include_filter(querybuf, "c.oid", include_tbl);
+ exclude_filter(querybuf, "c.oid", exclude_tbl);
+
+ res = ExecuteSqlQuery(conn, querybuf->data, PGRES_TUPLES_OK);
+ for (i = 0; i < PQntuples(res); i++)
+ {
+ simple_oid_list_append(&mainlist, atooid(PQgetvalue(res, i, 0)));
+ simple_oid_list_append(&toastlist, atooid(PQgetvalue(res, i, 1)));
+ }
+
+ PQclear(res);
+ destroyPQExpBuffer(querybuf);
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.control b/contrib/pg_amcheck/pg_amcheck.control
new file mode 100644
index 0000000000..395f368101
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.control
@@ -0,0 +1,5 @@
+# pg_amcheck extension
+comment = 'command-line tool for verifying relation integrity'
+default_version = '1.3'
+module_pathname = '$libdir/pg_amcheck'
+relocatable = true
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..189f05ef0a
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,60 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/\Qpg_amcheck: error: could not connect to database qqq: FATAL: database "qqq" does not exist\E/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user', 'postgres' ],
+ qr/\Qpg_amcheck: error: could not connect to database postgres: FATAL: role "=no_such_user" does not exist\E/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking a non-existent schema, table, and patterns with --strict-names
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-n', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found\E/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'nonexistent' ],
+ qr/\Qpg_amcheck: error: no matching tables were found\E/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-n', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching schemas were found for pattern\E/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-t', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching tables were found for pattern\E/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '--strict-names', '-i', 'nonexistent*' ],
+ qr/\Qpg_amcheck: error: no matching indexes were found for pattern\E/,
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..30bbbdeddd
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,248 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 45;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($)
+{
+ my ($relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql('postgres',
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the node is running.
+sub corrupt_first_page($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "open of $relpath failed: $!";
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500, 500);
+ close($fh);
+ $node->start;
+}
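[Editorial aside, not part of the patch: the byte-level overwrite above can be sketched in Python for readers less familiar with Perl's I/O idioms. The 8192-byte page size is the PostgreSQL default and an assumption here; the offset 32 and length 500 are taken from the Perl sub above, clobbering the line pointer array past the 24-byte page header.]

```python
def corrupt_first_page(relpath: str) -> None:
    """Overwrite 500 bytes of the first page with 0x77, starting at
    byte 32 (past the 24-byte page header), so line pointers are
    clobbered while the file keeps its expected length."""
    with open(relpath, "r+b") as f:
        f.seek(32)
        f.write(b"\x77" * 500)
```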
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($)
+{
+ my ($relname) = @_;
+ my $relpath = relation_filepath($relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create schemas and tables for checking pg_amcheck's include
+# and exclude schema and table command line options
+$node->safe_psql('postgres', q(
+-- We'll corrupt all indexes in s1
+CREATE SCHEMA s1;
+CREATE TABLE s1.t1 (a TEXT);
+CREATE TABLE s1.t2 (a TEXT);
+CREATE INDEX i1 ON s1.t1(a);
+CREATE INDEX i2 ON s1.t2(a);
+INSERT INTO s1.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s1.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables in s2
+CREATE SCHEMA s2;
+CREATE TABLE s2.t1 (a TEXT);
+CREATE TABLE s2.t2 (a TEXT);
+CREATE INDEX i1 ON s2.t1(a);
+CREATE INDEX i2 ON s2.t2(a);
+INSERT INTO s2.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s2.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll corrupt all tables and indexes in s3
+CREATE SCHEMA s3;
+CREATE TABLE s3.t1 (a TEXT);
+CREATE TABLE s3.t2 (a TEXT);
+CREATE INDEX i1 ON s3.t1(a);
+CREATE INDEX i2 ON s3.t2(a);
+INSERT INTO s3.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s3.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+
+-- We'll leave everything in s4 uncorrupted
+CREATE SCHEMA s4;
+CREATE TABLE s4.t1 (a TEXT);
+CREATE TABLE s4.t2 (a TEXT);
+CREATE INDEX i1 ON s4.t1(a);
+CREATE INDEX i2 ON s4.t2(a);
+INSERT INTO s4.t1 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+INSERT INTO s4.t2 (a) (SELECT gs::TEXT FROM generate_series(1,10000) AS gs);
+));
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('s1.i1');
+corrupt_first_page('s1.i2');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('s2.t1');
+corrupt_first_page('s2.t2');
+
+# Corrupt tables and indexes in schema "s3"
+remove_relation_file('s3.i1');
+corrupt_first_page('s3.i2');
+remove_relation_file('s3.t1');
+corrupt_first_page('s3.t2');
+
+# Leave schema "s4" alone
+
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt.  A nonzero exit code would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas, tables and indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres' ],
+ 'pg_amcheck all schemas and tables, skipping indexes');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ 'pg_amcheck all tables in schema s1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's*', '-t', 't1' ],
+ 'pg_amcheck all tables named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-N', 's1', '-T', 't1' ],
+ 'pg_amcheck all tables not named t1 nor in schema s1');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s1.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.i2 reports index corruption');
+
+
+# In schema s3, the tables and indexes are both corrupt. Ordinarily, checking
+# of indexes will not be performed for corrupt tables, but the --check-corrupt
+# option (-c) forces the indexes to also be checked.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i1' ],
+ qr/index "i1" lacks a main relation fork/,
+ 'pg_amcheck index s3.i1 reports missing main relation fork');
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's3', '-i', 'i2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s3.i2 reports index corruption');
+
+
+# Check that '-X' works as expected.  Since only index corruption (and not
+# table corruption) exists in s1, checking with '-X' (skip indexes) should
+# give no errors, while the default check should report index corruption.
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck over tables and indexes in schema s1 reports corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's1' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over only tables in schema s1 reports no corruption');
+
+
+# Check that table corruption is reported as expected, with or without
+# index checking
+#
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over tables and indexes in schema s2 reports table corruption');
+
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's2' ],
+ qr/could not open file/,
+ 'pg_amcheck over only tables in schema s2 reports table corruption');
+
+# Check that no corruption is reported in schema s4
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-n', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s4 reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt schemas
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-N', 's1', '-N', 's2', '-N', 's3' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt schemas reports no corruption');
+
+# Check that no corruption is reported if we exclude corrupt tables
+$node->command_like(
+ [ 'pg_amcheck', '-X', '-p', $port, 'postgres', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck excluding corrupt tables reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s4
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', 'junk' ],
+ qr/\Qpg_amcheck: error: relation starting block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-e', '1234junk' ],
+ qr/\Qpg_amcheck: error: relation ending block argument contains garbage characters\E/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-n', 's4', '-b', '5', '-e', '4' ],
+ qr/\Qpg_amcheck: error: relation ending block argument precedes starting block argument\E/,
+ 'pg_amcheck rejects invalid block range');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..271eca7da6
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# to insert its rows, in ways that give predictable sizes and locations
+# within the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
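[Editorial aside, not part of the patch: the pack-code arithmetic above can be cross-checked outside Perl. The sketch below assumes a little-endian, unpadded layout equivalent to the Perl pack codes; the struct format string is mine, not the patch's.]

```python
import struct

# Field-for-field equivalent of HEAPTUPLE_PACK_CODE
# 'LLLSSSSSCCqCcccccccSSSSSSSSS':
#   3 x uint32 (t_xmin, t_xmax, t_field3)            -> 12 bytes
#   5 x uint16 (bi_hi, bi_lo, ip_posid,
#               t_infomask2, t_infomask)             -> 10 bytes
#   2 x uint8  (t_hoff, t_bits)                      ->  2 bytes
#   1 x int64  (column 'a')                          ->  8 bytes
#   1 x uint8 + 7 x int8 (column 'b')                ->  8 bytes
#   9 x uint16 (column 'c' toast pointer bytes)      -> 18 bytes
HEAPTUPLE_STRUCT = "<3L5H2BqB7b9H"
assert struct.calcsize(HEAPTUPLE_STRUCT) == 58  # HEAPTUPLE_PACK_LENGTH
```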
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-X', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation test\s+/ms
+ if (defined $blkno);
+ return qr/relation test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
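[Editorial aside, not part of the patch: the "circularity of xid comparison" noted for offnum 3 follows PostgreSQL's modulo-2^32 rule. This is a hedged sketch mirroring TransactionIdPrecedes() and the epoch-qualified 64-bit comparison that verify_heapam's error message implies; the next-xid value is an assumed small number for a new cluster.]

```python
def xid_precedes(a: int, b: int) -> bool:
    # Circular comparison of 32-bit xids, as in TransactionIdPrecedes():
    # a precedes b iff the signed 32-bit difference (a - b) is negative.
    diff = (a - b) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return diff < 0

corrupt_xmin = 4026531839
next_xid = 100  # assumed small next-xid in a new cluster

# Circularly, the corrupt xmin looks like the distant past...
assert xid_precedes(corrupt_xmin, next_xid)
# ...but with epoch = 0, the 64-bit "full" xid comparison sees it as
# equal to or exceeding the next valid transaction ID:
assert (0 << 32) + corrupt_xmin >= (0 << 32) + next_xid
```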
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--check-toast', '--skip-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..c24f154883
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
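[Editorial aside, not part of the patch: the pg_amproc swap above corrupts the index logically rather than physically. The entries remain laid out in ascending order, but the substituted comparator claims descending order, so adjacent items violate the btree ordering invariant. A minimal sketch of that violated invariant:]

```python
def int4_desc_cmp(a: int, b: int) -> int:
    # Mirrors the substituted SQL comparator: sorts descending
    return 0 if a == b else (-1 if a > b else 1)

# Entries as physically laid out by the original ascending build
index_order = [1, 2, 3, 4]

# Under the new opclass, every adjacent pair violates the btree
# invariant cmp(left, right) <= 0, which is roughly what
# bt_index_parent_check reports as "item order invariant violated"
violations = [(l, r) for l, r in zip(index_order, index_order[1:])
              if int4_desc_cmp(l, r) > 0]
assert violations == [(1, 2), (2, 3), (3, 4)]
```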
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ae2759be55..797b4dc61e 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -119,6 +119,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&oldsnapshot;
&pageinspect;
&passwordcheck;
+ &pgamcheck;
&pgbuffercache;
&pgcrypto;
&pgfreespacemap;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..00643d2e58
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,493 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<sect1 id="pgamcheck" xreflabel="pg_amcheck">
+ <title>pg_amcheck</title>
+
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_amcheck</filename> module provides a command line interface
+ to the <xref linkend="amcheck"/> corruption checking functionality.
+ </para>
+
+ <para>
+ <application>pg_amcheck</application> is a regular
+ <productname>PostgreSQL</productname> client application. You can perform
+ corruption checks from any remote host that has access to the database,
+ connecting as a user with sufficient privileges to check tables and indexes.
+ Currently, this requires execute privileges on <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> and <function>verify_heapam</function>
+ functions.
+ </para>
+
+<synopsis>
+pg_amcheck mydb
+</synopsis>
+
+ <sect2>
+ <title>Options</title>
+
+ <para>
+ The following command-line options for controlling general program behavior
+ are recognized.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Show <application>pg_amcheck</application> version number, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Specifies verbose mode.  This causes
+ <application>pg_amcheck</application> to output more detailed information
+ about its activities, mostly concerning its communication with the
+ database.
+ </para>
+ <para>
+ Note that this does not increase the number of corruptions reported nor
+ the level of detail reported about each of them.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control which database objects
+ <application>pg_amcheck</application> checks and how such options
+ are interpreted.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-i <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ For indexes associated with tables being checked, check only those
+ indexes with names matching <replaceable
+ class="parameter">pattern</replaceable>. Multiple indexes can be
+ selected by writing multiple <option>-i</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands (see
+ <xref linkend="app-psql-patterns"/>), so multiple indexes can also
+ be selected by writing wildcard characters in the pattern. When using
+ wildcards, be careful to quote the pattern if needed to prevent the
+ shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-index=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any indexes matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-i</option>.
+ <option>-I</option> can be given more than once to exclude indexes
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-i</option> and <option>-I</option> are given, the
+ behavior is to check just the indexes that match at least one
+ <option>-i</option> switch but no <option>-I</option> switches. If
+ <option>-I</option> appears without <option>-i</option>, then indexes
+ matching <option>-I</option> are excluded from what is otherwise a check
+ of all indexes associated with tables that are checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-n <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only schemas matching <replaceable
+ class="parameter">pattern</replaceable>; this selects both the
+ schema itself, and all its contained objects. When this option is
+ not specified, all non-system schemas in the target database will be
+ checked. Multiple schemas can be
+ selected by writing multiple <option>-n</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple schemas can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-N <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-schema=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any schemas matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is
+ interpreted according to the same rules as for <option>-n</option>.
+ <option>-N</option> can be given more than once to exclude schemas
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-n</option> and <option>-N</option> are given, the behavior
+ is to check just the schemas that match at least one <option>-n</option>
+ switch but no <option>-N</option> switches. If <option>-N</option> appears
+ without <option>-n</option>, then schemas matching <option>-N</option> are
+ excluded from what is otherwise a check of all schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--strict-names</option></term>
+ <listitem>
+ <para>
+ Require that each schema
+ (<option>-n</option>/<option>--schema</option>), table
+ (<option>-t</option>/<option>--table</option>) and index
+ (<option>-i</option>/<option>--index</option>) qualifier match at least
+ one schema/table/index in the database to be checked.
+ </para>
+ <para>
+ This option has no effect on
+ <option>-N</option>/<option>--exclude-schema</option>,
+ <option>-T</option>/<option>--exclude-table</option>,
+ or <option>-I</option>/<option>--exclude-index</option>. An exclude
+ pattern failing to match any objects is not considered an error.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Check only tables with names matching
+ <replaceable class="parameter">pattern</replaceable>. Multiple tables
+ can be selected by writing multiple <option>-t</option> switches. The
+ <replaceable class="parameter">pattern</replaceable> parameter is
+ interpreted as a pattern according to the same rules used by
+ <application>psql</application>'s <literal>\d</literal> commands
+ (see <xref linkend="app-psql-patterns"/>),
+ so multiple tables can also be selected by writing wildcard characters
+ in the pattern. When using wildcards, be careful to quote the pattern
+ if needed to prevent the shell from expanding the wildcards.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T <replaceable class="parameter">pattern</replaceable></option></term>
+ <term><option>--exclude-table=<replaceable class="parameter">pattern</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check any tables matching <replaceable
+ class="parameter">pattern</replaceable>. The pattern is interpreted
+ according to the same rules as for <option>-t</option>.
+ <option>-T</option> can be given more than once to exclude tables
+ matching any of several patterns.
+ </para>
+
+ <para>
+ When both <option>-t</option> and <option>-T</option> are given, the
+ behavior is to check just the tables that match at least one
+ <option>-t</option> switch but no <option>-T</option> switches. If
+ <option>-T</option> appears without <option>-t</option>, then tables
+ matching <option>-T</option> are excluded from what is otherwise a check
+ of all tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ The following command-line options control additional behaviors.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-a</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ When checking indexes, additionally verify the presence of all heap
+ tuples as index tuples within the index.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-b <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--startblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks prior to <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option.)
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-b</option>/<option>--startblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-C</option></term>
+ <term><option>--skip-corrupt</option></term>
+ <listitem>
+ <para>
+ Skip checking indexes for a table if the table is found to be corrupt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e <replaceable class="parameter">block</replaceable></option></term>
+ <term><option>--endblock=<replaceable class="parameter">block</replaceable></option></term>
+ <listitem>
+ <para>
+ Do not check blocks after <replaceable
+ class="parameter">block</replaceable>, which should be a non-negative
+ integer. (Negative values disable the option.)
+ </para>
+ <para>
+ When both <option>-b</option>/<option>--startblock</option> and
+ <option>-e</option>/<option>--endblock</option> are specified, the end
+ block must not be less than the start block.
+ </para>
+ <para>
+ The <option>-e</option>/<option>--endblock</option> option will be
+ applied to all tables that are checked, including toast tables. The
+ option is most useful when checking exactly one table, to focus the
+ checking on just specific blocks of that one table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-o</option></term>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ Stop checking database objects at the end of the first page on
+ which corruption is found. Note that even with this option
+ enabled, more than one corruption message may be reported.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ When checking indexes, for each tuple, perform additional verification
+ by re-finding the tuple at the leaf level through a new search from
+ the root page.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited or even of no
+ use in helping detect the kinds of corruption that occur in practice.
+ In any event, it is known to be a rather expensive check to perform.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--skip=<replaceable class="parameter">blocks</replaceable></option></term>
+ <listitem>
+ <para>
+ When <option>-S</option> <replaceable
+ class="parameter">all-visible</replaceable> is given, corruption
+ checking is skipped for blocks marked as all visible in the
+ visibility map.
+ </para>
+ <para>
+ When <option>-S</option> <replaceable
+ class="parameter">all-frozen</replaceable> is given, corruption
+ checking is skipped for blocks marked as all frozen in the
+ visibility map.
+ </para>
+ <para>
+ The default is to check blocks without regard to their marking in the
+ visibility map.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-X</option></term>
+ <term><option>--skip-indexes</option></term>
+ <listitem>
+ <para>
+ Check tables but not their associated indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-Z</option></term>
+ <term><option>--skip-toast</option></term>
+ <listitem>
+ <para>
+ Do not check toast tables or their indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+
+ <para>
+ The following additional command-line options control the database
+ connection parameters.
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-d <replaceable class="parameter">dbname</replaceable></option></term>
+ <term><option>--dbname=<replaceable class="parameter">dbname</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to. This is
+ equivalent to specifying <replaceable
+ class="parameter">dbname</replaceable> as the first non-option
+ argument on the command line. The <replaceable>dbname</replaceable>
+ can be a <link linkend="libpq-connstring">connection string</link>.
+ If so, connection string parameters will override any conflicting
+ command line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-h <replaceable class="parameter">host</replaceable></option></term>
+ <term><option>--host=<replaceable class="parameter">host</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is
+ running. If the value begins with a slash, it is used as the
+ directory for the Unix domain socket. The default is taken
+ from the <envar>PGHOST</envar> environment variable, if set,
+ else a Unix domain socket connection is attempted.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p <replaceable class="parameter">port</replaceable></option></term>
+ <term><option>--port=<replaceable class="parameter">port</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file
+ extension on which the server is listening for connections.
+ Defaults to the <envar>PGPORT</envar> environment variable, if
+ set, or a compiled-in default.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U <replaceable>username</replaceable></option></term>
+ <term><option>--username=<replaceable class="parameter">username</replaceable></option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires
+ password authentication and a password is not available by
+ other means such as a <filename>.pgpass</filename> file, the
+ connection attempt will fail. This option can be useful in
+ batch jobs and scripts where no user is present to enter a
+ password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a
+ password before connecting to a database.
+ </para>
+
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt
+ for a password if the server demands password authentication.
+ However, <application>pg_amcheck</application> will waste a
+ connection attempt finding out that the server wants a password.
+ In some cases it is worth typing <option>-W</option> to avoid the extra
+ connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--role=<replaceable class="parameter">rolename</replaceable></option></term>
+ <listitem>
+ <para>
+ Specifies a role name to be used to perform the checks.
+ This option causes <application>pg_amcheck</application> to issue a
+ <command>SET ROLE</command> <replaceable class="parameter">rolename</replaceable>
+ command after connecting to the database. It is useful when the
+ authenticated user (specified by <option>-U</option>) lacks privileges
+ needed by <application>pg_amcheck</application>, but can switch to a role with
+ the required rights. Some installations have a policy against
+ logging in directly as a superuser, and use of this option allows
+ checks to be performed without violating the policy.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </sect2>
+</sect1>
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
index 355da6edaf..cbbc085698 100644
--- a/src/fe_utils/query_utils.c
+++ b/src/fe_utils/query_utils.c
@@ -46,7 +46,7 @@ exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
}
if (expected_ntups == POSITIVE_NTUPS || expected_ntups >= 0)
{
- int ntups = PQntuples(res);
+ int ntups = PQntuples(res);
if (expected_ntups == POSITIVE_NTUPS)
{
@@ -63,11 +63,11 @@ exiting_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
fatal(ngettext("query returned %d row instead of one: %s",
"query returned %d rows instead of one: %s",
ntups),
- ntups, query);
+ ntups, query);
fatal(ngettext("query returned %d row instead of %d: %s",
"query returned %d rows instead of %d: %s",
ntups),
- ntups, expected_ntups, query);
+ ntups, expected_ntups, query);
}
}
return res;
@@ -80,7 +80,7 @@ PGresult *
quiet_handler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
int expected_ntups, const char *query)
{
- int ntups = PQntuples(res);
+ int ntups = PQntuples(res);
if ((PQresultStatus(res) != expected_status) ||
(expected_ntups == POSITIVE_NTUPS && ntups == 0) ||
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7f014a12c9..eef00320e7 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
@@ -146,8 +146,8 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c mbprint.c print.c psqlscan.l
- psqlscan.c simple_list.c string_utils.c recovery_gen.c);
+ archive.c cancel.c conditional.c exit_utils.c mbprint.c option_utils.c print.c psqlscan.l
+ psqlscan.c query_utils.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
$libpgport->AddDefine('FRONTEND');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f3957bad6c..e4003b15a0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -101,6 +101,7 @@ AlterUserMappingStmt
AlteredTableInfo
AlternativeSubPlan
AlternativeSubPlanState
+AmCheckSettings
AnalyzeAttrComputeStatsFunc
AnalyzeAttrFetchFunc
AnalyzeForeignTable_function
@@ -404,6 +405,7 @@ ConnCacheEntry
ConnCacheKey
ConnStatusType
ConnType
+ConnectOptions
ConnectionStateEnum
ConsiderSplitContext
Const
--
2.21.1 (Apple Git-122.3)
On Mon, Jan 11, 2021 at 1:16 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Added in v32, along with adding pg_amcheck to @contrib_uselibpq, @contrib_uselibpgport, and @contrib_uselibpgcommon
exit_utils.c fails to achieve the goal of making this code independent
of pg_dump, because of:
#ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
_endthreadex(code);
#endif
parallel_init_done is a pg_dump-ism. Perhaps this chunk of code could
be a handler that gets registered using exit_nicely() rather than
hard-coded like this. Note that the function comments for
exit_nicely() are heavily implicated in this problem, since they also
apply to stuff that only happens in pg_dump and not other utilities.
I'm skeptical about the idea of putting functions into string_utils.c
with names as generic as include_filter() and exclude_filter().
Existing cases like fmtId() and fmtQualifiedId() are not great either,
but I think this is worse and that we should do some renaming. On a
related note, it's not clear to me why these should be classified as
string_utils while stuff like expand_schema_name_patterns() gets
classified as option_utils. These are neither generic
string-processing functions nor are they generic options-parsing
functions. They are functions for expanding shell-glob style patterns
for database object names. And they seem like they ought to be
together, because they seem to do closely-related things. I'm open to
an argument that this is wrongheaded on my part, but it looks weird to
me the way it is.
I'm pretty unimpressed by query_utils.c. The CurrentResultHandler
stuff looks grotty, and you don't seem to really use it anywhere. And
it seems woefully overambitious to me anyway: this doesn't apply to
every kind of "result" we've got hanging around, absolutely nothing
even close to that, even though a name like CurrentResultHandler
sounds very broad. It also means more global variables, which is a
thing of which the PostgreSQL codebase already has a deplorable
oversupply. quiet_handler() and noop_handler() aren't used anywhere
either, AFAICS.
I wonder if it would be better to pass in callbacks rather than
relying on global variables. e.g.:
typedef void (*fatal_error_callback)(const char *fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
Then you could have a few helper functions that take an argument of
type fatal_error_callback and throw the right fatal error for (a)
wrong PQresultStatus() and (b) result is not one row. Do you need any
other cases? exiting_handler() seems to think that the caller might
want to allow any number of tuples, or any positive number, or any
particular count, but I'm not sure if all of those cases are really
needed.
This stuff is finicky and hard to get right. You don't really want to
create a situation where the same code keeps getting duplicated, or
the behavior's just a little bit inconsistent everywhere, but it also
isn't great to build layers upon layers of abstraction around
something like ExecuteSqlQuery which is, in the end, a four-line
function. I don't think there's any problem with something like
pg_dump having its own function to execute-a-query-or-die. Maybe that
function ends up doing something like
TheGenericFunctionToExecuteOrDie(my_die_fn, the_query), or maybe
pg_dump can just open-code it but have a my_die_fn to pass down to the
glob-expansion stuff, or, well, I don't know.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Jan 14, 2021, at 1:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 11, 2021 at 1:16 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:Added in v32, along with adding pg_amcheck to @contrib_uselibpq, @contrib_uselibpgport, and @contrib_uselibpgcommon
exit_utils.c fails to achieve the goal of making this code independent
of pg_dump, because of:
#ifdef WIN32
if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
_endthreadex(code);
#endif
parallel_init_done is a pg_dump-ism. Perhaps this chunk of code could
be a handler that gets registered using exit_nicely() rather than
hard-coded like this. Note that the function comments for
exit_nicely() are heavily implicated in this problem, since they also
apply to stuff that only happens in pg_dump and not other utilities.
The 0001 patch has been restructured to not have this problem.
I'm skeptical about the idea of putting functions into string_utils.c
with names as generic as include_filter() and exclude_filter().
Existing cases like fmtId() and fmtQualifiedId() are not great either,
but I think this is worse and that we should do some renaming. On a
related note, it's not clear to me why these should be classified as
string_utils while stuff like expand_schema_name_patterns() gets
classified as option_utils. These are neither generic
string-processing functions nor are they generic options-parsing
functions. They are functions for expanding shell-glob style patterns
for database object names. And they seem like they ought to be
together, because they seem to do closely-related things. I'm open to
an argument that this is wrongheaded on my part, but it looks weird to
me the way it is.
The logic to filter which relations are checked is completely restructured and is kept in pg_amcheck.c
I'm pretty unimpressed by query_utils.c. The CurrentResultHandler
stuff looks grotty, and you don't seem to really use it anywhere. And
it seems woefully overambitious to me anyway: this doesn't apply to
every kind of "result" we've got hanging around, absolutely nothing
even close to that, even though a name like CurrentResultHandler
sounds very broad. It also means more global variables, which is a
thing of which the PostgreSQL codebase already has a deplorable
oversupply. quiet_handler() and noop_handler() aren't used anywhere
either, AFAICS.
I wonder if it would be better to pass in callbacks rather than
relying on global variables. e.g.:
typedef void (*fatal_error_callback)(const char *fmt,...)
pg_attribute_printf(1, 2) pg_attribute_noreturn();
Then you could have a few helper functions that take an argument of
type fatal_error_callback and throw the right fatal error for (a)
wrong PQresultStatus() and (b) result is not one row. Do you need any
other cases? exiting_handler() seems to think that the caller might
want to allow any number of tuples, or any positive number, or any
particular count, but I'm not sure if all of those cases are really
needed.
The error callback stuff has been refactored in this next patch set, and also now includes handlers for parallel slots, as the src/bin/scripts/scripts_parallel.c stuff has been moved to fe_utils and made more general. As it was, there were hardcoded assumptions that are valid for reindexdb and vacuumdb, but not general enough for pg_amcheck to use. The refactoring in patches 0002 through 0005 make it more generally usable. Patch 0008 uses it in pg_amcheck.
This stuff is finicky and hard to get right. You don't really want to
create a situation where the same code keeps getting duplicated, or
the behavior's just a little bit inconsistent everywhere, but it also
isn't great to build layers upon layers of abstraction around
something like ExecuteSqlQuery which is, in the end, a four-line
function. I don't think there's any problem with something like
pg_dump having its own function to execute-a-query-or-die. Maybe that
function ends up doing something like
TheGenericFunctionToExecuteOrDie(my_die_fn, the_query), or maybe
pg_dump can just open-code it but have a my_die_fn to pass down to the
glob-expansion stuff, or, well, I don't know.
There are some real improvements in this next patch set.
The number of queries issued to the database to determine the databases to use is much reduced. I had been following the pattern in pg_dump, but abandoned that for something new.
The parallel slots stuff is now used for parallelism, much like what is done in vacuumdb and reindexdb.
The pg_amcheck application can now be run over one database, multiple specified databases, or all databases.
Relations, schemas, and databases can be included and excluded by pattern, like "(db1|db2|db3).myschema.(mytable|myindex)". The real-world use-cases for this that I have in mind are things like:
pg_amcheck --jobs=12 --all \
--exclude-relation="db7.schema.known_corrupt_table" \
--exclude-relation="db*.schema.known_big_table"
and
pg_amcheck --jobs=20 \
--include-relation="*.compliance.audited"
I might be missing something, but I think the interface is a superset of the interface from reindexdb and vacuumdb. None of the new interface stuff (patterns, allowing multiple databases to be given on the command line, etc) is required.
Attachments:
v33-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patchapplication/octet-stream; name=v33-0001-Moving-exit_nicely-and-fatal-into-fe_utils.patch; x-unix-mode=0644Download
From 9f5dddb330f7a5bec9a2233df7ad2d4e269ffc55 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 11 Jan 2021 16:04:31 -0800
Subject: [PATCH v33 1/8] Moving exit_nicely and fatal into fe_utils
Various frontend executables in src/bin, src/bin/scripts, and
contrib/ have logic for logging and exiting under error conditions.
The logging code itself is already under common/, but executables
differ in their calls to exit() vs. exit_nicely(), with
exit_nicely() not uniformly defined, and sometimes all of this
wrapped up under a macro named fatal(), the definition of that macro
also not uniformly defined. This makes it harder to move code out
of these executables into a shared library under fe_utils/.
Standardizing all executables to define these things the same way or
to use a single fe_utils/ library is beyond the scope of this patch,
but this patch should get the ball rolling in that direction.
For pg_amcheck purposes, we really just need to get scripts_parallel
moved into fe_utils, which doesn't require changing pg_dump at all.
But by creating a new fe_utils/exit_utils which is compatible with
pg_dump along with everything else, we avoid creating yet another
compatibility problem. So we move the functions "on_exit_nicely"
and "exit_nicely", and the macro "fatal" from pg_dump into fe_utils,
and we refactor the on_exit_nicely() implementation to no longer
contain pg_dump specific logic for exiting threads on Windows.
Instead, we create a new on_exit_nicely_final() function for
registering a final callback, and use that from within pg_dump to
register a final callback encapsulating knowledge of threading
variables.
The on_exit_nicely_final() function, and associated
final_on_exit_nicely variable, are not strictly necessary. The same
behavior could be achieved by registering an exiting callback via
the on_exit_nicely() function prior to any other callbacks, but that
would place a larger burden on script authors to make certain that
(a) no other callback gets registered before the exiting callback,
and (b) that only one exiting callback gets registered.
---
src/bin/pg_dump/parallel.c | 16 ++++
src/bin/pg_dump/parallel.h | 5 --
src/bin/pg_dump/pg_backup_archiver.h | 1 +
src/bin/pg_dump/pg_backup_utils.c | 59 ---------------
src/bin/pg_dump/pg_backup_utils.h | 8 --
src/fe_utils/Makefile | 1 +
src/fe_utils/exit_utils.c | 105 +++++++++++++++++++++++++++
src/include/fe_utils/exit_utils.h | 26 +++++++
src/tools/msvc/Mkvcbuild.pm | 2 +-
9 files changed, 150 insertions(+), 73 deletions(-)
create mode 100644 src/fe_utils/exit_utils.c
create mode 100644 src/include/fe_utils/exit_utils.h
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c7351a43fd..e619f126f5 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -228,6 +228,19 @@ static char *readMessageFromPipe(int fd);
#define messageStartsWith(msg, prefix) \
(strncmp(msg, prefix, strlen(prefix)) == 0)
+#ifdef WIN32
+
+/*
+ * An on_exit_nicely_callback which, if run in a parallel worker thread on
+ * Windows, will only exit the thread, not the whole process.
+ */
+static void
+end_threads(int code, void *arg)
+{
+ if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
+ _endthreadex(code);
+}
+#endif
/*
* Initialize parallel dump support --- should be called early in process
@@ -255,6 +268,9 @@ init_parallel_dump_utils(void)
exit_nicely(1);
}
+ /* install callback to close threads on exit */
+ on_exit_nicely_final(end_threads, NULL);
+
parallel_init_done = true;
}
#endif
diff --git a/src/bin/pg_dump/parallel.h b/src/bin/pg_dump/parallel.h
index 0fbf736c81..48c48e12e6 100644
--- a/src/bin/pg_dump/parallel.h
+++ b/src/bin/pg_dump/parallel.h
@@ -45,11 +45,6 @@ typedef struct ParallelState
ParallelSlot *parallelSlot; /* private info about each worker */
} ParallelState;
-#ifdef WIN32
-extern bool parallel_init_done;
-extern DWORD mainThreadId;
-#endif
-
extern void init_parallel_dump_utils(void);
extern bool IsEveryWorkerIdle(ParallelState *pstate);
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index a8ea5c7eae..37d157b7ad 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -26,6 +26,7 @@
#include <time.h>
+#include "fe_utils/exit_utils.h"
#include "libpq-fe.h"
#include "pg_backup.h"
#include "pqexpbuffer.h"
diff --git a/src/bin/pg_dump/pg_backup_utils.c b/src/bin/pg_dump/pg_backup_utils.c
index c709a40e06..631e88f7db 100644
--- a/src/bin/pg_dump/pg_backup_utils.c
+++ b/src/bin/pg_dump/pg_backup_utils.c
@@ -19,16 +19,6 @@
/* Globals exported by this file */
const char *progname = NULL;
-#define MAX_ON_EXIT_NICELY 20
-
-static struct
-{
- on_exit_nicely_callback function;
- void *arg;
-} on_exit_nicely_list[MAX_ON_EXIT_NICELY];
-
-static int on_exit_nicely_index;
-
/*
* Parse a --section=foo command line argument.
*
@@ -57,52 +47,3 @@ set_dump_section(const char *arg, int *dumpSections)
exit_nicely(1);
}
}
-
-
-/* Register a callback to be run when exit_nicely is invoked. */
-void
-on_exit_nicely(on_exit_nicely_callback function, void *arg)
-{
- if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
- {
- pg_log_fatal("out of on_exit_nicely slots");
- exit_nicely(1);
- }
- on_exit_nicely_list[on_exit_nicely_index].function = function;
- on_exit_nicely_list[on_exit_nicely_index].arg = arg;
- on_exit_nicely_index++;
-}
-
-/*
- * Run accumulated on_exit_nicely callbacks in reverse order and then exit
- * without printing any message.
- *
- * If running in a parallel worker thread on Windows, we only exit the thread,
- * not the whole process.
- *
- * Note that in parallel operation on Windows, the callback(s) will be run
- * by each thread since the list state is necessarily shared by all threads;
- * each callback must contain logic to ensure it does only what's appropriate
- * for its thread. On Unix, callbacks are also run by each process, but only
- * for callbacks established before we fork off the child processes. (It'd
- * be cleaner to reset the list after fork(), and let each child establish
- * its own callbacks; but then the behavior would be completely inconsistent
- * between Windows and Unix. For now, just be sure to establish callbacks
- * before forking to avoid inconsistency.)
- */
-void
-exit_nicely(int code)
-{
- int i;
-
- for (i = on_exit_nicely_index - 1; i >= 0; i--)
- on_exit_nicely_list[i].function(code,
- on_exit_nicely_list[i].arg);
-
-#ifdef WIN32
- if (parallel_init_done && GetCurrentThreadId() != mainThreadId)
- _endthreadex(code);
-#endif
-
- exit(code);
-}
diff --git a/src/bin/pg_dump/pg_backup_utils.h b/src/bin/pg_dump/pg_backup_utils.h
index 306798f9ac..ee4409c274 100644
--- a/src/bin/pg_dump/pg_backup_utils.h
+++ b/src/bin/pg_dump/pg_backup_utils.h
@@ -15,22 +15,14 @@
#ifndef PG_BACKUP_UTILS_H
#define PG_BACKUP_UTILS_H
-#include "common/logging.h"
-
/* bits returned by set_dump_section */
#define DUMP_PRE_DATA 0x01
#define DUMP_DATA 0x02
#define DUMP_POST_DATA 0x04
#define DUMP_UNSECTIONED 0xff
-typedef void (*on_exit_nicely_callback) (int code, void *arg);
-
extern const char *progname;
extern void set_dump_section(const char *arg, int *dumpSections);
-extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
-extern void exit_nicely(int code) pg_attribute_noreturn();
-
-#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
#endif /* PG_BACKUP_UTILS_H */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..d6c328faf1 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,6 +23,7 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ exit_utils.o \
mbprint.o \
print.o \
psqlscan.o \
diff --git a/src/fe_utils/exit_utils.c b/src/fe_utils/exit_utils.c
new file mode 100644
index 0000000000..8c2760a78c
--- /dev/null
+++ b/src/fe_utils/exit_utils.c
@@ -0,0 +1,105 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/exit_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "fe_utils/exit_utils.h"
+
+#define MAX_ON_EXIT_NICELY 20
+
+typedef struct
+{
+ on_exit_nicely_callback function;
+ void *arg;
+} OnExitCallback;
+
+/*
+ * Storage for registered on_exit_nicely_callbacks to be run by exit_nicely()
+ * before exit()ing. The callbacks in the on_exit_nicely_list[] are run in
+ * reverse of the order that they were registered. The final_on_exit_nicely
+ * callback is run after all the ones from the list, regardless of when it was
+ * registered.
+ *
+ * Note that in parallel operation on Windows, the callback(s) will be run
+ * by each thread since the list state is necessarily shared by all
+ * threads; each callback must contain logic to ensure it does only what's
+ * appropriate for its thread. On Unix, callbacks are also run by each
+ * process, but only for callbacks established before we fork off the child
+ * processes. (It'd be cleaner to reset the list after fork(), and let
+ * each child establish its own callbacks; but then the behavior would be
+ * completely inconsistent between Windows and Unix. For now, just be sure
+ * to establish callbacks before forking to avoid inconsistency.)
+ */
+static OnExitCallback final_on_exit_nicely;
+static OnExitCallback on_exit_nicely_list[MAX_ON_EXIT_NICELY];
+static int on_exit_nicely_index;
+
+/*
+ * Register a callback to be run when exit_nicely is invoked.
+ *
+ * If you wish to register a callback which itself exits the application or
+ * thread, see on_exit_nicely_final() instead.
+ */
+void
+on_exit_nicely(on_exit_nicely_callback function, void *arg)
+{
+ if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
+ {
+ pg_log_fatal("out of on_exit_nicely slots");
+ exit_nicely(1);
+ }
+ on_exit_nicely_list[on_exit_nicely_index].function = function;
+ on_exit_nicely_list[on_exit_nicely_index].arg = arg;
+ on_exit_nicely_index++;
+}
+
+/*
+ * Register a final callback to be run after all other on_exit_nicely callbacks
+ * have been processed.
+ *
+ * Only one final callback may be registered. Attempts to subsequently register
+ * additional final callbacks are an error.
+ *
+ * A callback registered via on_exit_nicely_final() may choose to exit the
+ * application or thread without risk of skipping any other callback logic. If
+ * the callback does not itself exit, exit_nicely() will exit() immediately
+ * after the final callback returns.
+ */
+void
+on_exit_nicely_final(on_exit_nicely_callback function, void *arg)
+{
+ if (final_on_exit_nicely.function)
+ {
+ pg_log_fatal("multiple on_exit_nicely_final callbacks");
+ exit_nicely(1);
+ }
+ final_on_exit_nicely.function = function;
+ final_on_exit_nicely.arg = arg;
+}
+
+/*
+ * Run accumulated on_exit_nicely callbacks in reverse order followed by the
+ * final on_exit_nicely callback, if any, and then exit() without printing any
+ * message.
+ */
+void
+exit_nicely(int code)
+{
+ int i;
+
+ for (i = on_exit_nicely_index - 1; i >= 0; i--)
+ on_exit_nicely_list[i].function(code,
+ on_exit_nicely_list[i].arg);
+
+ if (final_on_exit_nicely.function)
+ final_on_exit_nicely.function(code, final_on_exit_nicely.arg);
+
+ exit(code);
+}
diff --git a/src/include/fe_utils/exit_utils.h b/src/include/fe_utils/exit_utils.h
new file mode 100644
index 0000000000..6326404133
--- /dev/null
+++ b/src/include/fe_utils/exit_utils.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * Exiting with cleanup callback facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/exit_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef EXIT_UTILS_H
+#define EXIT_UTILS_H
+
+#include "postgres_fe.h"
+#include "common/logging.h"
+
+typedef void (*on_exit_nicely_callback) (int code, void *arg);
+
+extern void on_exit_nicely(on_exit_nicely_callback function, void *arg);
+extern void on_exit_nicely_final(on_exit_nicely_callback function, void *arg);
+extern void exit_nicely(int code) pg_attribute_noreturn();
+
+#define fatal(...) do { pg_log_error(__VA_ARGS__); exit_nicely(1); } while(0)
+
+#endif /* EXIT_UTILS_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7213e65e08..0c6c1c996f 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,7 +147,7 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c mbprint.c print.c psqlscan.l
+ archive.c cancel.c conditional.c exit_utils.c mbprint.c print.c psqlscan.l
psqlscan.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
--
2.21.1 (Apple Git-122.3)
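To see the callback ordering the patch's comments describe, here is a standalone sketch of the registry pattern from fe_utils/exit_utils.c. The demo_* names are mine (the real API is on_exit_nicely / on_exit_nicely_final / exit_nicely), and the runner deliberately omits the final exit(code) so the behavior can be observed in-process: list callbacks fire newest-first, and the single "final" callback fires last regardless of when it was registered.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_ON_EXIT_NICELY 20

typedef void (*on_exit_nicely_callback) (int code, void *arg);

typedef struct
{
	on_exit_nicely_callback function;
	void	   *arg;
} OnExitCallback;

static OnExitCallback on_exit_nicely_list[MAX_ON_EXIT_NICELY];
static int	on_exit_nicely_index;
static OnExitCallback final_on_exit_nicely;

/* Register an ordinary cleanup callback; runs in reverse registration order. */
void
demo_on_exit_nicely(on_exit_nicely_callback function, void *arg)
{
	if (on_exit_nicely_index >= MAX_ON_EXIT_NICELY)
	{
		fprintf(stderr, "out of on_exit_nicely slots\n");
		exit(1);
	}
	on_exit_nicely_list[on_exit_nicely_index].function = function;
	on_exit_nicely_list[on_exit_nicely_index].arg = arg;
	on_exit_nicely_index++;
}

/* Register the single final callback; registering a second one is an error. */
void
demo_on_exit_nicely_final(on_exit_nicely_callback function, void *arg)
{
	if (final_on_exit_nicely.function)
	{
		fprintf(stderr, "multiple on_exit_nicely_final callbacks\n");
		exit(1);
	}
	final_on_exit_nicely.function = function;
	final_on_exit_nicely.arg = arg;
}

/*
 * Run list callbacks newest-first, then the final callback. The real
 * exit_nicely() then calls exit(code); omitted here for demonstration.
 */
void
demo_run_callbacks(int code)
{
	int			i;

	for (i = on_exit_nicely_index - 1; i >= 0; i--)
		on_exit_nicely_list[i].function(code, on_exit_nicely_list[i].arg);

	if (final_on_exit_nicely.function)
		final_on_exit_nicely.function(code, final_on_exit_nicely.arg);
}
```

This is why end_threads() can safely be the final callback on Windows: every other cleanup runs before it gets a chance to _endthreadex().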
Attachment: v33-0002-Introducing-PGresultHandler-abstraction.patch (application/octet-stream)
From 30ec41aa015af53af03d13f621f04f5a2ae1c20b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 17 Jan 2021 11:35:57 -0800
Subject: [PATCH v33 2/8] Introducing PGresultHandler abstraction
Many frontend applications in src/bin and src/bin/scripts have
functions that include the pattern:
result = PQexec(...)
if (PQresultStatus(result) != ...)
{
pg_log_{notice,warning,error} ...
...
exit(1)
}
The exact error handling behavior differs from application to
application. Factoring functions out of frontend applications into
src/fe_utils requires abstracting the error handling so that various
callers can specify different behavior consistent with their
idiosyncratic needs.
For callers without idiosyncratic needs, who simply want to do
whatever is normal and customary, a default handler is defined.
This seems cleaner than duplicating the implementation of the most
customary error handling behavior across numerous frontend
applications. It also makes it easier for a code contributer to
know which behavior is standard, given that they would otherwise
need to survey all frontend applications trying to deduce which
behavior is most common.
Not done here, but src/bin/scripts/scripts_parallel.[ch] has
hardcoded error handling appropriate to vacuumdb.c and reindexdb.c,
including the assumption that the parallel slots implement a command
rather than a query, and that no rows will be returned. The
PGresultHandler abstraction will be used to make parallel slots more
flexible so that pg_amcheck (which executes queries and expects rows
to be returned) can use them also.
---
src/fe_utils/Makefile | 1 +
src/fe_utils/pgreshandler.c | 55 +++++++++++++++++++++++++++++
src/include/fe_utils/pgreshandler.h | 27 ++++++++++++++
src/tools/msvc/Mkvcbuild.pm | 2 +-
4 files changed, 84 insertions(+), 1 deletion(-)
create mode 100644 src/fe_utils/pgreshandler.c
create mode 100644 src/include/fe_utils/pgreshandler.h
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index d6c328faf1..bc77d42dbe 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -25,6 +25,7 @@ OBJS = \
conditional.o \
exit_utils.o \
mbprint.o \
+ pgreshandler.o \
print.o \
psqlscan.o \
recovery_gen.o \
diff --git a/src/fe_utils/pgreshandler.c b/src/fe_utils/pgreshandler.c
new file mode 100644
index 0000000000..689c6770a5
--- /dev/null
+++ b/src/fe_utils/pgreshandler.c
@@ -0,0 +1,55 @@
+/*-------------------------------------------------------------------------
+ *
+ * Supporting routines for modular handling of PGresult error conditions.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/pgreshandler.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "fe_utils/exit_utils.h"
+#include "fe_utils/pgreshandler.h"
+
+/*
+ * Implements the PGresultHandler abstract interface.
+ *
+ * The pgres_default_handler can be used by applications not requiring
+ * divergent query error handling functionality. It checks a PGresult against
+ * the supplied expectations for status and number of rows returned. If they
+ * do not match, it logs errors and exits. The exact error messages and log
+ * levels are chosen to precisely match the historical pg_dump behavior.
+ */
+PGresult *
+pgres_default_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status, int expected_ntups,
+ const char *query)
+{
+ if (PQresultStatus(res) != expected_status)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", query);
+ PQfinish(conn);
+ exit_nicely(1);
+ }
+ if (expected_ntups >= 0)
+ {
+ int ntups = PQntuples(res);
+
+ if (ntups != expected_ntups)
+ {
+ if (expected_ntups == 1)
+ fatal(ngettext("query returned %d row instead of one: %s",
+ "query returned %d rows instead of one: %s",
+ ntups),
+ ntups, query);
+ fatal(ngettext("query returned %d row instead of %d: %s",
+ "query returned %d rows instead of %d: %s",
+ ntups),
+ ntups, expected_ntups, query);
+ }
+ }
+ return res;
+}
diff --git a/src/include/fe_utils/pgreshandler.h b/src/include/fe_utils/pgreshandler.h
new file mode 100644
index 0000000000..4cc4c02eca
--- /dev/null
+++ b/src/include/fe_utils/pgreshandler.h
@@ -0,0 +1,27 @@
+/*-------------------------------------------------------------------------
+ *
+ * Modular handling of PGresult error conditions.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/pgreshandler.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PGRES_HANDLER_H
+#define PGRES_HANDLER_H
+
+#include "libpq-fe.h"
+
+typedef PGresult *(*PGresultHandler) (PGresult *res,
+ PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+extern PGresult *pgres_default_handler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+#endif /* PGRES_HANDLER_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 0c6c1c996f..4385962c7c 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,7 +147,7 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c exit_utils.c mbprint.c print.c psqlscan.l
+ archive.c cancel.c conditional.c exit_utils.c mbprint.c pgreshandler.c print.c psqlscan.l
psqlscan.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
--
2.21.1 (Apple Git-122.3)
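The point of the PGresultHandler indirection can be illustrated without a live libpq connection. In the sketch below, MockResult, run_query, strict_handler, and lenient_handler are hypothetical names of my own (not part of the patch), standing in for PGresult and the patch's handler signature to show how different callers plug in different error-handling policies; the patch's pgres_default_handler exits on mismatch, whereas the strict handler here merely returns NULL so the behavior is observable.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for PGresult; a real handler would use libpq types. */
typedef struct
{
	int			status;			/* mimics ExecStatusType */
	int			ntups;			/* mimics PQntuples(res) */
} MockResult;

/* Mirrors the shape of PGresultHandler from pgreshandler.h. */
typedef MockResult *(*MockHandler) (MockResult *res,
									int expected_status,
									int expected_ntups,
									const char *query);

/*
 * Strict policy: complain and discard the result on any mismatch.
 * (The patch's pgres_default_handler logs and exit_nicely()s instead.)
 */
MockResult *
strict_handler(MockResult *res, int expected_status,
			   int expected_ntups, const char *query)
{
	if (res->status != expected_status ||
		(expected_ntups >= 0 && res->ntups != expected_ntups))
	{
		fprintf(stderr, "query failed: %s\n", query);
		return NULL;
	}
	return res;
}

/* Lenient policy: accept anything, as an idiosyncratic caller might. */
MockResult *
lenient_handler(MockResult *res, int expected_status,
				int expected_ntups, const char *query)
{
	return res;
}

/*
 * A query runner that delegates error handling to the caller's handler,
 * the way the refactored parallel-slot code is meant to.
 */
MockResult *
run_query(MockResult *res, MockHandler handler,
		  int expected_status, int expected_ntups, const char *query)
{
	return handler(res, expected_status, expected_ntups, query);
}
```

The shared code never decides whether a bad result is fatal; that choice travels with the handler pointer, which is what lets pg_amcheck reuse the parallel slots with row-returning queries.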
Attachment: v33-0003-Preparing-for-move-of-parallel-slot-infrastructu.patch (application/octet-stream)
From 5e0dec3961f338f621d55bd3f666f69b5063fd90 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sat, 23 Jan 2021 16:24:32 -0800
Subject: [PATCH v33 3/8] Preparing for move of parallel slot infrastructure
Preparing to move src/bin/scripts/scripts_parallel.[ch] parallel
slot code into fe_utils.
ParallelSlotSetup() uses struct ConnParams, functions
connectDatabase() and disconnectDatabase(), and enum trivalue.
Moving those from src/bin/scripts/common.h into new file
src/include/fe_utils/connect_utils.h. Since connectDatabase() uses
executeQuery(), also defined in src/bin/scripts/common.h,
executeQuery() also needs to be moved to fe_utils.
These moves are somewhat messy, because connectDatabase() and
disconnectDatabase() are defined next to connectMaintenanceDatabase(),
the latter not directly implicated in the anticipated move of
parallel slot code to fe_utils. But connectMaintenanceDatabase() will
ultimately need to be moved for use by pg_amcheck, so moving these
three connection management functions to
fe_utils/connect_utils.[ch] now seems fine.
Also messy is that executeQuery() is defined next to
executeCommand() and executeMaintenanceCommand(), but these other
functions are also not directly implicated in the anticipated move
of parallel slot code to fe_utils. We will ultimately need
executeCommand() in pg_amcheck, but not executeMaintenanceCommand().
It seems wrong to split these up, though, so moving all three query
functions to fe_utils/query_utils.[ch] now seems acceptable.
The functions consumeQueryResult() and processQueryResult() in
src/bin/scripts/common.[ch] might seem to belong in
fe_utils/query_utils.[ch] for the same reasons that executeCommand()
and executeMaintenanceCommand(), but unlike the other functions we
are moving in this commit, they are used exclusively by the
parallel slot infrastructure and will therefore be moved/refactored
at the same time as it is.
---
src/bin/scripts/clusterdb.c | 1 +
src/bin/scripts/common.c | 241 +--------------------------
src/bin/scripts/common.h | 39 +----
src/bin/scripts/reindexdb.c | 1 +
src/bin/scripts/vacuumdb.c | 1 +
src/fe_utils/Makefile | 2 +
src/fe_utils/connect_utils.c | 170 +++++++++++++++++++
src/fe_utils/query_utils.c | 92 ++++++++++
src/include/fe_utils/connect_utils.h | 48 ++++++
src/include/fe_utils/query_utils.h | 26 +++
10 files changed, 343 insertions(+), 278 deletions(-)
create mode 100644 src/fe_utils/connect_utils.c
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/connect_utils.h
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 7d25bb31d4..24a5a549b4 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 21ef297e6e..62645fd276 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -22,6 +22,7 @@
#include "common/logging.h"
#include "common/string.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
#define ERRCODE_UNDEFINED_TABLE "42P01"
@@ -49,246 +50,6 @@ handle_help_version_opts(int argc, char *argv[],
}
}
-
-/*
- * Make a database connection with the given parameters.
- *
- * An interactive password prompt is automatically issued if needed and
- * allowed by cparams->prompt_password.
- *
- * If allow_password_reuse is true, we will try to re-use any password
- * given during previous calls to this routine. (Callers should not pass
- * allow_password_reuse=true unless reconnecting to the same database+user
- * as before, else we might create password exposure hazards.)
- */
-PGconn *
-connectDatabase(const ConnParams *cparams, const char *progname,
- bool echo, bool fail_ok, bool allow_password_reuse)
-{
- PGconn *conn;
- bool new_pass;
- static char *password = NULL;
-
- /* Callers must supply at least dbname; other params can be NULL */
- Assert(cparams->dbname);
-
- if (!allow_password_reuse && password)
- {
- free(password);
- password = NULL;
- }
-
- if (cparams->prompt_password == TRI_YES && password == NULL)
- password = simple_prompt("Password: ", false);
-
- /*
- * Start the connection. Loop until we have a password if requested by
- * backend.
- */
- do
- {
- const char *keywords[8];
- const char *values[8];
- int i = 0;
-
- /*
- * If dbname is a connstring, its entries can override the other
- * values obtained from cparams; but in turn, override_dbname can
- * override the dbname component of it.
- */
- keywords[i] = "host";
- values[i++] = cparams->pghost;
- keywords[i] = "port";
- values[i++] = cparams->pgport;
- keywords[i] = "user";
- values[i++] = cparams->pguser;
- keywords[i] = "password";
- values[i++] = password;
- keywords[i] = "dbname";
- values[i++] = cparams->dbname;
- if (cparams->override_dbname)
- {
- keywords[i] = "dbname";
- values[i++] = cparams->override_dbname;
- }
- keywords[i] = "fallback_application_name";
- values[i++] = progname;
- keywords[i] = NULL;
- values[i++] = NULL;
- Assert(i <= lengthof(keywords));
-
- new_pass = false;
- conn = PQconnectdbParams(keywords, values, true);
-
- if (!conn)
- {
- pg_log_error("could not connect to database %s: out of memory",
- cparams->dbname);
- exit(1);
- }
-
- /*
- * No luck? Trying asking (again) for a password.
- */
- if (PQstatus(conn) == CONNECTION_BAD &&
- PQconnectionNeedsPassword(conn) &&
- cparams->prompt_password != TRI_NO)
- {
- PQfinish(conn);
- if (password)
- free(password);
- password = simple_prompt("Password: ", false);
- new_pass = true;
- }
- } while (new_pass);
-
- /* check to see that the backend connection was successfully made */
- if (PQstatus(conn) == CONNECTION_BAD)
- {
- if (fail_ok)
- {
- PQfinish(conn);
- return NULL;
- }
- pg_log_error("%s", PQerrorMessage(conn));
- exit(1);
- }
-
- /* Start strict; callers may override this. */
- PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
- return conn;
-}
-
-/*
- * Try to connect to the appropriate maintenance database.
- *
- * This differs from connectDatabase only in that it has a rule for
- * inserting a default "dbname" if none was given (which is why cparams
- * is not const). Note that cparams->dbname should typically come from
- * a --maintenance-db command line parameter.
- */
-PGconn *
-connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo)
-{
- PGconn *conn;
-
- /* If a maintenance database name was specified, just connect to it. */
- if (cparams->dbname)
- return connectDatabase(cparams, progname, echo, false, false);
-
- /* Otherwise, try postgres first and then template1. */
- cparams->dbname = "postgres";
- conn = connectDatabase(cparams, progname, echo, true, false);
- if (!conn)
- {
- cparams->dbname = "template1";
- conn = connectDatabase(cparams, progname, echo, false, false);
- }
- return conn;
-}
-
-/*
- * Disconnect the given connection, canceling any statement if one is active.
- */
-void
-disconnectDatabase(PGconn *conn)
-{
- char errbuf[256];
-
- Assert(conn != NULL);
-
- if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
- {
- PGcancel *cancel;
-
- if ((cancel = PQgetCancel(conn)))
- {
- (void) PQcancel(cancel, errbuf, sizeof(errbuf));
- PQfreeCancel(cancel);
- }
- }
-
- PQfinish(conn);
-}
-
-/*
- * Run a query, return the results, exit program on failure.
- */
-PGresult *
-executeQuery(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- return res;
-}
-
-
-/*
- * As above for a SQL command (which returns nothing).
- */
-void
-executeCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- PQclear(res);
-}
-
-
-/*
- * As above for a SQL maintenance command (returns command success).
- * Command is executed with a cancel handler set, so Ctrl-C can
- * interrupt it.
- */
-bool
-executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
- bool r;
-
- if (echo)
- printf("%s\n", query);
-
- SetCancelConn(conn);
- res = PQexec(conn, query);
- ResetCancelConn();
-
- r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
-
- if (res)
- PQclear(res);
-
- return r;
-}
-
/*
* Consume all the results generated for the given connection until
* nothing remains. If at least one error is encountered, return false.
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 5630975712..ae19c420df 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -10,54 +10,17 @@
#define COMMON_H
#include "common/username.h"
+#include "fe_utils/connect_utils.h"
#include "getopt_long.h" /* pgrminclude ignore */
#include "libpq-fe.h"
#include "pqexpbuffer.h" /* pgrminclude ignore */
-enum trivalue
-{
- TRI_DEFAULT,
- TRI_NO,
- TRI_YES
-};
-
-/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
-typedef struct _connParams
-{
- /* These fields record the actual command line parameters */
- const char *dbname; /* this may be a connstring! */
- const char *pghost;
- const char *pgport;
- const char *pguser;
- enum trivalue prompt_password;
- /* If not NULL, this overrides the dbname obtained from command line */
- /* (but *only* the DB name, not anything else in the connstring) */
- const char *override_dbname;
-} ConnParams;
-
typedef void (*help_handler) (const char *progname);
extern void handle_help_version_opts(int argc, char *argv[],
const char *fixed_progname,
help_handler hlp);
-extern PGconn *connectDatabase(const ConnParams *cparams,
- const char *progname,
- bool echo, bool fail_ok,
- bool allow_password_reuse);
-
-extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo);
-
-extern void disconnectDatabase(PGconn *conn);
-
-extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
-
-extern void executeCommand(PGconn *conn, const char *query, bool echo);
-
-extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
- bool echo);
-
extern bool consumeQueryResult(PGconn *conn);
extern bool processQueryResult(PGconn *conn, PGresult *result);
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index dece8200fa..c9289ae78d 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,6 +16,7 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
#include "scripts_parallel.h"
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 8246327770..55c974ff6d 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,6 +18,7 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
#include "scripts_parallel.h"
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index bc77d42dbe..9ddf324584 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,11 +23,13 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ connect_utils.o \
exit_utils.o \
mbprint.o \
pgreshandler.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
new file mode 100644
index 0000000000..7475e2f366
--- /dev/null
+++ b/src/fe_utils/connect_utils.c
@@ -0,0 +1,170 @@
+#include "postgres_fe.h"
+
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Make a database connection with the given parameters.
+ *
+ * An interactive password prompt is automatically issued if needed and
+ * allowed by cparams->prompt_password.
+ *
+ * If allow_password_reuse is true, we will try to re-use any password
+ * given during previous calls to this routine. (Callers should not pass
+ * allow_password_reuse=true unless reconnecting to the same database+user
+ * as before, else we might create password exposure hazards.)
+ */
+PGconn *
+connectDatabase(const ConnParams *cparams, const char *progname,
+ bool echo, bool fail_ok, bool allow_password_reuse)
+{
+ PGconn *conn;
+ bool new_pass;
+ static char *password = NULL;
+
+ /* Callers must supply at least dbname; other params can be NULL */
+ Assert(cparams->dbname);
+
+ if (!allow_password_reuse && password)
+ {
+ free(password);
+ password = NULL;
+ }
+
+ if (cparams->prompt_password == TRI_YES && password == NULL)
+ password = simple_prompt("Password: ", false);
+
+ /*
+ * Start the connection. Loop until we have a password if requested by
+ * backend.
+ */
+ do
+ {
+ const char *keywords[8];
+ const char *values[8];
+ int i = 0;
+
+ /*
+ * If dbname is a connstring, its entries can override the other
+ * values obtained from cparams; but in turn, override_dbname can
+ * override the dbname component of it.
+ */
+ keywords[i] = "host";
+ values[i++] = cparams->pghost;
+ keywords[i] = "port";
+ values[i++] = cparams->pgport;
+ keywords[i] = "user";
+ values[i++] = cparams->pguser;
+ keywords[i] = "password";
+ values[i++] = password;
+ keywords[i] = "dbname";
+ values[i++] = cparams->dbname;
+ if (cparams->override_dbname)
+ {
+ keywords[i] = "dbname";
+ values[i++] = cparams->override_dbname;
+ }
+ keywords[i] = "fallback_application_name";
+ values[i++] = progname;
+ keywords[i] = NULL;
+ values[i++] = NULL;
+ Assert(i <= lengthof(keywords));
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+
+ if (!conn)
+ {
+ pg_log_error("could not connect to database %s: out of memory",
+ cparams->dbname);
+ exit(1);
+ }
+
+ /*
+ * No luck? Trying asking (again) for a password.
+ */
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ cparams->prompt_password != TRI_NO)
+ {
+ PQfinish(conn);
+ if (password)
+ free(password);
+ password = simple_prompt("Password: ", false);
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ /* check to see that the backend connection was successfully made */
+ if (PQstatus(conn) == CONNECTION_BAD)
+ {
+ if (fail_ok)
+ {
+ PQfinish(conn);
+ return NULL;
+ }
+ pg_log_error("%s", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* Start strict; callers may override this. */
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+ return conn;
+}
+
+/*
+ * Try to connect to the appropriate maintenance database.
+ *
+ * This differs from connectDatabase only in that it has a rule for
+ * inserting a default "dbname" if none was given (which is why cparams
+ * is not const). Note that cparams->dbname should typically come from
+ * a --maintenance-db command line parameter.
+ */
+PGconn *
+connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo)
+{
+ PGconn *conn;
+
+ /* If a maintenance database name was specified, just connect to it. */
+ if (cparams->dbname)
+ return connectDatabase(cparams, progname, echo, false, false);
+
+ /* Otherwise, try postgres first and then template1. */
+ cparams->dbname = "postgres";
+ conn = connectDatabase(cparams, progname, echo, true, false);
+ if (!conn)
+ {
+ cparams->dbname = "template1";
+ conn = connectDatabase(cparams, progname, echo, false, false);
+ }
+ return conn;
+}
+
+/*
+ * Disconnect the given connection, canceling any statement if one is active.
+ */
+void
+disconnectDatabase(PGconn *conn)
+{
+ char errbuf[256];
+
+ Assert(conn != NULL);
+
+ if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(conn)))
+ {
+ (void) PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(conn);
+}
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..a70ae3c082
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,92 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Run a query, return the results, exit program on failure.
+ */
+PGresult *
+executeQuery(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ return res;
+}
+
+
+/*
+ * As above for a SQL command (which returns nothing).
+ */
+void
+executeCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ PQclear(res);
+}
+
+
+/*
+ * As above for a SQL maintenance command (returns command success).
+ * Command is executed with a cancel handler set, so Ctrl-C can
+ * interrupt it.
+ */
+bool
+executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+ bool r;
+
+ if (echo)
+ printf("%s\n", query);
+
+ SetCancelConn(conn);
+ res = PQexec(conn, query);
+ ResetCancelConn();
+
+ r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
+
+ if (res)
+ PQclear(res);
+
+ return r;
+}
diff --git a/src/include/fe_utils/connect_utils.h b/src/include/fe_utils/connect_utils.h
new file mode 100644
index 0000000000..8fde0ea2a0
--- /dev/null
+++ b/src/include/fe_utils/connect_utils.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to connect to and disconnect from databases.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/connect_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECT_UTILS_H
+#define CONNECT_UTILS_H
+
+#include "libpq-fe.h"
+
+enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+};
+
+/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
+typedef struct _connParams
+{
+ /* These fields record the actual command line parameters */
+ const char *dbname; /* this may be a connstring! */
+ const char *pghost;
+ const char *pgport;
+ const char *pguser;
+ enum trivalue prompt_password;
+ /* If not NULL, this overrides the dbname obtained from command line */
+ /* (but *only* the DB name, not anything else in the connstring) */
+ const char *override_dbname;
+} ConnParams;
+
+extern PGconn *connectDatabase(const ConnParams *cparams,
+ const char *progname,
+ bool echo, bool fail_ok,
+ bool allow_password_reuse);
+
+extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo);
+
+extern void disconnectDatabase(PGconn *conn);
+
+#endif /* CONNECT_UTILS_H */
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..1f5812bbf6
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "postgres_fe.h"
+
+#include "libpq-fe.h"
+
+extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
+
+extern void executeCommand(PGconn *conn, const char *query, bool echo);
+
+extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
+ bool echo);
+
+#endif /* QUERY_UTILS_H */
--
2.21.1 (Apple Git-122.3)
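The connectMaintenanceDatabase logic at the top of this patch (try "postgres" first, fall back to "template1" only if that connection attempt is allowed to fail) can be illustrated without libpq. The sketch below is a stand-alone stand-in: `StubConn`, `stub_connect`, and `connect_maintenance` are illustrative names, not the real libpq-based functions, and the stub simply pretends that "postgres" is missing while "template1" exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PGconn; the real code uses libpq. */
typedef struct StubConn
{
	const char *dbname;
} StubConn;

static StubConn template1_conn = {"template1"};

/* Stub connector: pretend "postgres" is missing but "template1" exists. */
static StubConn *
stub_connect(const char *dbname, int fail_ok)
{
	(void) fail_ok;				/* first attempt is allowed to fail */
	if (strcmp(dbname, "template1") == 0)
		return &template1_conn;
	return NULL;				/* connection failed */
}

/*
 * Mirror of connectMaintenanceDatabase's fallback: try "postgres"
 * with fail_ok set, and only if that fails retry against "template1",
 * this time treating failure as fatal to the caller.
 */
static StubConn *
connect_maintenance(StubConn *(*connect_fn) (const char *, int))
{
	StubConn   *conn = connect_fn("postgres", /* fail_ok */ 1);

	if (!conn)
		conn = connect_fn("template1", /* fail_ok */ 0);
	return conn;
}
```

In the real patch the second call passes fail_ok = false, so a failure against "template1" reports an error rather than returning NULL.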
v33-0004-Moving-and-renaming-scripts_parallel.patchapplication/octet-stream; name=v33-0004-Moving-and-renaming-scripts_parallel.patch; x-unix-mode=0644Download
From af7d8b679e7e7920aa086b913c3c6ab292f722ed Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sat, 23 Jan 2021 17:35:51 -0800
Subject: [PATCH v33 4/8] Moving and renaming scripts_parallel
Moving src/bin/scripts/scripts_parallel.[ch] to
fe_utils/parallel_slot.[ch]. During the move, I couldn't resist
removing an unnecessary blank line from the header. This header
file is about to get changed in the next patch right around where
this blank line is located, and I don't want that change to appear
as a substitution but merely as an addition.
Moving functions consumeQueryResult() and processQueryResult() from
src/bin/scripts/scripts_parallel.[ch] into the new file
fe_utils/parallel_slot.c and making them static, since they are used
nowhere else.
---
src/bin/scripts/Makefile | 6 +-
src/bin/scripts/common.c | 53 ----------------
src/bin/scripts/common.h | 4 --
src/bin/scripts/nls.mk | 2 +-
src/bin/scripts/reindexdb.c | 2 +-
src/bin/scripts/vacuumdb.c | 2 +-
src/fe_utils/Makefile | 1 +
.../parallel_slot.c} | 63 +++++++++++++++++--
.../fe_utils/parallel_slot.h} | 13 ++--
9 files changed, 71 insertions(+), 75 deletions(-)
rename src/{bin/scripts/scripts_parallel.c => fe_utils/parallel_slot.c} (80%)
rename src/{bin/scripts/scripts_parallel.h => include/fe_utils/parallel_slot.h} (82%)
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index a02e4e430c..b8d7cf2f2d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,8 +28,8 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-reindexdb: reindexdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
install: all installdirs
@@ -50,7 +50,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
- rm -f common.o scripts_parallel.o $(WIN32RES)
+ rm -f common.o $(WIN32RES)
rm -rf tmp_check
check:
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 62645fd276..c7fdd3adcb 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -25,8 +25,6 @@
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
-#define ERRCODE_UNDEFINED_TABLE "42P01"
-
/*
* Provide strictly harmonized handling of --help and --version
* options.
@@ -50,57 +48,6 @@ handle_help_version_opts(int argc, char *argv[],
}
}
-/*
- * Consume all the results generated for the given connection until
- * nothing remains. If at least one error is encountered, return false.
- * Note that this will block if the connection is busy.
- */
-bool
-consumeQueryResult(PGconn *conn)
-{
- bool ok = true;
- PGresult *result;
-
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
- {
- if (!processQueryResult(conn, result))
- ok = false;
- }
- ResetCancelConn();
- return ok;
-}
-
-/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
- */
-bool
-processQueryResult(PGconn *conn, PGresult *result)
-{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
-
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
-
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
-
- PQclear(result);
- return true;
-}
-
/*
* Split TABLE[(COLUMNS)] into TABLE and [(COLUMNS)] portions. When you
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index ae19c420df..54e6575a7b 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -21,10 +21,6 @@ extern void handle_help_version_opts(int argc, char *argv[],
const char *fixed_progname,
help_handler hlp);
-extern bool consumeQueryResult(PGconn *conn);
-
-extern bool processQueryResult(PGconn *conn, PGresult *result);
-
extern void splitTableColumnsSpec(const char *spec, int encoding,
char **table, const char **columns);
diff --git a/src/bin/scripts/nls.mk b/src/bin/scripts/nls.mk
index 5d5dd11b7b..7fc716092e 100644
--- a/src/bin/scripts/nls.mk
+++ b/src/bin/scripts/nls.mk
@@ -7,7 +7,7 @@ GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
clusterdb.c vacuumdb.c reindexdb.c \
pg_isready.c \
common.c \
- scripts_parallel.c \
+ ../../fe_utils/parallel_slot.c \
../../fe_utils/cancel.c ../../fe_utils/print.c \
../../common/fe_memutils.c ../../common/username.c
GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) simple_prompt yesno_prompt
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index c9289ae78d..b03c94f35f 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,10 +16,10 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/parallel_slot.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
typedef enum ReindexType
{
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 55c974ff6d..a4f5d545a7 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,10 +18,10 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/parallel_slot.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
/* vacuum options controlled by user flags */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 9ddf324584..bd499e6045 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -26,6 +26,7 @@ OBJS = \
connect_utils.o \
exit_utils.o \
mbprint.o \
+ parallel_slot.o \
pgreshandler.o \
print.o \
psqlscan.o \
diff --git a/src/bin/scripts/scripts_parallel.c b/src/fe_utils/parallel_slot.c
similarity index 80%
rename from src/bin/scripts/scripts_parallel.c
rename to src/fe_utils/parallel_slot.c
index 1f863a1bb4..3987a4702b 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/fe_utils/parallel_slot.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.c
- * Parallel support for bin/scripts/
+ * parallel_slot.c
+ * Parallel support for front-end parallel database connections
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * src/bin/scripts/scripts_parallel.c
+ * src/fe_utils/parallel_slot.c
*
*-------------------------------------------------------------------------
*/
@@ -22,13 +22,15 @@
#include <sys/select.h>
#endif
-#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
-#include "scripts_parallel.h"
+#include "fe_utils/parallel_slot.h"
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
+static bool processQueryResult(PGconn *conn, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,6 +40,57 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->isFree = true;
}
+/*
+ * Process (and delete) a query result. Returns true if there's no error,
+ * false otherwise -- but errors about trying to work on a missing relation
+ * are reported and subsequently ignored.
+ */
+static bool
+processQueryResult(PGconn *conn, PGresult *result)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(result) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(result);
+ return false;
+ }
+ }
+
+ PQclear(result);
+ return true;
+}
+
+/*
+ * Consume all the results generated for the given connection until
+ * nothing remains. If at least one error is encountered, return false.
+ * Note that this will block if the connection is busy.
+ */
+static bool
+consumeQueryResult(PGconn *conn)
+{
+ bool ok = true;
+ PGresult *result;
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ if (!processQueryResult(conn, result))
+ ok = false;
+ }
+ ResetCancelConn();
+ return ok;
+}
+
/*
* Wait until a file descriptor from the given set becomes readable.
*
diff --git a/src/bin/scripts/scripts_parallel.h b/src/include/fe_utils/parallel_slot.h
similarity index 82%
rename from src/bin/scripts/scripts_parallel.h
rename to src/include/fe_utils/parallel_slot.h
index f62692510a..99eeb3328d 100644
--- a/src/bin/scripts/scripts_parallel.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -1,21 +1,20 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.h
+ * parallel_slot.h
* Parallel support for bin/scripts/
*
* Copyright (c) 2003-2021, PostgreSQL Global Development Group
*
- * src/bin/scripts/scripts_parallel.h
+ * src/include/fe_utils/parallel_slot.h
*
*-------------------------------------------------------------------------
*/
-#ifndef SCRIPTS_PARALLEL_H
-#define SCRIPTS_PARALLEL_H
+#ifndef PARALLEL_SLOT_H
+#define PARALLEL_SLOT_H
-#include "common.h"
+#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
-
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
@@ -33,4 +32,4 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
-#endif /* SCRIPTS_PARALLEL_H */
+#endif /* PARALLEL_SLOT_H */
--
2.21.1 (Apple Git-122.3)
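The consumeQueryResult() function moved by this patch has one behavior worth calling out: it keeps draining results until PQgetResult() returns NULL even after an error, and only then reports failure. The stand-alone sketch below models that with a NULL-terminated list of stub results in place of a libpq connection; `StubResult`, `next_result`, and `consume_all` are illustrative names, not the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative result type; the real code pulls PGresult via PQgetResult. */
typedef struct StubResult
{
	bool		ok;
} StubResult;

/* Return the next pending result, or NULL when none remain. */
static StubResult *
next_result(StubResult ***cursor)
{
	StubResult *r = **cursor;

	if (r)
		(*cursor)++;
	return r;
}

/*
 * Mirror of consumeQueryResult: drain every result until NULL,
 * remembering whether any of them failed rather than stopping at
 * the first error, so the connection is left in a clean state.
 */
static bool
consume_all(StubResult **queue)
{
	bool		ok = true;
	StubResult *r;

	while ((r = next_result(&queue)) != NULL)
	{
		if (!r->ok)
			ok = false;			/* note the failure, but keep draining */
	}
	return ok;
}
```

Stopping early would leave unread results on the connection, which is why the loop runs to completion before the boolean is returned.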
v33-0005-Parameterizing-parallel-slot-result-handling.patchapplication/octet-stream; name=v33-0005-Parameterizing-parallel-slot-result-handling.patch; x-unix-mode=0644Download
From ae91c1271ce36263d46f7f97335866b6646dbb0b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sat, 23 Jan 2021 18:26:15 -0800
Subject: [PATCH v33 5/8] Parameterizing parallel slot result handling
The function consumeQueryResult was being used to handle all results
returned by queries executed through the parallel slot interface,
but this hardcodes knowledge about the expectations of reindexdb and
vacuumdb such as the expected result status being PGRES_COMMAND_OK
(as opposed to, say, PGRES_TUPLES_OK).
Reworking the slot interface to optionally include a PGresultHandler
and related fields per slot. The idea is that a caller who executes
a command or query through the slot can set the handler to be called
when the query completes.
The old logic of consumeQueryResults is moved into a new callback
function, TableCommandSlotHandler(), which gets registered as the
slot handler explicitly from vacuumdb and reindexdb. This is
defined in fe_utils/parallel_slot.c rather than somewhere in
src/bin/scripts where its only callers reside, partly to keep it
close to the rest of the shared parallel slot handling code and
partly in anticipation that other utility programs will eventually
want to use it also.
Adding a default handler which is used to handle results for slots
which have no handler explicitly registered. The default simply
checks the status of the result and makes a judgement about whether
the status is ok, similarly to psql's AcceptResult(). I also
considered whether to just have a missing handler always be an
error, but decided against requiring users of the parallel slot
infrastructure to pedantically specify the default handler. Both
designs seem reasonable, but the tie-breaker for me is that edge
cases that do not come up in testing will be better handled in
production with this design than with pedantically erroring out.
The expectation of this commit is that pg_amcheck will have handlers
for table and index checks which will process the PGresults of calls
to the amcheck functions. This commit sets up the infrastructure
necessary to support those handlers being different from the one
used by vacuumdb and reindexdb.
---
src/bin/scripts/reindexdb.c | 2 +
src/bin/scripts/vacuumdb.c | 2 +
src/fe_utils/parallel_slot.c | 143 +++++++++++++++++++++------
src/include/fe_utils/parallel_slot.h | 44 +++++++++
4 files changed, 163 insertions(+), 28 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index b03c94f35f..af0cc2bb00 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -465,6 +465,8 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler,
+ PGRES_COMMAND_OK, -1, NULL);
run_reindex_command(free_slot->connection, process_type, objname,
echo, verbose, concurrently, true);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index a4f5d545a7..10ab894f10 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -712,6 +712,8 @@ vacuum_one_database(const ConnParams *cparams,
* Execute the vacuum. All errors are handled in processQueryResult
* through ParallelSlotsGetIdle.
*/
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler,
+ PGRES_COMMAND_OK, -1, sql.data);
run_vacuum_command(free_slot->connection, sql.data,
echo, tabname);
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index 3987a4702b..f1e78089e9 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -30,7 +30,7 @@
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
-static bool processQueryResult(PGconn *conn, PGresult *result);
+static bool handleOneQueryResult(ParallelSlot *slot, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,53 +38,47 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->connection = conn;
/* Initially assume connection is idle */
slot->isFree = true;
+ ParallelSlotClearHandler(slot);
}
/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
+ * Invoke the slot's handler for a single query result, or fall back to the
+ * default handler if none is defined for the slot. Returns true if the
+ * handler reports that there's no error, false otherwise.
*/
static bool
-processQueryResult(PGconn *conn, PGresult *result)
+handleOneQueryResult(ParallelSlot *slot, PGresult *result)
{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+ PGresultHandler handler = slot->handler;
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
+ if (!handler)
+ handler = DefaultSlotHandler;
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
+ /* On failure, the handler should return NULL after freeing the result. */
+ if (!handler(result, slot->connection, slot->expected_status,
+ slot->expected_ntups, slot->query))
+ return false;
+ /* Ok, we have to free it ourselves */
PQclear(result);
return true;
}
/*
- * Consume all the results generated for the given connection until
+ * Handle all the results generated for the given connection until
* nothing remains. If at least one error is encountered, return false.
* Note that this will block if the connection is busy.
*/
static bool
-consumeQueryResult(PGconn *conn)
+handleQueryResults(ParallelSlot *slot)
{
bool ok = true;
PGresult *result;
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
+ SetCancelConn(slot->connection);
+ while ((result = PQgetResult(slot->connection)) != NULL)
{
- if (!processQueryResult(conn, result))
+ if (!handleOneQueryResult(slot, result))
ok = false;
}
ResetCancelConn();
@@ -227,14 +221,15 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
if (result != NULL)
{
- /* Check and discard the command result */
- if (!processQueryResult(slots[i].connection, result))
+ /* Handle and discard the command result */
+ if (!handleOneQueryResult(slots + i, result))
return NULL;
}
else
{
/* This connection has become idle */
slots[i].isFree = true;
+ ParallelSlotClearHandler(slots + i);
if (firstFree < 0)
firstFree = i;
break;
@@ -329,9 +324,101 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
for (i = 0; i < numslots; i++)
{
- if (!consumeQueryResult((slots + i)->connection))
+ if (!handleQueryResults(slots + i))
return false;
}
return true;
}
+
+/*
+ * DefaultSlotHandler
+ * default handler of query results for slots with no handler registered.
+ *
+ * This gets called when the slot's handler is NULL, but it could also be used
+ * explicitly. Either way, we do not trust 'expected_status', 'expected_ntups'
+ * or 'query' fields to have been defined, since a user who neglected to set up
+ * the handler may well also have neglected to set up these other fields. So
+ * we ignore them and only consider whether the result status looks like a
+ * success.
+ */
+PGresult *
+DefaultSlotHandler(PGresult *res, PGconn *conn, ExecStatusType expected_status,
+ int expected_ntups, const char *query)
+{
+ switch (PQresultStatus(res))
+ {
+ /* Success codes */
+ case PGRES_EMPTY_QUERY:
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_COPY_OUT:
+ case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
+ case PGRES_SINGLE_TUPLE:
+ /* Ok */
+ return res;
+
+ /*
+ * Error codes.
+ *
+ * There is no default here, as we want the compiler to warn about
+ * missing cases.
+ */
+ case PGRES_BAD_RESPONSE:
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ break;
+ }
+
+ /*
+ * Handle all error cases here, including anything not matched in the
+ * switch (though that should not happen). The 'query' argument may be
+ * NULL or garbage left over from a prior usage of the slot. Don't include
+ * it in the error message!
+ */
+ pg_log_error("processing in database \"%s\" failed: %s", PQdb(conn),
+ PQerrorMessage(conn));
+ PQclear(res);
+ return NULL;
+}
+
+/*
+ * TableCommandSlotHandler
+ * handler for results of commands against tables
+ *
+ * Requires that the result status is either PGRES_COMMAND_OK or an error about
+ * a missing table. This is useful for utilities that compile a list of tables
+ * to process and then run commands (vacuum, reindex, or whatever) against
+ * those tables, as there is a race condition between the time the list is
+ * compiled and the time the command attempts to open the table.
+ *
+ * For missing tables, logs an error but allows processing to continue.
+ *
+ * For all other errors, logs an error and terminates further processing.
+ */
+PGresult *
+TableCommandSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status, int expected_ntups,
+ const char *query)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(res);
+ return NULL;
+ }
+ }
+
+ return res;
+}
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 99eeb3328d..007c764067 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -13,14 +13,50 @@
#define PARALLEL_SLOT_H
#include "fe_utils/connect_utils.h"
+#include "fe_utils/pgreshandler.h"
#include "libpq-fe.h"
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
bool isFree; /* Is it known to be idle? */
+
+ /*
+ * If a command or query has been issued on 'connection',
+ * the following fields store our expectations about the
+ * result we should get back.
+ */
+ PGresultHandler handler;
+ ExecStatusType expected_status;
+ int expected_ntups;
+
+ /*
+ * If not null, should contain the query string for the
+ * currently executing query, for use by the handler.
+ */
+ const char *query;
} ParallelSlot;
+static inline void
+ParallelSlotSetHandler(ParallelSlot *slot, PGresultHandler handler,
+ ExecStatusType expected_status, int expected_ntups,
+ const char *query)
+{
+ slot->handler = handler;
+ slot->expected_status = expected_status;
+ slot->expected_ntups = expected_ntups;
+ slot->query = query;
+}
+
+static inline void
+ParallelSlotClearHandler(ParallelSlot *slot)
+{
+ slot->handler = NULL;
+ slot->expected_status = -1;
+ slot->expected_ntups = -1;
+ slot->query = NULL;
+}
+
extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
@@ -31,5 +67,13 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
+extern PGresult *DefaultSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+extern PGresult *TableCommandSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
#endif /* PARALLEL_SLOT_H */
--
2.21.1 (Apple Git-122.3)
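The dispatch added in handleOneQueryResult() (use the slot's registered handler, fall back to a permissive default when none is set) can be sketched without libpq. Below, `StubResult`, the status constants, and the handler names are illustrative stand-ins for PGresult, ExecStatusType, and the patch's DefaultSlotHandler/TableCommandSlotHandler; only the dispatch shape follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type; the real interface passes PGresult *. */
typedef struct StubResult
{
	int			status;
} StubResult;

#define STATUS_OK		0		/* stands in for PGRES_COMMAND_OK */
#define STATUS_TUPLES	1		/* stands in for PGRES_TUPLES_OK */
#define STATUS_ERROR	2		/* stands in for PGRES_FATAL_ERROR */

/* Handler signature, loosely following PGresultHandler in the patch. */
typedef StubResult *(*ResultHandler) (StubResult *res);

/* Default: accept any result whose status is not an error. */
static StubResult *
default_handler(StubResult *res)
{
	return (res->status == STATUS_ERROR) ? NULL : res;
}

/* Stricter handler, analogous to TableCommandSlotHandler: only OK passes. */
static StubResult *
command_ok_handler(StubResult *res)
{
	return (res->status == STATUS_OK) ? res : NULL;
}

typedef struct Slot
{
	ResultHandler handler;		/* NULL means "use the default" */
} Slot;

/* Mirror of handleOneQueryResult: fall back to the default handler. */
static int
handle_one_result(Slot *slot, StubResult *res)
{
	ResultHandler handler = slot->handler ? slot->handler : default_handler;

	/* Handlers return NULL on failure, the result on success. */
	return handler(res) != NULL;
}
```

With no handler registered, a tuples-returning result is accepted; once the command handler is registered on the slot, the same result is rejected, which is the behavioral difference the patch introduces for vacuumdb and reindexdb versus the default.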
v33-0006-Moving-handle_help_version_opts.patchapplication/octet-stream; name=v33-0006-Moving-handle_help_version_opts.patch; x-unix-mode=0644Download
From 25cd1302bda0f0a74e93e76472631361d5612266 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 24 Jan 2021 09:17:08 -0800
Subject: [PATCH v33 6/8] Moving handle_help_version_opts
Moving handle_help_version_opts from src/bin/scripts/common.[ch] to
new files fe_utils/option_utils.[ch] in anticipation of pg_amcheck
needing to use it.
---
src/bin/scripts/clusterdb.c | 1 +
src/bin/scripts/common.c | 24 ------------------
src/bin/scripts/common.h | 6 -----
src/bin/scripts/createdb.c | 1 +
src/bin/scripts/createuser.c | 1 +
src/bin/scripts/dropdb.c | 1 +
src/bin/scripts/dropuser.c | 1 +
src/bin/scripts/pg_isready.c | 1 +
src/bin/scripts/reindexdb.c | 1 +
src/bin/scripts/vacuumdb.c | 1 +
src/fe_utils/Makefile | 1 +
src/fe_utils/option_utils.c | 38 +++++++++++++++++++++++++++++
src/include/fe_utils/option_utils.h | 23 +++++++++++++++++
src/tools/msvc/Mkvcbuild.pm | 2 +-
14 files changed, 71 insertions(+), 31 deletions(-)
create mode 100644 src/fe_utils/option_utils.c
create mode 100644 src/include/fe_utils/option_utils.h
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 24a5a549b4..fc771eed77 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index c7fdd3adcb..c86c19eae2 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -25,30 +25,6 @@
#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
-/*
- * Provide strictly harmonized handling of --help and --version
- * options.
- */
-void
-handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname, help_handler hlp)
-{
- if (argc > 1)
- {
- if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
- {
- hlp(get_progname(argv[0]));
- exit(0);
- }
- if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
- {
- printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
- exit(0);
- }
- }
-}
-
-
/*
* Split TABLE[(COLUMNS)] into TABLE and [(COLUMNS)] portions. When you
* finish using them, pg_free(*table). *columns is a pointer into "spec",
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 54e6575a7b..ddd8f35274 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -15,12 +15,6 @@
#include "libpq-fe.h"
#include "pqexpbuffer.h" /* pgrminclude ignore */
-typedef void (*help_handler) (const char *progname);
-
-extern void handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname,
- help_handler hlp);
-
extern void splitTableColumnsSpec(const char *spec, int encoding,
char **table, const char **columns);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index abf21d4942..041454f075 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 47b0e28bc6..ef7e0e549f 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
index ba0dcdecb9..b154ed1bb6 100644
--- a/src/bin/scripts/dropdb.c
+++ b/src/bin/scripts/dropdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropuser.c b/src/bin/scripts/dropuser.c
index ff5b455ae5..61b8557bc7 100644
--- a/src/bin/scripts/dropuser.c
+++ b/src/bin/scripts/dropuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/pg_isready.c b/src/bin/scripts/pg_isready.c
index ceb8a09b4c..fc6f7b0a93 100644
--- a/src/bin/scripts/pg_isready.c
+++ b/src/bin/scripts/pg_isready.c
@@ -12,6 +12,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#define DEFAULT_CONNECT_TIMEOUT "3"
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index af0cc2bb00..a0f9592ee9 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,6 +16,7 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/parallel_slot.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 10ab894f10..b7634a1ecd 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,6 +18,7 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/parallel_slot.h"
#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index bd499e6045..8167adb225 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -26,6 +26,7 @@ OBJS = \
connect_utils.o \
exit_utils.o \
mbprint.o \
+ option_utils.o \
parallel_slot.o \
pgreshandler.o \
print.o \
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..97aca1f02b
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "fe_utils/option_utils.h"
+
+/*
+ * Provide strictly harmonized handling of --help and --version
+ * options.
+ */
+void
+handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname, help_handler hlp)
+{
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ hlp(get_progname(argv[0]));
+ exit(0);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
+ exit(0);
+ }
+ }
+}
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..ef6eb24ae0
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "postgres_fe.h"
+
+typedef void (*help_handler) (const char *progname);
+
+extern void handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname,
+ help_handler hlp);
+
+#endif /* OPTION_UTILS_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 4385962c7c..f3d8c1faf4 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,7 +147,7 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c exit_utils.c mbprint.c pgreshandler.c print.c psqlscan.l
+ archive.c cancel.c conditional.c exit_utils.c mbprint.c option_utils.c pgreshandler.c print.c psqlscan.l
psqlscan.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
--
2.21.1 (Apple Git-122.3)
Attachment: v33-0007-Refactoring-processSQLNamePattern.patch (application/octet-stream)

From a740ffddcdaa60f1d98932096ae51dc24abf5ddc Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 30 Dec 2020 13:29:20 -0800
Subject: [PATCH v33 7/8] Refactoring processSQLNamePattern.
Factoring out logic which transforms shell-style patterns into SQL
style regexp format from inside processSQLNamePattern into a
separate new function "patternToSQLRegex". The interface and
semantics of processSQLNamePattern are unchanged.
The motivation for the refactoring is that processSQLNamePattern
mixes the job of transforming the pattern with the job of
constructing a where-clause based on a single pattern, which makes
the code hard to reuse from other places.
The new helper function patternToSQLRegex can handle parsing of
patterns of the form "database.schema.relation", "schema.relation",
and "relation". The three-part form is unused in this commit, as
the sole pre-existing caller, processSQLNamePattern, ignores the
dbname functionality. The
three-part form will be used by pg_amcheck, not yet committed, to
allow specifying on the command line the inclusion and exclusion of
relations spanning multiple databases.
---
src/fe_utils/string_utils.c | 253 +++++++++++++++++-----------
src/include/fe_utils/string_utils.h | 4 +
2 files changed, 163 insertions(+), 94 deletions(-)
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..2b4a818af5 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -831,10 +831,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
PQExpBufferData schemabuf;
PQExpBufferData namebuf;
- int encoding = PQclientEncoding(conn);
- bool inquotes;
- const char *cp;
- int i;
bool added_clause = false;
#define WHEREAND() \
@@ -856,98 +852,12 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Parse the pattern, converting quotes and lower-casing unquoted letters.
- * Also, adjust shell-style wildcard characters into regexp notation.
- *
- * We surround the pattern with "^(...)$" to force it to match the whole
- * string, as per SQL practice. We have to have parens in case the string
- * contains "|", else the "^" and "$" will be bound into the first and
- * last alternatives which is not what we want.
- *
* Note: the result of this pass is the actual regexp pattern(s) we want
* to execute. Quoting/escaping into SQL literal format will be done
* below using appendStringLiteralConn().
*/
- appendPQExpBufferStr(&namebuf, "^(");
-
- inquotes = false;
- cp = pattern;
-
- while (*cp)
- {
- char ch = *cp;
-
- if (ch == '"')
- {
- if (inquotes && cp[1] == '"')
- {
- /* emit one quote, stay in inquotes mode */
- appendPQExpBufferChar(&namebuf, '"');
- cp++;
- }
- else
- inquotes = !inquotes;
- cp++;
- }
- else if (!inquotes && isupper((unsigned char) ch))
- {
- appendPQExpBufferChar(&namebuf,
- pg_tolower((unsigned char) ch));
- cp++;
- }
- else if (!inquotes && ch == '*')
- {
- appendPQExpBufferStr(&namebuf, ".*");
- cp++;
- }
- else if (!inquotes && ch == '?')
- {
- appendPQExpBufferChar(&namebuf, '.');
- cp++;
- }
- else if (!inquotes && ch == '.')
- {
- /* Found schema/name separator, move current pattern to schema */
- resetPQExpBuffer(&schemabuf);
- appendPQExpBufferStr(&schemabuf, namebuf.data);
- resetPQExpBuffer(&namebuf);
- appendPQExpBufferStr(&namebuf, "^(");
- cp++;
- }
- else if (ch == '$')
- {
- /*
- * Dollar is always quoted, whether inside quotes or not. The
- * reason is that it's allowed in SQL identifiers, so there's a
- * significant use-case for treating it literally, while because
- * we anchor the pattern automatically there is no use-case for
- * having it possess its regexp meaning.
- */
- appendPQExpBufferStr(&namebuf, "\\$");
- cp++;
- }
- else
- {
- /*
- * Ordinary data character, transfer to pattern
- *
- * Inside double quotes, or at all times if force_escape is true,
- * quote regexp special characters with a backslash to avoid
- * regexp errors. Outside quotes, however, let them pass through
- * as-is; this lets knowledgeable users build regexp expressions
- * that are more powerful than shell-style patterns.
- */
- if ((inquotes || force_escape) &&
- strchr("|*+?()[]{}.^$\\", ch))
- appendPQExpBufferChar(&namebuf, '\\');
- i = PQmblen(cp, encoding);
- while (i-- && *cp)
- {
- appendPQExpBufferChar(&namebuf, *cp);
- cp++;
- }
- }
- }
+ patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
+ pattern, force_escape);
/*
* Now decide what we need to emit. We may run under a hostile
@@ -964,7 +874,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a name pattern, so constrain the namevar(s) */
- appendPQExpBufferStr(&namebuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(namebuf.data, "^(.*)$") != 0)
{
@@ -999,7 +908,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a schema pattern, so constrain the schemavar */
- appendPQExpBufferStr(&schemabuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(schemabuf.data, "^(.*)$") != 0 && schemavar)
{
@@ -1027,3 +935,160 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
return added_clause;
#undef WHEREAND
}
+
+/*
+ * Transform a possibly qualified shell-style object name pattern into up to
+ * three SQL-style regular expressions, converting quotes, lower-casing
+ * unquoted letters, and adjusting shell-style wildcard characters into regexp
+ * notation.
+ *
+ * If the dbnamebuf and schemabuf arguments are non-NULL, and the pattern
+ * contains two or more dbname/schema/name separators, we parse the portions of
+ * the pattern prior to the first and second separators into dbnamebuf and
+ * schemabuf, and the rest into namebuf. (Additional dots in the name portion
+ * are not treated as special.)
+ *
+ * If dbnamebuf is NULL and schemabuf is non-NULL, and the pattern contains at
+ * least one separator, we parse the first portion into schemabuf and the rest
+ * into namebuf.
+ *
+ * Otherwise, we parse the entire pattern into namebuf.
+ *
+ * We surround the regexps with "^(...)$" to force them to match whole strings,
+ * as per SQL practice. We have to have parens in case strings contain "|",
+ * else the "^" and "$" will be bound into the first and last alternatives
+ * which is not what we want.
+ *
+ * The regexps we parse into the buffers are appended to the data (if any)
+ * already present. If we parse fewer fields than the number of buffers we
+ * were given, the extra buffers are unaltered.
+ */
+void
+patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
+ PQExpBuffer namebuf, const char *pattern, bool force_escape)
+{
+ PQExpBufferData buf[3];
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
+ int i;
+ bool inquotes;
+ const char *cp;
+
+ Assert(pattern != NULL);
+ Assert(namebuf != NULL);
+
+ /* callers should never expect "dbname.relname" format */
+ Assert(dbnamebuf == NULL || schemabuf != NULL);
+
+ inquotes = false;
+ cp = pattern;
+
+ if (dbnamebuf != NULL)
+ maxbuf = buf + 2;
+ else if (schemabuf != NULL)
+ maxbuf = buf + 1;
+ else
+ maxbuf = buf;
+
+ curbuf = buf;
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ while (*cp)
+ {
+ char ch = *cp;
+
+ if (ch == '"')
+ {
+ if (inquotes && cp[1] == '"')
+ {
+ /* emit one quote, stay in inquotes mode */
+ appendPQExpBufferChar(curbuf, '"');
+ cp++;
+ }
+ else
+ inquotes = !inquotes;
+ cp++;
+ }
+ else if (!inquotes && isupper((unsigned char) ch))
+ {
+ appendPQExpBufferChar(curbuf,
+ pg_tolower((unsigned char) ch));
+ cp++;
+ }
+ else if (!inquotes && ch == '*')
+ {
+ appendPQExpBufferStr(curbuf, ".*");
+ cp++;
+ }
+ else if (!inquotes && ch == '?')
+ {
+ appendPQExpBufferChar(curbuf, '.');
+ cp++;
+ }
+ /*
+ * When we find a dbname/schema/name separator, we treat it specially
+ * only if the caller requested more patterns to be parsed than we have
+ * already parsed from the pattern. Otherwise, dot characters are not
+ * special.
+ */
+ else if (!inquotes && ch == '.' && curbuf < maxbuf)
+ {
+ appendPQExpBufferStr(curbuf, ")$");
+ curbuf++;
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ cp++;
+ }
+ else if (ch == '$')
+ {
+ /*
+ * Dollar is always quoted, whether inside quotes or not. The
+ * reason is that it's allowed in SQL identifiers, so there's a
+ * significant use-case for treating it literally, while because
+ * we anchor the pattern automatically there is no use-case for
+ * having it possess its regexp meaning.
+ */
+ appendPQExpBufferStr(curbuf, "\\$");
+ cp++;
+ }
+ else
+ {
+ /*
+ * Ordinary data character, transfer to pattern
+ *
+ * Inside double quotes, or at all times if force_escape is true,
+ * quote regexp special characters with a backslash to avoid
+ * regexp errors. Outside quotes, however, let them pass through
+ * as-is; this lets knowledgeable users build regexp expressions
+ * that are more powerful than shell-style patterns.
+ */
+ if ((inquotes || force_escape) &&
+ strchr("|*+?()[]{}.^$\\", ch))
+ appendPQExpBufferChar(curbuf, '\\');
+ i = PQmblen(cp, encoding);
+ while (i-- && *cp)
+ {
+ appendPQExpBufferChar(curbuf, *cp);
+ cp++;
+ }
+ }
+ }
+ appendPQExpBufferStr(curbuf, ")$");
+
+ appendPQExpBufferStr(namebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(schemabuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(dbnamebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+ }
+ }
+}
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..caafb97d29 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -56,4 +56,8 @@ extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *schemavar, const char *namevar,
const char *altnamevar, const char *visibilityrule);
+extern void patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf,
+ PQExpBuffer schemabuf, PQExpBuffer namebuf,
+ const char *pattern, bool force_escape);
+
#endif /* STRING_UTILS_H */
--
2.21.1 (Apple Git-122.3)
Attachment: v33-0008-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From bda1f3a0bd35c1762e47aaef8381f67ba241ba24 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 24 Jan 2021 13:42:24 -0800
Subject: [PATCH v33 8/8] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1380 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.h | 130 ++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 59 +
contrib/pg_amcheck/t/003_check.pl | 428 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 1004 ++++++++++++++
src/tools/msvc/Install.pm | 4 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 3 +
16 files changed, 3601 insertions(+), 5 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.h
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 7a4866e338..0fd4125902 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..843a47b5c3
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1380 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "libpq-fe.h"
+#include "pg_amcheck.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+#include "storage/block.h"
+
+/* Keep this order by CheckType */
+static const CheckTypeFilter ctfilter[] = {
+ {
+ .relam = HEAP_TABLE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_RELATION) ","
+ CppAsString2(RELKIND_MATVIEW) ","
+ CppAsString2(RELKIND_TOASTVALUE),
+ .typname = "heap"
+ },
+ {
+ .relam = BTREE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_INDEX),
+ .typname = "btree index"
+ }
+};
+
+int
+main(int argc, char *argv[])
+{
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"exclude-indexes", no_argument, NULL, 2},
+ {"exclude-toast", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-dependents", no_argument, NULL, 10},
+ {"verbose", no_argument, NULL, 'v'},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ const char *progname;
+ int optindex;
+ int c;
+
+ const char *maintenance_db = NULL;
+ const char *connect_db = NULL;
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ ConnParams cparams;
+
+ amcheckOptions checkopts = {
+ .alldb = false,
+ .echo = false,
+ .quiet = false,
+ .dependents = true,
+ .no_indexes = false,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .exclude_toast = false,
+ .reconcile_toast = true,
+ .skip = "none",
+ .jobs = -1,
+ .startblock = -1,
+ .endblock = -1
+ };
+
+ amcheckObjects objects = {
+ .dbnames = {NULL, NULL},
+ .schemas = {NULL, NULL},
+ .tables = {NULL, NULL},
+ .indexes = {NULL, NULL},
+ .exclude_dbnames = {NULL, NULL},
+ .exclude_schemas = {NULL, NULL},
+ .exclude_tables = {NULL, NULL},
+ .exclude_indexes = {NULL, NULL}
+ };
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ checkopts.alldb = true;
+ break;
+ case 'd':
+ simple_string_list_append(&objects.dbnames, optarg);
+ break;
+ case 'D':
+ simple_string_list_append(&objects.exclude_dbnames, optarg);
+ break;
+ case 'e':
+ checkopts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ checkopts.heapallindexed = true;
+ break;
+ case 'i':
+ simple_string_list_append(&objects.indexes, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 'j':
+ checkopts.jobs = atoi(optarg);
+ if (checkopts.jobs <= 0)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ checkopts.parent_check = true;
+ break;
+ case 'q':
+ checkopts.quiet = true;
+ break;
+ case 'r':
+ simple_string_list_append(&objects.indexes, optarg);
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'R':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 's':
+ simple_string_list_append(&objects.schemas, optarg);
+ break;
+ case 'S':
+ simple_string_list_append(&objects.exclude_schemas, optarg);
+ break;
+ case 't':
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'T':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ maintenance_db = pg_strdup(optarg);
+ break;
+ case 2:
+ checkopts.no_indexes = true;
+ break;
+ case 3:
+ checkopts.exclude_toast = true;
+ break;
+ case 4:
+ checkopts.reconcile_toast = false;
+ break;
+ case 5:
+ checkopts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ checkopts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ checkopts.skip = "all frozen";
+ else
+ {
+ fprintf(stderr, _("invalid skip option\n"));
+ exit(1);
+ }
+ break;
+ case 7:
+ checkopts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation starting block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation starting block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 8:
+ checkopts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation ending block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.endblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation ending block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 9:
+ checkopts.rootdescend = true;
+ checkopts.parent_check = true;
+ break;
+ case 10:
+ checkopts.dependents = false;
+ break;
+ default:
+ fprintf(stderr,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ if (checkopts.endblock >= 0 && checkopts.endblock < checkopts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (connect_db == NULL)
+ connect_db = argv[optind];
+ simple_string_list_append(&objects.dbnames, argv[optind]);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (maintenance_db)
+ cparams.dbname = maintenance_db;
+ else if (connect_db != NULL)
+ cparams.dbname = connect_db;
+ else if (objects.dbnames.head != NULL)
+ cparams.dbname = objects.dbnames.head->val;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ if (objects.dbnames.head == NULL)
+ simple_string_list_append(&objects.dbnames, default_db);
+
+ cparams.dbname = default_db;
+ }
+
+ /*
+ * Any positive table or index pattern given in the arguments that is
+ * fully-qualified (including database) adds to the set of databases to be
+ * processed. Table and index exclusion patterns do not add to the set.
+ *
+ * We do this only after setting cparams.dbname, above, as we don't want
+ * any of these to be used for the initial connection. Beyond wanting to
+ * avoid surprising users, we also must be wary that these may be database
+ * patterns like "db*" rather than literal database names.
+ *
+ * This process may result in the same database name (or pattern) in the
+ * list multiple times, but we don't care. Their presence in the list
+ * multiple times will not result in multiple iterations over the same
+ * database.
+ */
+ append_dbnames(&objects.dbnames, &objects.tables);
+ append_dbnames(&objects.dbnames, &objects.indexes);
+
+ check_each_database(&cparams, &objects, &checkopts, progname);
+
+ exit(0);
+}
+
+/*
+ * check_each_database
+ *
+ * Connects to the initial database and resolves a list of all databases that
+ * should be checked per the user supplied options. Sequentially checks each
+ * database in the list.
+ *
+ * cparams: parameters for the initial database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_each_database(ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PGconn *conn;
+ PGresult *databases;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+
+ conn = connectMaintenanceDatabase(cparams, progname, checkopts->echo);
+
+ initPQExpBuffer(&sql);
+ dbname_select(conn, &sql, &objects->dbnames, checkopts->alldb);
+ appendPQExpBufferStr(&sql, "\nEXCEPT");
+ dbname_select(conn, &sql, &objects->exclude_dbnames, false);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ databases = executeQuery(conn, sql.data, checkopts->echo);
+ pgres_default_handler(databases, conn, PGRES_TUPLES_OK, -1, sql.data);
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+ PQfinish(conn);
+
+ ntups = PQntuples(databases);
+ if (ntups == 0 && !checkopts->quiet)
+ printf(_("%s: no databases to check\n"), progname);
+
+ for (i = 0; i < ntups; i++)
+ {
+ cparams->override_dbname = PQgetvalue(databases, i, 0);
+ check_one_database(cparams, objects, checkopts, progname);
+ }
+
+ PQclear(databases);
+}
+
+/*
+ * check_one_database
+ *
+ * Connects to the next database and checks all appropriate relations.
+ *
+ * cparams: parameters for this next database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_one_database(const ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PQExpBufferData sql;
+ PGconn *conn;
+ PGresult *checkable_relations;
+ ParallelSlot *slots;
+ int ntups;
+ int i;
+ int parallel_workers;
+ bool inclusive;
+ bool failed = false;
+
+ conn = connectDatabase(cparams, progname, checkopts->echo, false, true);
+
+ if (!checkopts->quiet)
+ {
+ printf(_("%s: checking database \"%s\"\n"),
+ progname, PQdb(conn));
+ fflush(stdout);
+ }
+
+ /*
+ * If we were given neither tables nor indexes to check, then we select all
+ * targets not excluded. Otherwise, we select only the targets that we
+ * were given.
+ */
+ inclusive = objects->tables.head == NULL &&
+ objects->indexes.head == NULL;
+
+ initPQExpBuffer(&sql);
+ target_select(conn, &sql, objects, checkopts, progname, inclusive);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ checkable_relations = executeQuery(conn, sql.data, checkopts->echo);
+ pgres_default_handler(checkable_relations, conn, PGRES_TUPLES_OK, -1, sql.data);
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+
+ /*
+ * If no rows are returned, there are no matching relations, so we are
+ * done.
+ */
+ ntups = PQntuples(checkable_relations);
+ if (ntups == 0)
+ {
+ PQclear(checkable_relations);
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure parallel_workers is sane. If there are more connections than
+ * relations to be checked, we don't need to use them all.
+ */
+ parallel_workers = checkopts->jobs;
+ if (parallel_workers > ntups)
+ parallel_workers = ntups;
+ if (parallel_workers <= 0)
+ parallel_workers = 1;
+
+ /*
+ * Set up the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
+ */
+ slots = ParallelSlotsSetup(cparams, progname, checkopts->echo, conn,
+ parallel_workers);
+
+ initPQExpBuffer(&sql);
+
+ for (i = 0; i < ntups; i++)
+ {
+ ParallelSlot *free_slot;
+
+ CheckType checktype = atoi(PQgetvalue(checkable_relations, i, 0));
+ Oid reloid = atooid(PQgetvalue(checkable_relations, i, 1));
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers);
+ if (!free_slot)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ switch (checktype)
+ {
+ /* heapam types */
+ case CT_TABLE:
+ prepare_table_command(&sql, checkopts, reloid);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler,
+ PGRES_TUPLES_OK, -1, sql.data);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* btreeam types */
+ case CT_BTREE:
+ prepare_btree_command(&sql, checkopts, reloid);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler,
+ PGRES_TUPLES_OK, -1, sql.data);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* intentionally no default here */
+ }
+ }
+
+ if (!ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+finish:
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * checkopts: user supplied program options
+ * reloid: relation of the table to be checked
+ */
+static void
+prepare_table_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM public.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ reloid,
+ checkopts->on_error_stop ? "true" : "false",
+ checkopts->reconcile_toast ? "true" : "false",
+ checkopts->skip);
+ if (checkopts->startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", checkopts->startblock);
+ if (checkopts->endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", checkopts->endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * Which check to perform is controlled by checkopts.
+ *
+ * sql: buffer into which the index checking command will be written
+ * checkopts: user supplied program options
+ * reloid: oid of the index to be checked
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid)
+{
+ resetPQExpBuffer(sql);
+ if (checkopts->parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT public.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"),
+ (checkopts->rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT public.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * conn: connection to the server associated with the slot to use
+ * sql: query to send
+ * checkopts: user supplied program options
+ * reloid: oid of the object being checked, for error reporting
+ * typ: type of object being checked, for error reporting
+ */
+static void
+run_command(PGconn *conn, const char *sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *typ)
+{
+ bool status;
+
+ if (checkopts->echo)
+ printf("%s\n", sql);
+
+ status = PQsendQuery(conn, sql) == 1;
+
+ if (!status)
+ {
+ pg_log_error("check of %s with id %u in database \"%s\" failed: %s",
+ typ, reloid, PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("command was: %s", sql);
+ }
+}
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs them for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * expected_status: not used
+ * expected_ntups: not used
+ * query: the query string that was executed, for error reporting
+ */
+static PGresult *
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status, int expected_ntups,
+ const char *query)
+{
+ int ntups = PQntuples(res);
+
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s, block %s, offset %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s, block %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s\n", PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+ else
+ {
+ printf("%s\n", PQerrorMessage(conn));
+ printf("query was: %s\n", query);
+ }
+
+ return res;
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user. The result
+ * set from the btree checking command is expected to be empty; when the
+ * command instead fails, the useful information about the corruption is
+ * reported in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * expected_status: not used
+ * expected_ntups: not used
+ * query: not used
+ */
+static PGresult *
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status, int expected_ntups,
+ const char *query)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ printf("%s\n", PQerrorMessage(conn));
+ return res;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf(_("%s checks objects in a PostgreSQL database for corruption.\n\n"), progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
+ printf(_("\nTarget Options:\n"));
+ printf(_(" -a, --all check all databases\n"));
+ printf(_(" -d, --dbname=DBNAME check specific database(s)\n"));
+ printf(_(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n"));
+ printf(_(" -i, --index=INDEX check specific index(es)\n"));
+ printf(_(" -I, --exclude-index=INDEX do NOT check specific index(es)\n"));
+ printf(_(" -r, --relation=RELNAME check specific relation(s)\n"));
+ printf(_(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n"));
+ printf(_(" -s, --schema=SCHEMA check specific schema(s)\n"));
+ printf(_(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n"));
+ printf(_(" -t, --table=TABLE check specific table(s)\n"));
+ printf(_(" -T, --exclude-table=TABLE do NOT check specific table(s)\n"));
+ printf(_(" --exclude-indexes do NOT perform any index checking\n"));
+ printf(_(" --exclude-toast do NOT check any toast tables or indexes\n"));
+ printf(_(" --no-dependents do NOT automatically check dependent objects\n"));
+ printf(_("\nIndex Checking Options:\n"));
+ printf(_(" -H, --heapallindexed check all heap tuples are found within indexes\n"));
+ printf(_(" -P, --parent-check check parent/child relationships during index checking\n"));
+ printf(_(" --rootdescend search from root page to refind tuples at the leaf level\n"));
+ printf(_("\nTable Checking Options:\n"));
+ printf(_(" --exclude-toast-pointers do NOT check relation toast pointers against toast\n"));
+ printf(_(" --on-error-stop stop checking a relation at end of first corrupt page\n"));
+ printf(_(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n"));
+ printf(_(" --startblock begin checking table(s) at the given starting block number\n"));
+ printf(_(" --endblock check table(s) only up to the given ending block number\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port\n"));
+ printf(_(" -U, --username=USERNAME user name to connect as\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt\n"));
+ printf(_(" --maintenance-db=DBNAME alternate maintenance database\n"));
+ printf(_("\nOther Options:\n"));
+ printf(_(" -e, --echo show the commands being sent to the server\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to the server\n"));
+ printf(_(" -q, --quiet don't write any messages\n"));
+ printf(_(" -v, --verbose write a lot of output\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+
+ printf(_("\nRead the description of the amcheck contrib module for details.\n"));
+ printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+ printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * append_dbnames
+ *
+ * For each pattern in the patterns list, if it is in fully-qualified
+ * database.schema.name format, parse the database portion of the pattern and
+ * append it to the dbnames list. Patterns that are not fully-qualified are
+ * skipped over. No deduplication of dbnames is performed.
+ *
+ * dbnames: list to which parsed database patterns are appended
+ * patterns: list of all patterns to parse
+ */
+static void
+append_dbnames(SimpleStringList *dbnames,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /* parse the pattern as db.schema.relname, if possible */
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ /* add the database name (or pattern), if any, to the list */
+ if (dbnamebuf.data[0])
+ simple_string_list_append(dbnames, dbnamebuf.data);
+
+ /* we do not use the schema or relname portions */
+
+ /* we may have dirtied the buffers */
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+ termPQExpBuffer(&dbnamebuf);
+ termPQExpBuffer(&schemabuf);
+ termPQExpBuffer(&namebuf);
+}
+
+/*
+ * dbname_select
+ *
+ * Appends a statement which selects all databases matching the given patterns
+ *
+ * conn: connection to the initial database
+ * sql: buffer into which the constructed sql statement is appended
+ * patterns: list of database name patterns to match
+ * alldb: when true, select all databases which allow connections
+ */
+static void
+dbname_select(PGconn *conn, PQExpBuffer sql, const SimpleStringList *patterns,
+ bool alldb)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+ int encoding = PQclientEncoding(conn);
+
+ if (alldb)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_database"
+ "\nWHERE datallowconn");
+ return;
+ }
+ else if (patterns->head == NULL)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT ''::TEXT AS datname"
+ "\nWHERE false");
+ return;
+ }
+
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_database"
+ "\nWHERE datallowconn"
+ "\nAND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = patterns->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ PQExpBufferData regexbuf;
+
+ initPQExpBuffer(&regexbuf);
+ patternToSQLRegex(encoding, NULL, NULL, &regexbuf, cell->val, false);
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, regexbuf.data, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ termPQExpBuffer(&regexbuf);
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * schema_select
+ *
+ * Appends a statement which selects all schemas matching the given patterns
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * fieldname: alias to use for the oid field within the created SELECT
+ * statement
+ * patterns: list of schema name patterns to match
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all non-system schemas
+ */
+static void
+schema_select(PGconn *conn, PQExpBuffer sql, const char *fieldname,
+ const SimpleStringList *patterns, bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBuffer(sql, "\nSELECT 0::pg_catalog.oid AS %s WHERE false", fieldname);
+ else
+ appendPQExpBuffer(sql, "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_catalog')"
+ "\nAND oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_toast')",
+ fieldname);
+ return;
+ }
+
+ appendPQExpBuffer(sql, "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE nspname OPERATOR(pg_catalog.~) ANY(ARRAY[\n",
+ fieldname);
+ for (cell = patterns->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ PQExpBufferData regexbuf;
+
+ initPQExpBuffer(&regexbuf);
+ patternToSQLRegex(encoding, NULL, NULL, &regexbuf, cell->val, false);
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, regexbuf.data, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ termPQExpBuffer(&regexbuf);
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * schema_cte
+ *
+ * Appends a Common Table Expression (CTE) which selects all schemas to be
+ * checked, with the CTE and oid field named as requested. The CTE will select
+ * all schemas matching the include list except any schemas matching the
+ * exclude list.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * ctename: name of the schema CTE to be created
+ * fieldname: name of the oid field within the schema CTE to be created
+ * include: list of schema name patterns for inclusion
+ * exclude: list of schema name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether to use all schemas in
+ * the database in lieu of the include list.
+ */
+static void
+schema_cte(PGconn *conn, PQExpBuffer sql, const char *ctename,
+ const char *fieldname, const SimpleStringList *include,
+ const SimpleStringList *exclude, bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ schema_select(conn, sql, fieldname, include, inclusive);
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ schema_select(conn, sql, fieldname, exclude, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * append_ctfilter_quals
+ *
+ * Appends quals to a buffer that restrict the rows selected from pg_class to
+ * only those which match the given checktype. No initial "WHERE" or "AND" is
+ * appended, nor do we surround our appended clauses in parens. The caller is
+ * assumed to take care of such matters.
+ *
+ * sql: buffer into which the constructed sql quals are appended
+ * relname: name (or alias) of pg_class in the surrounding query
+ * checktype: the type of check whose filter info is used
+ */
+static void
+append_ctfilter_quals(PQExpBuffer sql, const char *relname, CheckType checktype)
+{
+ appendPQExpBuffer(sql,
+ "%s.relam OPERATOR(pg_catalog.=) %u"
+ "\nAND %s.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])",
+ relname, ctfilter[checktype].relam,
+ relname, ctfilter[checktype].relkinds);
+}
+
+/*
+ * relation_select
+ *
+ * Appends a statement which selects the oid of all relations matching the
+ * given parameters. Expects a mixture of qualified and unqualified relation
+ * name patterns.
+ *
+ * For unqualified relation patterns, selects relations that match the relation
+ * name portion of the pattern which are in namespaces that are in the given
+ * namespace CTE.
+ *
+ * For qualified relation patterns, ignores the given namespace CTE and selects
+ * relations that match the relation name portion of the pattern which are in
+ * namespaces that match the schema portion of the pattern.
+ *
+ * For fully qualified relation patterns (database.schema.name), the pattern
+ * will be ignored unless the database portion of the pattern matches the name
+ * of the current database, as retrieved from conn.
+ *
+ * Only relations of the specified checktype will be selected.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * fieldname: alias to use for the oid field within the created SELECT
+ * statement
+ * patterns: list of (possibly qualified) relation name patterns to match
+ * checktype: the type of relation to select
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all relations of the given type
+ */
+static void
+relation_select(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *fieldname,
+ const SimpleStringList *patterns, CheckType checktype,
+ bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma = "";
+ const char *qor = "";
+ PQExpBufferData qualified;
+ PQExpBufferData unqualified;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBuffer(sql,
+ "\nSELECT 0::pg_catalog.oid AS %s WHERE false",
+ fieldname);
+ else
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s n"
+ "\nON n.%s OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE ",
+ fieldname, schemacte, schemafield);
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+ return;
+ }
+
+ /*
+ * We have to distinguish between schema-qualified and unqualified relation
+ * patterns. The unqualified patterns need to be restricted by the list of
+ * schemas returned by the schema CTE, but not so for the qualified
+ * patterns.
+ *
+ * We treat fully-qualified relation patterns (database.schema.relation)
+ * like schema-qualified patterns except that we also require the database
+ * portion to match the current database name.
+ */
+ initPQExpBuffer(&qualified);
+ initPQExpBuffer(&unqualified);
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ if (schemabuf.data[0])
+ {
+ /* Qualified relation pattern */
+ appendPQExpBuffer(&qualified, "%s\n(", qor);
+
+ if (dbnamebuf.data[0])
+ {
+ /*
+ * Fully-qualified relation pattern. Require the database name
+ * of our connection to match the database portion of the
+ * relation pattern.
+ */
+ appendPQExpBufferStr(&qualified, "\n'");
+ appendStringLiteralConn(&qualified, PQdb(conn), conn);
+ appendPQExpBufferStr(&qualified,
+ "'::TEXT OPERATOR(pg_catalog.~) '");
+ appendStringLiteralConn(&qualified, dbnamebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "'::TEXT COLLATE pg_catalog.default AND");
+ }
+
+ /*
+ * Require the namespace name to match the schema portion of the
+ * relation pattern and the relation name to match the relname
+ * portion of the relation pattern.
+ */
+ appendPQExpBufferStr(&qualified,
+ "\nn.nspname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, schemabuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND"
+ "\nc.relname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, namebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default)");
+ qor = "\nOR";
+ }
+ else
+ {
+ /* Unqualified relation pattern */
+ appendPQExpBufferStr(&unqualified, comma);
+ appendStringLiteralConn(&unqualified, namebuf.data, conn);
+ appendPQExpBufferStr(&unqualified,
+ "::TEXT COLLATE pg_catalog.default");
+ comma = "\n, ";
+ }
+
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+
+ if (qualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE (",
+ fieldname);
+ appendPQExpBufferStr(sql, qualified.data);
+ appendPQExpBufferStr(sql, ")\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ if (unqualified.data[0])
+ appendPQExpBufferStr(sql, "\nUNION ALL");
+ }
+ if (unqualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s ls"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) ls.%s"
+ "\nWHERE c.relname OPERATOR(pg_catalog.~) ANY(ARRAY[",
+ fieldname, schemacte, schemafield);
+ appendPQExpBufferStr(sql, unqualified.data);
+ appendPQExpBufferStr(sql, "\n]::TEXT[])\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+}
+
+/*
+ * table_cte
+ *
+ * Appends to the buffer 'sql' a Common Table Expression (CTE) which selects
+ * all table relations matching the given filters.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the table CTE to be created
+ * fieldname: name of the oid field within the table CTE to be created
+ * include: list of table name patterns for inclusion
+ * exclude: list of table name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether the select statement
+ * should match all relations
+ * toast: whether to also select the associated toast tables
+ */
+static void
+table_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename, const char *fieldname,
+ const SimpleStringList *include, const SimpleStringList *exclude,
+ bool inclusive, bool toast)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+
+ if (toast)
+ {
+ /*
+ * Compute the primary tables, then union on all associated toast
+ * tables. We depend on left to right evaluation of the UNION before
+ * the EXCEPT which gets added below. UNION and EXCEPT have equal
+ * precedence, so be careful if you rearrange this query.
+ */
+ appendPQExpBuffer(sql, "\nWITH primary_table AS (");
+ relation_select(conn, sql, schemacte, schemafield, fieldname, include,
+ CT_TABLE, inclusive);
+ appendPQExpBuffer(sql, "\n)"
+ "\nSELECT %s"
+ "\nFROM primary_table"
+ "\nUNION"
+ "\nSELECT c.reltoastrelid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN primary_table pt"
+ "\nON pt.%s OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE c.reltoastrelid OPERATOR(pg_catalog.!=) 0",
+ fieldname, fieldname, fieldname);
+ }
+ else
+ relation_select(conn, sql, schemacte, schemafield, fieldname, include,
+ CT_TABLE, inclusive);
+
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ relation_select(conn, sql, schemacte, schemafield, fieldname, exclude,
+ CT_TABLE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * exclude_index_cte
+ * Appends a CTE which selects all indexes to be excluded
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * patterns: list of index name patterns to match
+ */
+static void
+exclude_index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename,
+ const char *fieldname, const SimpleStringList *patterns)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ relation_select(conn, sql, schemacte, schemafield, fieldname, patterns,
+ CT_BTREE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * index_cte
+ * Appends a CTE which selects all indexes to be checked
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * excludecte: name of the CTE which contains all indexes to be excluded
+ * tablescte: optional; if automatically including indexes for checked tables,
+ * the name of the CTE which contains all tables to be checked
+ * tablesfield: if tablescte is not NULL, the name of the oid field in the
+ * tables CTE
+ * patterns: list of index name patterns to match
+ * inclusive: when 'patterns' is an empty list, whether the select statement
+ * should match all relations
+ */
+static void
+index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename, const char *fieldname,
+ const char *excludecte, const char *tablescte,
+ const char *tablesfield, const SimpleStringList *patterns,
+ bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ appendPQExpBuffer(sql, "\nSELECT %s FROM (", fieldname);
+ relation_select(conn, sql, schemacte, schemafield, fieldname, patterns,
+ CT_BTREE, inclusive);
+ if (tablescte)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION"
+ "\nSELECT i.indexrelid AS %s"
+ "\nFROM pg_catalog.pg_index i"
+ "\nJOIN %s t ON t.%s OPERATOR(pg_catalog.=) i.indrelid",
+ fieldname, tablescte, tablesfield);
+ }
+ appendPQExpBuffer(sql,
+ "\n) AS included_indexes"
+ "\nEXCEPT"
+ "\nSELECT %s FROM %s",
+ fieldname, excludecte);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * target_select
+ *
+ * Construct a query that will return a list of all tables and indexes in
+ * the database matching the user specified options, sorted by size. We
+ * want the largest tables and indexes first, so that the parallel
+ * processing of the larger database objects gets started sooner.
+ *
+ * If 'inclusive' is true, include all tables and indexes not otherwise
+ * excluded; if false, include only tables and indexes explicitly included.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql select statement is appended
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ * inclusive: when list of objects to include is empty, whether the select
+ * statement should match all objects not otherwise excluded
+ */
+static void
+target_select(PGconn *conn, PQExpBuffer sql, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname,
+ bool inclusive)
+{
+ appendPQExpBufferStr(sql, "WITH");
+ schema_cte(conn, sql, "namespaces", "nspoid", &objects->schemas,
+ &objects->exclude_schemas, inclusive);
+ appendPQExpBufferStr(sql, ",");
+ table_cte(conn, sql, "namespaces", "nspoid", "tables", "tbloid",
+ &objects->tables, &objects->exclude_tables, inclusive,
+ !checkopts->exclude_toast);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBufferStr(sql, ",");
+ exclude_index_cte(conn, sql, "namespaces", "nspoid",
+ "excluded_indexes", "idxoid",
+ &objects->exclude_indexes);
+ appendPQExpBufferStr(sql, ",");
+ if (checkopts->dependents)
+ index_cte(conn, sql, "namespaces", "nspoid", "indexes", "idxoid",
+ "excluded_indexes", "tables", "tbloid",
+ &objects->indexes, inclusive);
+ else
+ index_cte(conn, sql, "namespaces", "nspoid", "indexes", "idxoid",
+ "excluded_indexes", NULL, NULL, &objects->indexes,
+ inclusive);
+ }
+ appendPQExpBuffer(sql,
+ "\nSELECT checktype, oid FROM ("
+ "\nSELECT %u AS checktype, tables.tbloid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN tables"
+ "\nON tables.tbloid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_TABLE);
+ append_ctfilter_quals(sql, "c", CT_TABLE);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION ALL"
+ "\nSELECT %u AS checktype, indexes.idxoid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN indexes"
+ "\nON indexes.idxoid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_BTREE);
+ append_ctfilter_quals(sql, "c", CT_BTREE);
+ }
+ appendPQExpBufferStr(sql,
+ "\n) AS ss"
+ "\nORDER BY relpages DESC, checktype, oid");
+}
diff --git a/contrib/pg_amcheck/pg_amcheck.h b/contrib/pg_amcheck/pg_amcheck.h
new file mode 100644
index 0000000000..43e2c1acde
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.h
@@ -0,0 +1,130 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.h
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2020-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_AMCHECK_H
+#define PG_AMCHECK_H
+
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+
+/* amcheck options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool echo;
+ bool quiet;
+ bool dependents;
+ bool no_indexes;
+ bool exclude_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ const char *skip;
+ int jobs; /* >= 0 indicates user specified the parallel
+ * degree, otherwise -1 */
+ long startblock;
+ long endblock;
+} amcheckOptions;
+
+/* names of database objects to include or exclude controlled by user flags */
+typedef struct amcheckObjects
+{
+ SimpleStringList dbnames;
+ SimpleStringList schemas;
+ SimpleStringList tables;
+ SimpleStringList indexes;
+ SimpleStringList exclude_dbnames;
+ SimpleStringList exclude_schemas;
+ SimpleStringList exclude_tables;
+ SimpleStringList exclude_indexes;
+} amcheckObjects;
+
+/*
+ * We cannot launch the same amcheck function for all checked objects. For
+ * btree indexes, we must use either bt_index_check() or
+ * bt_index_parent_check(). For heap relations, we must use verify_heapam().
+ * We silently ignore all other object types.
+ *
+ * The following CheckType enum and corresponding ctfilter array track which
+ * kinds of relations get which treatment.
+ */
+typedef enum
+{
+ CT_TABLE = 0,
+ CT_BTREE
+} CheckType;
+
+/*
+ * This struct is used for filtering relations in pg_catalog.pg_class to just
+ * those of a given CheckType. The relam field should equal pg_class.relam,
+ * and pg_class.relkind should be contained in the comma-separated relkinds
+ * list.
+ *
+ * The 'typname' field is not strictly for filtering, but for printing messages
+ * about relations that matched the filter.
+ */
+typedef struct
+{
+ Oid relam;
+ const char *relkinds;
+ const char *typname;
+} CheckTypeFilter;
+
+/* Constants taken from pg_catalog/pg_am.dat */
+#define HEAP_TABLE_AM_OID 2
+#define BTREE_AM_OID 403
+
+static void check_each_database(ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+
+static void check_one_database(const ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+static void prepare_table_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid);
+
+static void prepare_btree_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid);
+static void run_command(PGconn *conn, const char *sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *typ);
+
+static PGresult *VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups,
+ const char *query);
+
+static PGresult *VerifyBtreeSlotHandler(PGresult *res, PGconn *conn,
+ ExecStatusType expected_status,
+ int expected_ntups, const char *query);
+
+static void help(const char *progname);
+
+static void append_dbnames(SimpleStringList *dbnames,
+ const SimpleStringList *patterns);
+
+static void dbname_select(PGconn *conn, PQExpBuffer sql,
+ const SimpleStringList *patterns, bool alldb);
+
+static void target_select(PGconn *conn, PQExpBuffer sql,
+ const amcheckObjects *objects,
+ const amcheckOptions *options, const char *progname,
+ bool inclusive);
+
+#endif /* PG_AMCHECK_H */
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..111ef81146
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,59 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 10;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'connecting to a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', "$port", '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'connecting with a non-existent user');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-s', 'no_such_schema' ],
+ 'checking a non-existent schema');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'no_such_table' ],
+ 'checking a non-existent table');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-i', 'no_such_index' ],
+ 'checking a non-existent index');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-s', 'no*such*schema*' ],
+ 'no matching schemas');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-t', 'no*such*table*' ],
+ 'no matching tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', "$port", '-i', 'no*such*index' ],
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..583660f3df
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,428 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+ or die "could not open $relpath: $!";
+ binmode $fh;
+ sysseek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
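As an aside for readers following the patch, the byte-overwrite technique used by corrupt_first_page() above can be sketched in isolation. This is a minimal Python illustration, not part of the patch; the offset and length mirror the Perl helper, and the path handling is simplified (the real test derives the path via pg_relation_filepath() while the server is stopped):

```python
# Minimal sketch of corrupt_first_page()'s technique: open the relation file
# read-write, seek to a fixed offset, and overwrite a run of bytes with 0x77.
def corrupt_at(path, offset=32, length=500, byte=0x77):
    """Overwrite `length` bytes at `offset` in the file with `byte`."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(bytes([byte]) * length)
```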
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends
+ $node->safe_psql($dbname, q(CREATE EXTENSION amcheck));
+
+ # Create schemas, tables and indexes in five separate
+ # schemas. The schemas are all identical to start, but
+ # we will corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt toast table, partitions, and materialized views in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ @cmd, 'db1' ],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_ok(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_ok(
+ [ @cmd, '--all' ],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-i', 'i*.idx', '-i', 'idx.i*' ],
+ 'pg_amcheck all indexes with qualified names matching /i*.idx/ or /idx.i*/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-t', 's*.t1', '-t', 'foo*.bar*' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/ or /foo*.bar*/');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/,
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# But checking across all databases in schema s1 should show corruption
+# messages for tables in db2
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across all databases but excluding indexes');
+
+# Checking across a list of databases should also work
+$node->command_like(
+ [ @cmd, '-d', 'db2', '-d', 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across db1 and db2 but excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of zero.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's4' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_like(
+ [ @cmd, '--exclude-toast', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's5' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..cd21874735
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary correctly
+# identifies specific kinds of corruption within heap pages. To test this,
+# we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
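The 58-byte layout above can be cross-checked outside the test. This sketch (illustrative only, not part of the patch) translates the perl pack codes into Python's struct module notation (L→u32 'I', S→u16 'H', C→u8 'B', c→s8 'b', q→s64 'q'), little-endian with no padding, and verifies the total size:

```python
import struct

# Little-endian, no padding: 3 u32, 5 u16, 2 u8, one s64, one u8 varlena
# header for 'b', 7 s8 body bytes for 'b', and 9 u16 words for 'c'.
HEAPTUPLE_FMT = '<IIIHHHHHBBqB7b9H'

# Must agree with HEAPTUPLE_PACK_LENGTH in the test.
assert struct.calcsize(HEAPTUPLE_FMT) == 58

# Field names in the same order read_tuple() assigns them.
FIELD_NAMES = (
    ['t_xmin', 't_xmax', 't_field3', 'bi_hi', 'bi_lo', 'ip_posid',
     't_infomask2', 't_infomask', 't_hoff', 't_bits', 'a', 'b_header']
    + ['b_body%d' % i for i in range(1, 8)]
    + ['c%d' % i for i in range(1, 10)]
)

def unpack_tuple(buf):
    """Unpack one raw heap tuple into a dict keyed like read_tuple()."""
    return dict(zip(FIELD_NAMES, struct.unpack(HEAPTUPLE_FMT, buf)))
```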
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
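The infomask bit manipulation the corruption loop below performs can be illustrated in isolation. A minimal Python sketch (illustrative commentary, using the same constant values as above):

```python
# Mirror of a few of the infomask constants above (access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200

def clear_xmin_hints(t_infomask):
    """Clear both xmin hint bits, as the early offnum cases do with '&= ~',
    so that the corrupted xmin is actually examined rather than trusted."""
    return t_infomask & ~(HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

def set_bits(t_infomask, mask):
    """Set bits with '|=', as the later offnum cases do."""
    return t_infomask | mask
```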
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ if ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${$header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${$header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${$header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${$header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${$header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${$header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${$header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${$header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI, this time with a
+ # multixact ID so large it wraps around and precedes relminmxid
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
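The expectation in the last case above can look backwards: 4000000000 is numerically greater than the threshold of 1, yet the message says it precedes it. Multixact IDs compare modulo 2^32, in the spirit of the server's MultiXactIdPrecedes; here is a minimal illustrative sketch (not PostgreSQL code):

```python
# Sketch of 32-bit modular "precedes" comparison: multixact IDs wrap
# around 2^32, so a very large ID can precede a small one.
def mxid_precedes(a: int, b: int) -> bool:
    diff = (a - b) & 0xFFFFFFFF
    # Interpret the difference as signed 32-bit: "negative" means a precedes b.
    return diff >= 0x80000000

print(mxid_precedes(4000000000, 1))  # True: 4000000000 "precedes" 1 after wraparound
print(mxid_precedes(2, 1))           # False
```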
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
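For reference, the header offsets the test above expects (byte 24 for 3 attributes with no nulls, byte 280 for 2047 attributes with nulls) fall out of the heap tuple header layout. A minimal sketch, assuming a 23-byte fixed header and 8-byte MAXALIGN as on typical 64-bit builds (illustrative, not server code):

```python
# Compute where tuple data should begin (t_hoff), assuming a 23-byte
# fixed heap tuple header and 8-byte MAXALIGN.
def expected_t_hoff(natts: int, has_nulls: bool) -> int:
    size = 23                      # offsetof(HeapTupleHeaderData, t_bits)
    if has_nulls:
        size += (natts + 7) // 8   # null bitmap: one bit per attribute
    return (size + 7) & ~7         # round up to the MAXALIGN boundary

print(expected_t_hoff(3, False))   # 24, as in the offnum == 8 case
print(expected_t_hoff(2047, True)) # 280, as in the offnum == 10 case
```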
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..379225cbf8
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
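The failure mode this test provokes, an index built under one sort order but validated under another, can be illustrated outside PostgreSQL with a toy adjacent-pair order check (not amcheck's actual algorithm):

```python
# Toy illustration of why swapping a btree opclass comparator makes an
# existing index look corrupt: the stored order no longer satisfies the
# new comparator's invariant.  Not PostgreSQL code.
data = list(range(1, 1001))        # "index" built in ascending order

def desc_cmp(a, b):
    # The swapped-in comparator: 1 if a < b, -1 if a > b, 0 if equal.
    return (a < b) - (a > b)

# Adjacent-pair invariant check under the new comparator: every pair of
# neighbors in ascending order violates descending order.
violations = sum(1 for x, y in zip(data, data[1:]) if desc_cmp(x, y) > 0)
print(violations)
```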
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index ae2759be55..487cc27027 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38e8aa0bbf..a4e1b28b38 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..2b2c73ca8b
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,1004 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or more
+ databases, with options to select which schemas, tables and indexes to check,
+ which kinds of checking to perform, and whether to perform the checks in
+ parallel, and if so, the number of parallel connections to establish and use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Usage</title>
+
+ <refsect2>
+ <title>Parallelism Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=20 --all</literal></term>
+ <listitem>
+ <para>
+ Check all databases one after another, but for each database checked,
+ use up to 20 simultaneous connections to check relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=8 mydb yourdb</literal></term>
+ <listitem>
+ <para>
+ Check databases <literal>mydb</literal> and <literal>yourdb</literal>
+ one after another, using up to 8 simultaneous connections to check
+ relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Checking Option Specification</title>
+
+ <para>
+ If no checking options are specified, by default all table relation checks
+ and default level btree index checks are performed. A variety of options
+ exist to change the set of checks performed on whichever relations are
+ being checked. They are illustrated briefly in the following examples;
+ see their full descriptions below.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --parent-check --heapallindexed</literal></term>
+ <listitem>
+ <para>
+ For each btree index checked, perform more extensive checks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --exclude-toast-pointers</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not check toast pointers against
+ the toast relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --on-error-stop</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not continue checking pages after
+ the first page where corruption is encountered.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --skip="all-frozen"</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, skip over blocks marked as all
+ frozen. Note that "all-visible" may also be specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --startblock=3000 --endblock=4000</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, check only blocks in the given block
+ range.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Relation Specification</title>
+
+ <para>
+ If no relations are explicitly listed, by default all relations will be
+ checked, but there are options to specify which relations to check.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable -r yourtable</literal></term>
+ <listitem>
+ <para>
+ If one or more relations are explicitly given, they are interpreted as
+ an exhaustive list of all relations to be checked, with one caveat:
+ for all such relations, associated toast relations and indexes are by
+ default included in the list of relations to check.
+ </para>
+ <para>
+ Assuming <literal>mytable</literal> is an ordinary table, and that it
+ is indexed by <literal>mytable_idx</literal> and has an associated
+ toast table <literal>pg_toast_12345</literal>, checking will be
+ performed on <literal>mytable</literal>,
+ <literal>mytable_idx</literal>, and <literal>pg_toast_12345</literal>.
+ </para>
+ <para>
+ Likewise for <literal>yourtable</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable --no-dependents</literal></term>
+ <listitem>
+ <para>
+ This restricts the list of relations checked to just
+ <literal>mytable</literal>, without pulling in the corresponding
+ indexes or toast, but see also
+ <option>--exclude-toast-pointers</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t mytable -i myindex</literal></term>
+ <listitem>
+ <para>
+ The <option>-r</option> (<option>--relation</option>) option will match any
+ relation, but <option>-t</option> (<option>--table</option>) and
+ <option>-i</option> (<option>--index</option>) may be used to avoid
+ matching objects of the other type.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -R="mytemp*"</literal></term>
+ <listitem>
+ <para>
+ Relations may be included (<option>-r</option>) or excluded
+ (<option>-R</option>) using shell-style patterns.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Table and index inclusion and exclusion patterns may be used
+ equivalently with <option>-t</option>, <option>-T</option>,
+ <option>-i</option> and <option>-I</option>. The above example checks
+ all tables and indexes starting with <literal>my</literal> except for
+ indexes starting with <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -R="india" -T="laos" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Unlike specifying one or more <option>--relation</option> options, which
+ disables the default behavior of checking all relations, specifying one or
+ more of <option>-R</option>, <option>-T</option> or <option>-I</option> does not.
+ The above command will check all relations except any relation named
+ <literal>india</literal>, any table named
+ <literal>laos</literal>, and any index named <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Schema Specification</title>
+
+ <para>
+ If no schemas are explicitly listed, by default all schemas except
+ <literal>pg_catalog</literal> and <literal>pg_toast</literal> will be
+ checked.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -s s1 -s s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ If one or more schemas are listed with <option>-s</option>, unqualified
+ relation names will be checked only in the given schemas. The above
+ command will check tables <literal>s1.mytable</literal> and
+ <literal>s2.mytable</literal> but not tables named
+ <literal>mytable</literal> in other schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ As with relations, schemas may be excluded. The above command will
+ check any table named <literal>mytable</literal> not in schemas
+ <literal>s1</literal> and <literal>s2</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable -t s1.stuff</literal></term>
+ <listitem>
+ <para>
+ Relations may be included or excluded with a schema-qualified name
+ without interference from the <option>-s</option> or
+ <option>-S</option> options. Even though schema <literal>s1</literal>
+ has been excluded, the table <literal>s1.stuff</literal> will be
+ checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Specification</title>
+
+ <para>
+ If no databases are explicitly listed, the database to check is obtained
+ from environment variables in the usual way. Otherwise, when one or more
+ databases are explicitly given, they are interpreted as an exhaustive list
+ of all databases to be checked. This list may contain patterns, but
+ because any such patterns must be reconciled against the list of all
+ databases to find the matching names, at least one database must be
+ specified as a literal name rather than a pattern, and it must appear in a
+ position where <application>pg_amcheck</application> expects to find it.
+ </para>
+ <para>
+ For example:
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --all --maintenance-db=foo</literal></term>
+ <listitem>
+ <para>
+ If the <option>--maintenance-db</option> option is given, it will be
+ used to look up the matching databases, though it will not itself be
+ added to the list of databases for checking.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck foo bar baz</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more plain database name arguments not preceded by
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ one will be used for this purpose, and it will also be included in the
+ list of databases to check.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -d foo -d bar baz</literal></term>
+ <listitem>
+ <para>
+ If a mixture of plain database names and databases preceded with
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ plain database name will be used for this purpose. In the above
+ example, <literal>baz</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --dbname=foo --dbname="bar*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more databases are given with the
+ <option>-d</option> or <option>--dbname</option> option, the first one
+ will be used and must be a literal database name. In this example,
+ <literal>foo</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --relation="accounts_*.*.*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, the environment will be consulted for the database to be
+ used. In the example above, the default database will be queried to
+ find all databases with names that begin with
+ <literal>accounts_</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ As discussed above for schema-qualified relations, a database-qualified
+ relation name or pattern may also be given.
+<programlisting>
+pg_amcheck mydb \
+ --schema="t*" \
+ --exclude-schema="tmp*" \
+ --relation=baz \
+ --relation=bar.baz \
+ --relation=foo.bar.baz \
+ --relation="f*".a.b \
+ --exclude-relation=foo.a.b
+</programlisting>
+ will check relations in database <literal>mydb</literal> using the schema
+ resolution rules discussed above, but additionally will check all relations
+ named <literal>a.b</literal> in all databases with names starting with
+ <literal>f</literal> except database <literal>foo</literal>.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+ </para>
+
+ <refsect2>
+ <title>Help and Version Information Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not quiet any output specifically due to the use of
+ the <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increase the logging verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Connection and Concurrent Connection Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever is smaller.
+ </para>
+ <para>
+ When used in conjunction with the <option>-a</option>
+ <option>--all</option> option, the total number of objects to check,
+ and correspondingly the number of concurrent connections to use, is
+ recalculated per database. If the number of objects to check differs
+ from one database to the next and is less than the concurrency level
+ specified, the number of concurrent connections open to the server
+ will fluctuate to meet the needs of each database processed.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Index Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly
+ selects <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing
+ a new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Table Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer
+ is slow and may be improved in a future release. Some users may wish
+ to disable this check to save time.
+ </para>
+ <para>
+ Note that, despite their similar names, this option is unrelated to the
+ <option>--exclude-toast</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page.
+ This option only has meaning relative to table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>"all-frozen"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>"all-visible"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Corruption Checking Target Options</title>
+
+ <para>
+ Objects to be checked may span schemas in more than one database. Options
+ for restricting the list of databases, schemas, tables and indexes are
+ described below. In each place where a name may be specified, a
+ <link linkend="app-psql-patterns"><replaceable class="parameter">pattern</replaceable></link>
+ may also be used.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects in
+ the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>--maintenance-db</option> argument is given nor is any
+ database name given as a command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-db</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-db</option>, the database will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-db=america</literal></member>
+ <member><literal>--exclude-db="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ If the relation is not schema qualified, database and schema
+ inclusion/exclusion lists will determine in which databases or schemas
+ matching relations will be checked.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_idx</literal></member>
+ <member><literal>--relation="llc.accounts_idx"</literal></member>
+ <member><literal>--relation="asia|africa.corp|llc.accounts_idx"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ The first example, <literal>--relation=accounts_idx</literal>, checks
+ relations named <literal>accounts_idx</literal> in all selected schemas
+ and databases.
+ </para>
+ <para>
+ The second example, <literal>--relation="llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schema
+ <literal>llc</literal> in all selected databases.
+ </para>
+ <para>
+ The third example,
+ <literal>--relation="asia|africa.corp|llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in
+ schemas <literal>corp</literal> and <literal>llc</literal> in databases
+ <literal>asia</literal> and <literal>africa</literal>.
+ </para>
+ <para>
+ Note that if a database is implicated in a relation pattern, such as
+ <literal>asia</literal> and <literal>africa</literal> in the third
+ example above, the database need not be otherwise given in the command
+ arguments for the relation to be checked. As an extreme example of
+ this:
+ <simplelist>
+ <member><literal>pg_amcheck --relation="*.*.*" mydb</literal></member>
+ </simplelist>
+ will check all relations in all databases. The <literal>mydb</literal>
+ argument only serves to tell <application>pg_amcheck</application> the
+ name of the database to use for querying the list of all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-dependents</option></term>
+ <listitem>
+ <para>
+ When calculating the list of objects to be checked, do not automatically
+ expand the list to include associated indexes and toast tables of
+ elements otherwise in the list.
+ </para>
+ <para>
+ By default, for each main table relation checked, any associated toast
+ table and all associated indexes are also checked, unless explicitly
+ excluded.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..6eba8e1870 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,11 +18,11 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
- 'libpgtypes', 'libpq', 'pg_basebackup', 'pg_config',
+ 'libpgtypes', 'libpq', 'pg_amcheck', 'pg_basebackup', 'pg_config',
'pg_dump', 'pg_dumpall', 'pg_isready', 'pg_receivewal',
'pg_recvlogical', 'pg_restore', 'psql', 'reindexdb',
'vacuumdb', @client_contribs);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index f3d8c1faf4..99b1c2fb8f 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'pg_standby', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'pg_standby', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 721b230bf2..86fb26974b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
@@ -2845,6 +2846,8 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckObjects
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
I like 0007 quite a bit and am inclined to commit it soon, as it
doesn't depend on the earlier patches. But:
- I think the residual comment in processSQLNamePattern beginning with
"Note:" could use some wordsmithing to account for the new structure
of things -- maybe just "this pass" -> "this function".
- I suggest changing initializations like maxbuf = buf + 2 to maxbuf =
&buf[2] for clarity.
Regarding 0001:
- My preference would be to dump on_exit_nicely_final() and just rely
on order of registration.
- I'm not entirely sure it's a good idea to expose something named
fatal() like this, because that's a fairly short and general name. On
the other hand, it's pretty descriptive and it's not clear why someone
including exit_utils.h would want any other definition. I guess we
can always change it later if it proves to be problematic; it's got a
lot of callers and I guess there's no point in churning the code
without a clear reason.
- I don't quite see why we need this at all. Like, exit_nicely() is a
pg_dump-ism. It would make sense to centralize it if we were going to
use it for pg_amcheck, but you don't. If you were going to, you'd need
to adapt 0003 to use exit_nicely() instead of exit(), but you don't,
nor do you add any other new calls to exit_nicely() anywhere, except
for one in 0002. That makes the PGresultHandler stuff depend on
exit_nicely(), which might be important if you were going to refactor
pg_dump to use that abstraction, but you don't. I'm not opposed to the
idea of centralized exit processing for frontend utilities; indeed, it
seems like a good idea. But this doesn't seem to get us there. AFAICS
it just entangles pg_dump with pg_amcheck unnecessarily in a way that
doesn't really benefit either of them.
Regarding 0002:
- I don't think this is separately committable because it adds an
abstraction but not any uses of that abstraction to demonstrate that
it's actually any good. Perhaps it should just be merged into 0005,
and even into parallel_slot.h vs. having its own header. I'm not
really sure about that, though.
- Is this really much of an abstraction layer? Like, how generic can
this be when the argument list includes ExecStatusType expected_status
and int expected_ntups?
- The logic seems to be very similar to some of the stuff that you
move around in 0003, like executeQuery() and executeCommand(), but it
doesn't get unified. I'm not necessarily saying it should be, but it's
weird to do all this refactoring and end up with something that still
looks like this.
0003, 0004, and 0006 look pretty boring; they are just moving code
around. Is there any point in splitting the code from 0003 across two
files? Maybe it's fine.
If I run pg_amcheck --all -j4 do I get a serialization boundary across
databases? Like, I have to completely finish db1 before I can go onto
db2, even though maybe only one worker is still busy with it?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
If I run pg_amcheck --all -j4 do I get a serialization boundary across
databases? Like, I have to completely finish db1 before I can go onto
db2, even though maybe only one worker is still busy with it?
Yes, you do. That's patterned on reindexdb and vacuumdb.
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I like 0007 quite a bit and am inclined to commit it soon, as it
doesn't depend on the earlier patches. But:
- I think the residual comment in processSQLNamePattern beginning with
"Note:" could use some wordsmithing to account for the new structure
of things -- maybe just "this pass" -> "this function".
- I suggest changing initializations like maxbuf = buf + 2 to maxbuf =
&buf[2] for clarity.
Ok, I should be able to get you an updated version of 0007 with those changes here soon for you to commit.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 28, 2021 at 12:40 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
If I run pg_amcheck --all -j4 do I get a serialization boundary across
databases? Like, I have to completely finish db1 before I can go onto
db2, even though maybe only one worker is still busy with it?
Yes, you do. That's patterned on reindexdb and vacuumdb.
Sounds lame, but fair enough. We can leave that problem for another day.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Jan 28, 2021, at 9:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 28, 2021 at 12:40 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
If I run pg_amcheck --all -j4 do I get a serialization boundary across
databases? Like, I have to completely finish db1 before I can go onto
db2, even though maybe only one worker is still busy with it?
Yes, you do. That's patterned on reindexdb and vacuumdb.
Sounds lame, but fair enough. We can leave that problem for another day.
Yeah, I agree that it's lame, and should eventually be addressed.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan 28, 2021, at 9:41 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
I like 0007 quite a bit and am inclined to commit it soon, as it
doesn't depend on the earlier patches. But:
- I think the residual comment in processSQLNamePattern beginning with
"Note:" could use some wordsmithing to account for the new structure
of things -- maybe just "this pass" -> "this function".
- I suggest changing initializations like maxbuf = buf + 2 to maxbuf =
&buf[2] for clarity.
Ok, I should be able to get you an updated version of 0007 with those changes here soon for you to commit.
I made those changes, and fixed a bug that would impact the pg_amcheck callers. I'll have to extend the regression test coverage in 0008 since it obviously wasn't caught, but that's not part of this patch since there are no callers that use the dbname.schema.relname format as yet.
This is the only patch for v34, since you want to commit it separately. It's renamed as 0001 here....
Attachments:
v34-0001-Refactoring-processSQLNamePattern.patchapplication/octet-stream; name=v34-0001-Refactoring-processSQLNamePattern.patch; x-unix-mode=0644Download
From e155ecbf58c613d7518ddc30f7541e4095d4ac21 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Wed, 30 Dec 2020 13:29:20 -0800
Subject: [PATCH v34] Refactoring processSQLNamePattern.
Factoring out logic which transforms shell-style patterns into SQL
style regexp format from inside processSQLNamePattern into a
separate new function "patternToSQLRegex". The interface and
semantics of processSQLNamePattern are unchanged.
The motivation for the refactoring is that processSQLNamePattern
mixes the job of transforming the pattern with the job of
constructing a where-clause based on a single pattern, which makes
the code hard to reuse from other places.
The new helper function patternToSQLRegex can handle parsing of
patterns of the form "database.schema.relation", "schema.relation",
and "relation". The three-part form is unused in this commit, as
the pre-existing patternToSQLRegex function ignores the dbname
functionality, and there are not yet any other callers. The
three-part form will be used by pg_amcheck, not yet committed, to
allow specifying on the command line the inclusion and exclusion of
relations spanning multiple databases.
---
src/fe_utils/string_utils.c | 255 +++++++++++++++++-----------
src/include/fe_utils/string_utils.h | 4 +
2 files changed, 164 insertions(+), 95 deletions(-)
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..94941132ac 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -831,10 +831,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
PQExpBufferData schemabuf;
PQExpBufferData namebuf;
- int encoding = PQclientEncoding(conn);
- bool inquotes;
- const char *cp;
- int i;
bool added_clause = false;
#define WHEREAND() \
@@ -856,98 +852,12 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Parse the pattern, converting quotes and lower-casing unquoted letters.
- * Also, adjust shell-style wildcard characters into regexp notation.
- *
- * We surround the pattern with "^(...)$" to force it to match the whole
- * string, as per SQL practice. We have to have parens in case the string
- * contains "|", else the "^" and "$" will be bound into the first and
- * last alternatives which is not what we want.
- *
- * Note: the result of this pass is the actual regexp pattern(s) we want
+ * Convert shell-style 'pattern' into the regular expression(s) we want
* to execute. Quoting/escaping into SQL literal format will be done
* below using appendStringLiteralConn().
*/
- appendPQExpBufferStr(&namebuf, "^(");
-
- inquotes = false;
- cp = pattern;
-
- while (*cp)
- {
- char ch = *cp;
-
- if (ch == '"')
- {
- if (inquotes && cp[1] == '"')
- {
- /* emit one quote, stay in inquotes mode */
- appendPQExpBufferChar(&namebuf, '"');
- cp++;
- }
- else
- inquotes = !inquotes;
- cp++;
- }
- else if (!inquotes && isupper((unsigned char) ch))
- {
- appendPQExpBufferChar(&namebuf,
- pg_tolower((unsigned char) ch));
- cp++;
- }
- else if (!inquotes && ch == '*')
- {
- appendPQExpBufferStr(&namebuf, ".*");
- cp++;
- }
- else if (!inquotes && ch == '?')
- {
- appendPQExpBufferChar(&namebuf, '.');
- cp++;
- }
- else if (!inquotes && ch == '.')
- {
- /* Found schema/name separator, move current pattern to schema */
- resetPQExpBuffer(&schemabuf);
- appendPQExpBufferStr(&schemabuf, namebuf.data);
- resetPQExpBuffer(&namebuf);
- appendPQExpBufferStr(&namebuf, "^(");
- cp++;
- }
- else if (ch == '$')
- {
- /*
- * Dollar is always quoted, whether inside quotes or not. The
- * reason is that it's allowed in SQL identifiers, so there's a
- * significant use-case for treating it literally, while because
- * we anchor the pattern automatically there is no use-case for
- * having it possess its regexp meaning.
- */
- appendPQExpBufferStr(&namebuf, "\\$");
- cp++;
- }
- else
- {
- /*
- * Ordinary data character, transfer to pattern
- *
- * Inside double quotes, or at all times if force_escape is true,
- * quote regexp special characters with a backslash to avoid
- * regexp errors. Outside quotes, however, let them pass through
- * as-is; this lets knowledgeable users build regexp expressions
- * that are more powerful than shell-style patterns.
- */
- if ((inquotes || force_escape) &&
- strchr("|*+?()[]{}.^$\\", ch))
- appendPQExpBufferChar(&namebuf, '\\');
- i = PQmblen(cp, encoding);
- while (i-- && *cp)
- {
- appendPQExpBufferChar(&namebuf, *cp);
- cp++;
- }
- }
- }
+ patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
+ pattern, force_escape);
/*
* Now decide what we need to emit. We may run under a hostile
@@ -964,7 +874,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a name pattern, so constrain the namevar(s) */
- appendPQExpBufferStr(&namebuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(namebuf.data, "^(.*)$") != 0)
{
@@ -999,7 +908,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a schema pattern, so constrain the schemavar */
- appendPQExpBufferStr(&schemabuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(schemabuf.data, "^(.*)$") != 0 && schemavar)
{
@@ -1027,3 +935,160 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
return added_clause;
#undef WHEREAND
}
+
+/*
+ * Transform a possibly qualified shell-style object name pattern into up to
+ * three SQL-style regular expressions, converting quotes, lower-casing
+ * unquoted letters, and adjusting shell-style wildcard characters into regexp
+ * notation.
+ *
+ * If the dbnamebuf and schemabuf arguments are non-NULL, and the pattern
+ * contains two or more dbname/schema/name separators, we parse the portions of
+ * the pattern prior to the first and second separators into dbnamebuf and
+ * schemabuf, and the rest into namebuf. (Additional dots in the name portion
+ * are not treated as special.)
+ *
+ * If dbnamebuf is NULL and schemabuf is non-NULL, and the pattern contains at
+ * least one separator, we parse the first portion into schemabuf and the rest
+ * into namebuf.
+ *
+ * Otherwise, we parse the entire pattern into namebuf.
+ *
+ * We surround the regexps with "^(...)$" to force them to match whole strings,
+ * as per SQL practice. We have to have parens in case strings contain "|",
+ * else the "^" and "$" will be bound into the first and last alternatives
+ * which is not what we want.
+ *
+ * The regexps we parse into the buffers are appended to the data (if any)
+ * already present. If we parse fewer fields than the number of buffers we
+ * were given, the extra buffers are unaltered.
+ */
+void
+patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
+ PQExpBuffer namebuf, const char *pattern, bool force_escape)
+{
+ PQExpBufferData buf[3];
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
+ int i;
+ bool inquotes;
+ const char *cp;
+
+ Assert(pattern != NULL);
+ Assert(namebuf != NULL);
+
+ /* callers should never expect "dbname.relname" format */
+ Assert(dbnamebuf == NULL || schemabuf != NULL);
+
+ inquotes = false;
+ cp = pattern;
+
+ if (dbnamebuf != NULL)
+ maxbuf = &buf[2];
+ else if (schemabuf != NULL)
+ maxbuf = &buf[1];
+ else
+ maxbuf = &buf[0];
+
+ curbuf = &buf[0];
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ while (*cp)
+ {
+ char ch = *cp;
+
+ if (ch == '"')
+ {
+ if (inquotes && cp[1] == '"')
+ {
+ /* emit one quote, stay in inquotes mode */
+ appendPQExpBufferChar(curbuf, '"');
+ cp++;
+ }
+ else
+ inquotes = !inquotes;
+ cp++;
+ }
+ else if (!inquotes && isupper((unsigned char) ch))
+ {
+ appendPQExpBufferChar(curbuf,
+ pg_tolower((unsigned char) ch));
+ cp++;
+ }
+ else if (!inquotes && ch == '*')
+ {
+ appendPQExpBufferStr(curbuf, ".*");
+ cp++;
+ }
+ else if (!inquotes && ch == '?')
+ {
+ appendPQExpBufferChar(curbuf, '.');
+ cp++;
+ }
+ /*
+ * When we find a dbname/schema/name separator, we treat it specially
+ * only if the caller requested more patterns to be parsed than we have
+ * already parsed from the pattern. Otherwise, dot characters are not
+ * special.
+ */
+ else if (!inquotes && ch == '.' && curbuf < maxbuf)
+ {
+ appendPQExpBufferStr(curbuf, ")$");
+ curbuf++;
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ cp++;
+ }
+ else if (ch == '$')
+ {
+ /*
+ * Dollar is always quoted, whether inside quotes or not. The
+ * reason is that it's allowed in SQL identifiers, so there's a
+ * significant use-case for treating it literally, while because
+ * we anchor the pattern automatically there is no use-case for
+ * having it possess its regexp meaning.
+ */
+ appendPQExpBufferStr(curbuf, "\\$");
+ cp++;
+ }
+ else
+ {
+ /*
+ * Ordinary data character, transfer to pattern
+ *
+ * Inside double quotes, or at all times if force_escape is true,
+ * quote regexp special characters with a backslash to avoid
+ * regexp errors. Outside quotes, however, let them pass through
+ * as-is; this lets knowledgeable users build regexp expressions
+ * that are more powerful than shell-style patterns.
+ */
+ if ((inquotes || force_escape) &&
+ strchr("|*+?()[]{}.^$\\", ch))
+ appendPQExpBufferChar(curbuf, '\\');
+ i = PQmblen(cp, encoding);
+ while (i-- && *cp)
+ {
+ appendPQExpBufferChar(curbuf, *cp);
+ cp++;
+ }
+ }
+ }
+ appendPQExpBufferStr(curbuf, ")$");
+
+ appendPQExpBufferStr(namebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(schemabuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(dbnamebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+ }
+ }
+}
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..caafb97d29 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -56,4 +56,8 @@ extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *schemavar, const char *namevar,
const char *altnamevar, const char *visibilityrule);
+extern void patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf,
+ PQExpBuffer schemabuf, PQExpBuffer namebuf,
+ const char *pattern, bool force_escape);
+
#endif /* STRING_UTILS_H */
--
2.21.1 (Apple Git-122.3)
On Jan 28, 2021, at 9:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Attached is patch set 35. Per your review comments, I have restructured the patches in the following way:
v33's 0007 is now the first patch, v35's 0001
v33's 0001 is no more. The frontend infrastructure for error handling and exiting may be resubmitted someday in another patch, but it isn't necessary for pg_amcheck.
v33's 0002 is no more. The PGresultHandler stuff that it defined inspires some of what comes later in v35's 0003, but it isn't sufficiently similar to what v35 does to be thought of as moving from v33-0002 into v35-0003.
v33's 0003, 0004 and 0006 are combined into v35's 0002
v33's 0005 becomes v35's 0003
v33's 0007 becomes v35's 0004
Additionally, pg_amcheck testing is extended beyond what v33 had in v35's new 0005 patch, but pg_amcheck doesn't depend on this new 0005 patch ever being committed, so if you don't like it, just throw it in the bit bucket.
I like 0007 quite a bit and am inclined to commit it soon, as it
doesn't depend on the earlier patches. But:
- I think the residual comment in processSQLNamePattern beginning with
"Note:" could use some wordsmithing to account for the new structure
of things -- maybe just "this pass" -> "this function".
- I suggest changing initializations like maxbuf = buf + 2 to maxbuf =
&buf[2] for clarity.
Already responded to this in the v34 development a few days ago. Nothing meaningfully changes between 34 and 35.
Regarding 0001:
- My preference would be to dump on_exit_nicely_final() and just rely
on order of registration.
- I'm not entirely sure it's a good idea to expose something named
fatal() like this, because that's a fairly short and general name. On
the other hand, it's pretty descriptive and it's not clear why someone
including exit_utils.h would want any other definition. I guess we
can always change it later if it proves to be problematic; it's got a
lot of callers and I guess there's no point in churning the code
without a clear reason.
- I don't quite see why we need this at all. Like, exit_nicely() is a
pg_dump-ism. It would make sense to centralize it if we were going to
use it for pg_amcheck, but you don't. If you were going to, you'd need
to adapt 0003 to use exit_nicely() instead of exit(), but you don't,
nor do you add any other new calls to exit_nicely() anywhere, except
for one in 0002. That makes the PGresultHandler stuff depend on
exit_nicely(), which might be important if you were going to refactor
pg_dump to use that abstraction, but you don't. I'm not opposed to the
idea of centralized exit processing for frontend utilities; indeed, it
seems like a good idea. But this doesn't seem to get us there. AFAICS
it just entangles pg_dump with pg_amcheck unnecessarily in a way that
doesn't really benefit either of them.
Removed from v35.
Regarding 0002:
- I don't think this is separately committable because it adds an
abstraction but not any uses of that abstraction to demonstrate that
it's actually any good. Perhaps it should just be merged into 0005,
and even into parallel_slot.h vs. having its own header. I'm not
really sure about that, though.
Yeah, this is gone from v35, with hints of it moved into 0003 as part of the parallel slots refactoring.
- Is this really much of an abstraction layer? Like, how generic can
this be when the argument list includes ExecStatusType expected_status
and int expected_ntups?
The new format takes a void *context argument.
- The logic seems to be very similar to some of the stuff that you
move around in 0003, like executeQuery() and executeCommand(), but it
doesn't get unified. I'm not necessarily saying it should be, but it's
weird to do all this refactoring and end up with something that still
looks like this.
Yeah, I agree with this. The refactoring is a lot less ambitious in v35, to avoid these issues.
0003, 0004, and 0006 look pretty boring; they are just moving code
around. Is there any point in splitting the code from 0003 across two
files? Maybe it's fine.
Combined.
If I run pg_amcheck --all -j4 do I get a serialization boundary across
databases? Like, I have to completely finish db1 before I can go onto
db2, even though maybe only one worker is still busy with it?
The command line interface and corresponding semantics for specifying which tables to check, which schemas to check, and which databases to check should be the same as that for reindexdb and vacuumdb, and the behavior for handing off those targets to be checked/reindexed/vacuumed through the parallel slots interface should be the same. It seems a bit much to refactor reindexdb and vacuumdb to match pg_amcheck when pg_amcheck hasn't been accepted for commit as yet. If/when that happens, and if the project generally approves of going in this direction, I think the next step will be to refactor some of this logic out of pg_amcheck into fe_utils and use it from all three utilities. At that time, I'd like to tackle the serialization choke point in all three, and handle it in the same way for them all.
For the new v35-0005 patch, I have extended PostgresNode.pm with some new corruption abilities. In short, it can now take a snapshot of the files that back a relation, and can corruptly rollback those files to prior versions, in full or in part. This allows creating kinds of corruption that are hard to create through mere bit twiddling. For example, if the relation backing an index is rolled back to a prior version, amcheck's btree checking sees the index as not corrupt, but when asked to reconcile the entries in the heap with the index, it can see that not all of them are present. This gives test coverage of corruption checking functionality that is otherwise hard to achieve.
To check that the PostgresNode.pm changes themselves work, v35-0005 adds src/test/modules/corruption
To check pg_amcheck, and by implication amcheck, v35-0005 adds contrib/pg_amcheck/t/006_relfile_damage.pl
Once again, v35-0005 does not need to be committed -- pg_amcheck works just fine without it.
You and I have discussed this off-list, but for the record, amcheck and pg_amcheck currently only check heaps and btree indexes. Other object types, such as sequences and non-btree indexes, are not checked. Some basic sanity checking of other object types would be a good addition, and pg_amcheck has been structured in a way where it should be fairly straightforward to add support for those. The only such sanity checking that I thought could be done in a short timeframe was to check that the relation files backing the objects were not missing, and we decided off-list such checking wasn't worth much, so I didn't add it.
Attachments:
v35-0001-Refactoring-processSQLNamePattern.patchapplication/octet-stream; name=v35-0001-Refactoring-processSQLNamePattern.patch; x-unix-mode=0644Download
From d85576faf4d745db25e5dce4eda8ceb91e2f1286 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 31 Jan 2021 13:06:53 -0800
Subject: [PATCH v35 1/5] Refactoring processSQLNamePattern.
Factoring out logic which transforms shell-style patterns into SQL
style regexp format from inside processSQLNamePattern into a
separate new function "patternToSQLRegex". The interface and
semantics of processSQLNamePattern are unchanged.
The motivation for the refactoring is that processSQLNamePattern
mixes the job of transforming the pattern with the job of
constructing a where-clause based on a single pattern, which makes
the code hard to reuse from other places.
The new helper function patternToSQLRegex can handle parsing of
patterns of the form "database.schema.relation", "schema.relation",
and "relation". The three-part form is unused in this commit, as
the pre-existing patternToSQLRegex function ignores the dbname
functionality, and there are not yet any other callers. The
three-part form will be used by pg_amcheck, not yet committed, to
allow specifying on the command line the inclusion and exclusion of
relations spanning multiple databases.
---
src/fe_utils/string_utils.c | 260 +++++++++++++++++-----------
src/include/fe_utils/string_utils.h | 4 +
2 files changed, 167 insertions(+), 97 deletions(-)
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..9a1ea9ab98 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -831,10 +831,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
PQExpBufferData schemabuf;
PQExpBufferData namebuf;
- int encoding = PQclientEncoding(conn);
- bool inquotes;
- const char *cp;
- int i;
bool added_clause = false;
#define WHEREAND() \
@@ -856,98 +852,12 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Parse the pattern, converting quotes and lower-casing unquoted letters.
- * Also, adjust shell-style wildcard characters into regexp notation.
- *
- * We surround the pattern with "^(...)$" to force it to match the whole
- * string, as per SQL practice. We have to have parens in case the string
- * contains "|", else the "^" and "$" will be bound into the first and
- * last alternatives which is not what we want.
- *
- * Note: the result of this pass is the actual regexp pattern(s) we want
- * to execute. Quoting/escaping into SQL literal format will be done
- * below using appendStringLiteralConn().
+ * Convert shell-style 'pattern' into the regular expression(s) we want to
+ * execute. Quoting/escaping into SQL literal format will be done below
+ * using appendStringLiteralConn().
*/
- appendPQExpBufferStr(&namebuf, "^(");
-
- inquotes = false;
- cp = pattern;
-
- while (*cp)
- {
- char ch = *cp;
-
- if (ch == '"')
- {
- if (inquotes && cp[1] == '"')
- {
- /* emit one quote, stay in inquotes mode */
- appendPQExpBufferChar(&namebuf, '"');
- cp++;
- }
- else
- inquotes = !inquotes;
- cp++;
- }
- else if (!inquotes && isupper((unsigned char) ch))
- {
- appendPQExpBufferChar(&namebuf,
- pg_tolower((unsigned char) ch));
- cp++;
- }
- else if (!inquotes && ch == '*')
- {
- appendPQExpBufferStr(&namebuf, ".*");
- cp++;
- }
- else if (!inquotes && ch == '?')
- {
- appendPQExpBufferChar(&namebuf, '.');
- cp++;
- }
- else if (!inquotes && ch == '.')
- {
- /* Found schema/name separator, move current pattern to schema */
- resetPQExpBuffer(&schemabuf);
- appendPQExpBufferStr(&schemabuf, namebuf.data);
- resetPQExpBuffer(&namebuf);
- appendPQExpBufferStr(&namebuf, "^(");
- cp++;
- }
- else if (ch == '$')
- {
- /*
- * Dollar is always quoted, whether inside quotes or not. The
- * reason is that it's allowed in SQL identifiers, so there's a
- * significant use-case for treating it literally, while because
- * we anchor the pattern automatically there is no use-case for
- * having it possess its regexp meaning.
- */
- appendPQExpBufferStr(&namebuf, "\\$");
- cp++;
- }
- else
- {
- /*
- * Ordinary data character, transfer to pattern
- *
- * Inside double quotes, or at all times if force_escape is true,
- * quote regexp special characters with a backslash to avoid
- * regexp errors. Outside quotes, however, let them pass through
- * as-is; this lets knowledgeable users build regexp expressions
- * that are more powerful than shell-style patterns.
- */
- if ((inquotes || force_escape) &&
- strchr("|*+?()[]{}.^$\\", ch))
- appendPQExpBufferChar(&namebuf, '\\');
- i = PQmblen(cp, encoding);
- while (i-- && *cp)
- {
- appendPQExpBufferChar(&namebuf, *cp);
- cp++;
- }
- }
- }
+ patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
+ pattern, force_escape);
/*
* Now decide what we need to emit. We may run under a hostile
@@ -964,7 +874,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a name pattern, so constrain the namevar(s) */
- appendPQExpBufferStr(&namebuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(namebuf.data, "^(.*)$") != 0)
{
@@ -999,7 +908,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a schema pattern, so constrain the schemavar */
- appendPQExpBufferStr(&schemabuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(schemabuf.data, "^(.*)$") != 0 && schemavar)
{
@@ -1027,3 +935,161 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
return added_clause;
#undef WHEREAND
}
+
+/*
+ * Transform a possibly qualified shell-style object name pattern into up to
+ * three SQL-style regular expressions, converting quotes, lower-casing
+ * unquoted letters, and adjusting shell-style wildcard characters into regexp
+ * notation.
+ *
+ * If the dbnamebuf and schemabuf arguments are non-NULL, and the pattern
+ * contains two or more dbname/schema/name separators, we parse the portions of
+ * the pattern prior to the first and second separators into dbnamebuf and
+ * schemabuf, and the rest into namebuf. (Additional dots in the name portion
+ * are not treated as special.)
+ *
+ * If dbnamebuf is NULL and schemabuf is non-NULL, and the pattern contains at
+ * least one separator, we parse the first portion into schemabuf and the rest
+ * into namebuf.
+ *
+ * Otherwise, we parse all the pattern into namebuf.
+ *
+ * We surround the regexps with "^(...)$" to force them to match whole strings,
+ * as per SQL practice. We have to have parens in case strings contain "|",
+ * else the "^" and "$" will be bound into the first and last alternatives
+ * which is not what we want.
+ *
+ * The regexps we parse into the buffers are appended to the data (if any)
+ * already present. If we parse fewer fields than the number of buffers we
+ * were given, the extra buffers are unaltered.
+ */
+void
+patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
+ PQExpBuffer namebuf, const char *pattern, bool force_escape)
+{
+ PQExpBufferData buf[3];
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
+ int i;
+ bool inquotes;
+ const char *cp;
+
+ Assert(pattern != NULL);
+ Assert(namebuf != NULL);
+
+ /* callers should never expect "dbname.relname" format */
+ Assert(dbnamebuf == NULL || schemabuf != NULL);
+
+ inquotes = false;
+ cp = pattern;
+
+ if (dbnamebuf != NULL)
+ maxbuf = &buf[2];
+ else if (schemabuf != NULL)
+ maxbuf = &buf[1];
+ else
+ maxbuf = &buf[0];
+
+ curbuf = &buf[0];
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ while (*cp)
+ {
+ char ch = *cp;
+
+ if (ch == '"')
+ {
+ if (inquotes && cp[1] == '"')
+ {
+ /* emit one quote, stay in inquotes mode */
+ appendPQExpBufferChar(curbuf, '"');
+ cp++;
+ }
+ else
+ inquotes = !inquotes;
+ cp++;
+ }
+ else if (!inquotes && isupper((unsigned char) ch))
+ {
+ appendPQExpBufferChar(curbuf, pg_tolower((unsigned char) ch));
+ cp++;
+ }
+ else if (!inquotes && ch == '*')
+ {
+ appendPQExpBufferStr(curbuf, ".*");
+ cp++;
+ }
+ else if (!inquotes && ch == '?')
+ {
+ appendPQExpBufferChar(curbuf, '.');
+ cp++;
+ }
+
+ /*
+ * When we find a dbname/schema/name separator, we treat it specially
+ * only if the caller requested more patterns to be parsed than we
+ * have already parsed from the pattern. Otherwise, dot characters
+ * are not special.
+ */
+ else if (!inquotes && ch == '.' && curbuf < maxbuf)
+ {
+ appendPQExpBufferStr(curbuf, ")$");
+ curbuf++;
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ cp++;
+ }
+ else if (ch == '$')
+ {
+ /*
+ * Dollar is always quoted, whether inside quotes or not. The
+ * reason is that it's allowed in SQL identifiers, so there's a
+ * significant use-case for treating it literally, while because
+ * we anchor the pattern automatically there is no use-case for
+ * having it possess its regexp meaning.
+ */
+ appendPQExpBufferStr(curbuf, "\\$");
+ cp++;
+ }
+ else
+ {
+ /*
+ * Ordinary data character, transfer to pattern
+ *
+ * Inside double quotes, or at all times if force_escape is true,
+ * quote regexp special characters with a backslash to avoid
+ * regexp errors. Outside quotes, however, let them pass through
+ * as-is; this lets knowledgeable users build regexp expressions
+ * that are more powerful than shell-style patterns.
+ */
+ if ((inquotes || force_escape) &&
+ strchr("|*+?()[]{}.^$\\", ch))
+ appendPQExpBufferChar(curbuf, '\\');
+ i = PQmblen(cp, encoding);
+ while (i-- && *cp)
+ {
+ appendPQExpBufferChar(curbuf, *cp);
+ cp++;
+ }
+ }
+ }
+ appendPQExpBufferStr(curbuf, ")$");
+
+ appendPQExpBufferStr(namebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(schemabuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(dbnamebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+ }
+ }
+}
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..caafb97d29 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -56,4 +56,8 @@ extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *schemavar, const char *namevar,
const char *altnamevar, const char *visibilityrule);
+extern void patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf,
+ PQExpBuffer schemabuf, PQExpBuffer namebuf,
+ const char *pattern, bool force_escape);
+
#endif /* STRING_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v35-0002-Moving-code-from-src-bin-scripts-to-fe_utils.patch (application/octet-stream)
From 68f0d3a508f776f612e6c2aca303eb49f5459037 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 31 Jan 2021 13:08:23 -0800
Subject: [PATCH v35 2/5] Moving code from src/bin/scripts to fe_utils
To make this code usable from contrib/pg_amcheck, move
scripts_parallel.[ch] and handle_help_version_opts() into fe_utils,
along with supporting code from src/bin/scripts/common.c.
Update the applications in src/bin/scripts to use the new locations.
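The "strictly harmonized" --help/--version handling being relocated can be sketched as below. This toy version returns which option matched rather than printing and exiting, so the dispatch rule (only argv[1] is examined, and both long and short spellings are accepted) is easy to exercise; the names are illustrative, not the fe_utils API:

```c
#include <string.h>

/* Which special option, if any, appears as the first argument. */
enum opt_action
{
    OPT_NONE,
    OPT_HELP,
    OPT_VERSION
};

/*
 * Classify argv[1] the same way every src/bin/scripts program does:
 * --help/-? and --version/-V are honored only in the first position.
 */
static enum opt_action
classify_help_version(int argc, char *argv[])
{
    if (argc > 1)
    {
        if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
            return OPT_HELP;
        if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
            return OPT_VERSION;
    }
    return OPT_NONE;
}
```

Centralizing this in fe_utils/option_utils.c means pg_amcheck gets the same behavior as vacuumdb, reindexdb, and the other scripts without duplicating the dispatch.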
---
src/bin/scripts/Makefile | 6 +-
src/bin/scripts/clusterdb.c | 2 +
src/bin/scripts/common.c | 318 +-----------------
src/bin/scripts/common.h | 49 +--
src/bin/scripts/createdb.c | 1 +
src/bin/scripts/createuser.c | 1 +
src/bin/scripts/dropdb.c | 1 +
src/bin/scripts/dropuser.c | 1 +
src/bin/scripts/nls.mk | 2 +-
src/bin/scripts/pg_isready.c | 1 +
src/bin/scripts/reindexdb.c | 4 +-
src/bin/scripts/vacuumdb.c | 4 +-
src/fe_utils/Makefile | 4 +
src/fe_utils/connect_utils.c | 170 ++++++++++
src/fe_utils/option_utils.c | 38 +++
.../parallel_slot.c} | 63 +++-
src/fe_utils/query_utils.c | 92 +++++
src/fe_utils/string_utils.c | 17 +-
src/include/fe_utils/connect_utils.h | 48 +++
src/include/fe_utils/option_utils.h | 23 ++
.../fe_utils/parallel_slot.h} | 13 +-
src/include/fe_utils/query_utils.h | 26 ++
src/tools/msvc/Mkvcbuild.pm | 2 +-
src/tools/pgindent/typedefs.list | 1 +
24 files changed, 494 insertions(+), 393 deletions(-)
create mode 100644 src/fe_utils/connect_utils.c
create mode 100644 src/fe_utils/option_utils.c
rename src/{bin/scripts/scripts_parallel.c => fe_utils/parallel_slot.c} (80%)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/connect_utils.h
create mode 100644 src/include/fe_utils/option_utils.h
rename src/{bin/scripts/scripts_parallel.h => include/fe_utils/parallel_slot.h} (82%)
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index a02e4e430c..b8d7cf2f2d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,8 +28,8 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-reindexdb: reindexdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
install: all installdirs
@@ -50,7 +50,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
- rm -f common.o scripts_parallel.o $(WIN32RES)
+ rm -f common.o $(WIN32RES)
rm -rf tmp_check
check:
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 7d25bb31d4..fc771eed77 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -13,6 +13,8 @@
#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 21ef297e6e..c86c19eae2 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -22,325 +22,9 @@
#include "common/logging.h"
#include "common/string.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
-#define ERRCODE_UNDEFINED_TABLE "42P01"
-
-/*
- * Provide strictly harmonized handling of --help and --version
- * options.
- */
-void
-handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname, help_handler hlp)
-{
- if (argc > 1)
- {
- if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
- {
- hlp(get_progname(argv[0]));
- exit(0);
- }
- if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
- {
- printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
- exit(0);
- }
- }
-}
-
-
-/*
- * Make a database connection with the given parameters.
- *
- * An interactive password prompt is automatically issued if needed and
- * allowed by cparams->prompt_password.
- *
- * If allow_password_reuse is true, we will try to re-use any password
- * given during previous calls to this routine. (Callers should not pass
- * allow_password_reuse=true unless reconnecting to the same database+user
- * as before, else we might create password exposure hazards.)
- */
-PGconn *
-connectDatabase(const ConnParams *cparams, const char *progname,
- bool echo, bool fail_ok, bool allow_password_reuse)
-{
- PGconn *conn;
- bool new_pass;
- static char *password = NULL;
-
- /* Callers must supply at least dbname; other params can be NULL */
- Assert(cparams->dbname);
-
- if (!allow_password_reuse && password)
- {
- free(password);
- password = NULL;
- }
-
- if (cparams->prompt_password == TRI_YES && password == NULL)
- password = simple_prompt("Password: ", false);
-
- /*
- * Start the connection. Loop until we have a password if requested by
- * backend.
- */
- do
- {
- const char *keywords[8];
- const char *values[8];
- int i = 0;
-
- /*
- * If dbname is a connstring, its entries can override the other
- * values obtained from cparams; but in turn, override_dbname can
- * override the dbname component of it.
- */
- keywords[i] = "host";
- values[i++] = cparams->pghost;
- keywords[i] = "port";
- values[i++] = cparams->pgport;
- keywords[i] = "user";
- values[i++] = cparams->pguser;
- keywords[i] = "password";
- values[i++] = password;
- keywords[i] = "dbname";
- values[i++] = cparams->dbname;
- if (cparams->override_dbname)
- {
- keywords[i] = "dbname";
- values[i++] = cparams->override_dbname;
- }
- keywords[i] = "fallback_application_name";
- values[i++] = progname;
- keywords[i] = NULL;
- values[i++] = NULL;
- Assert(i <= lengthof(keywords));
-
- new_pass = false;
- conn = PQconnectdbParams(keywords, values, true);
-
- if (!conn)
- {
- pg_log_error("could not connect to database %s: out of memory",
- cparams->dbname);
- exit(1);
- }
-
- /*
- * No luck? Trying asking (again) for a password.
- */
- if (PQstatus(conn) == CONNECTION_BAD &&
- PQconnectionNeedsPassword(conn) &&
- cparams->prompt_password != TRI_NO)
- {
- PQfinish(conn);
- if (password)
- free(password);
- password = simple_prompt("Password: ", false);
- new_pass = true;
- }
- } while (new_pass);
-
- /* check to see that the backend connection was successfully made */
- if (PQstatus(conn) == CONNECTION_BAD)
- {
- if (fail_ok)
- {
- PQfinish(conn);
- return NULL;
- }
- pg_log_error("%s", PQerrorMessage(conn));
- exit(1);
- }
-
- /* Start strict; callers may override this. */
- PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
- return conn;
-}
-
-/*
- * Try to connect to the appropriate maintenance database.
- *
- * This differs from connectDatabase only in that it has a rule for
- * inserting a default "dbname" if none was given (which is why cparams
- * is not const). Note that cparams->dbname should typically come from
- * a --maintenance-db command line parameter.
- */
-PGconn *
-connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo)
-{
- PGconn *conn;
-
- /* If a maintenance database name was specified, just connect to it. */
- if (cparams->dbname)
- return connectDatabase(cparams, progname, echo, false, false);
-
- /* Otherwise, try postgres first and then template1. */
- cparams->dbname = "postgres";
- conn = connectDatabase(cparams, progname, echo, true, false);
- if (!conn)
- {
- cparams->dbname = "template1";
- conn = connectDatabase(cparams, progname, echo, false, false);
- }
- return conn;
-}
-
-/*
- * Disconnect the given connection, canceling any statement if one is active.
- */
-void
-disconnectDatabase(PGconn *conn)
-{
- char errbuf[256];
-
- Assert(conn != NULL);
-
- if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
- {
- PGcancel *cancel;
-
- if ((cancel = PQgetCancel(conn)))
- {
- (void) PQcancel(cancel, errbuf, sizeof(errbuf));
- PQfreeCancel(cancel);
- }
- }
-
- PQfinish(conn);
-}
-
-/*
- * Run a query, return the results, exit program on failure.
- */
-PGresult *
-executeQuery(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- return res;
-}
-
-
-/*
- * As above for a SQL command (which returns nothing).
- */
-void
-executeCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- PQclear(res);
-}
-
-
-/*
- * As above for a SQL maintenance command (returns command success).
- * Command is executed with a cancel handler set, so Ctrl-C can
- * interrupt it.
- */
-bool
-executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
- bool r;
-
- if (echo)
- printf("%s\n", query);
-
- SetCancelConn(conn);
- res = PQexec(conn, query);
- ResetCancelConn();
-
- r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
-
- if (res)
- PQclear(res);
-
- return r;
-}
-
-/*
- * Consume all the results generated for the given connection until
- * nothing remains. If at least one error is encountered, return false.
- * Note that this will block if the connection is busy.
- */
-bool
-consumeQueryResult(PGconn *conn)
-{
- bool ok = true;
- PGresult *result;
-
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
- {
- if (!processQueryResult(conn, result))
- ok = false;
- }
- ResetCancelConn();
- return ok;
-}
-
-/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
- */
-bool
-processQueryResult(PGconn *conn, PGresult *result)
-{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
-
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
-
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
-
- PQclear(result);
- return true;
-}
-
-
/*
* Split TABLE[(COLUMNS)] into TABLE and [(COLUMNS)] portions. When you
* finish using them, pg_free(*table). *columns is a pointer into "spec",
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 5630975712..ddd8f35274 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -10,58 +10,11 @@
#define COMMON_H
#include "common/username.h"
+#include "fe_utils/connect_utils.h"
#include "getopt_long.h" /* pgrminclude ignore */
#include "libpq-fe.h"
#include "pqexpbuffer.h" /* pgrminclude ignore */
-enum trivalue
-{
- TRI_DEFAULT,
- TRI_NO,
- TRI_YES
-};
-
-/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
-typedef struct _connParams
-{
- /* These fields record the actual command line parameters */
- const char *dbname; /* this may be a connstring! */
- const char *pghost;
- const char *pgport;
- const char *pguser;
- enum trivalue prompt_password;
- /* If not NULL, this overrides the dbname obtained from command line */
- /* (but *only* the DB name, not anything else in the connstring) */
- const char *override_dbname;
-} ConnParams;
-
-typedef void (*help_handler) (const char *progname);
-
-extern void handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname,
- help_handler hlp);
-
-extern PGconn *connectDatabase(const ConnParams *cparams,
- const char *progname,
- bool echo, bool fail_ok,
- bool allow_password_reuse);
-
-extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo);
-
-extern void disconnectDatabase(PGconn *conn);
-
-extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
-
-extern void executeCommand(PGconn *conn, const char *query, bool echo);
-
-extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
- bool echo);
-
-extern bool consumeQueryResult(PGconn *conn);
-
-extern bool processQueryResult(PGconn *conn, PGresult *result);
-
extern void splitTableColumnsSpec(const char *spec, int encoding,
char **table, const char **columns);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index abf21d4942..041454f075 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 47b0e28bc6..ef7e0e549f 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
index ba0dcdecb9..b154ed1bb6 100644
--- a/src/bin/scripts/dropdb.c
+++ b/src/bin/scripts/dropdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropuser.c b/src/bin/scripts/dropuser.c
index ff5b455ae5..61b8557bc7 100644
--- a/src/bin/scripts/dropuser.c
+++ b/src/bin/scripts/dropuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/nls.mk b/src/bin/scripts/nls.mk
index 5d5dd11b7b..7fc716092e 100644
--- a/src/bin/scripts/nls.mk
+++ b/src/bin/scripts/nls.mk
@@ -7,7 +7,7 @@ GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
clusterdb.c vacuumdb.c reindexdb.c \
pg_isready.c \
common.c \
- scripts_parallel.c \
+ ../../fe_utils/parallel_slot.c \
../../fe_utils/cancel.c ../../fe_utils/print.c \
../../common/fe_memutils.c ../../common/username.c
GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) simple_prompt yesno_prompt
diff --git a/src/bin/scripts/pg_isready.c b/src/bin/scripts/pg_isready.c
index ceb8a09b4c..fc6f7b0a93 100644
--- a/src/bin/scripts/pg_isready.c
+++ b/src/bin/scripts/pg_isready.c
@@ -12,6 +12,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#define DEFAULT_CONNECT_TIMEOUT "3"
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index dece8200fa..7781fb1151 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,9 +16,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
typedef enum ReindexType
{
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 8246327770..ed320817bc 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,9 +18,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
/* vacuum options controlled by user flags */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..456c441a33 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,9 +23,13 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ connect_utils.o \
mbprint.o \
+ option_utils.o \
+ parallel_slot.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
new file mode 100644
index 0000000000..7475e2f366
--- /dev/null
+++ b/src/fe_utils/connect_utils.c
@@ -0,0 +1,170 @@
+#include "postgres_fe.h"
+
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Make a database connection with the given parameters.
+ *
+ * An interactive password prompt is automatically issued if needed and
+ * allowed by cparams->prompt_password.
+ *
+ * If allow_password_reuse is true, we will try to re-use any password
+ * given during previous calls to this routine. (Callers should not pass
+ * allow_password_reuse=true unless reconnecting to the same database+user
+ * as before, else we might create password exposure hazards.)
+ */
+PGconn *
+connectDatabase(const ConnParams *cparams, const char *progname,
+ bool echo, bool fail_ok, bool allow_password_reuse)
+{
+ PGconn *conn;
+ bool new_pass;
+ static char *password = NULL;
+
+ /* Callers must supply at least dbname; other params can be NULL */
+ Assert(cparams->dbname);
+
+ if (!allow_password_reuse && password)
+ {
+ free(password);
+ password = NULL;
+ }
+
+ if (cparams->prompt_password == TRI_YES && password == NULL)
+ password = simple_prompt("Password: ", false);
+
+ /*
+ * Start the connection. Loop until we have a password if requested by
+ * backend.
+ */
+ do
+ {
+ const char *keywords[8];
+ const char *values[8];
+ int i = 0;
+
+ /*
+ * If dbname is a connstring, its entries can override the other
+ * values obtained from cparams; but in turn, override_dbname can
+ * override the dbname component of it.
+ */
+ keywords[i] = "host";
+ values[i++] = cparams->pghost;
+ keywords[i] = "port";
+ values[i++] = cparams->pgport;
+ keywords[i] = "user";
+ values[i++] = cparams->pguser;
+ keywords[i] = "password";
+ values[i++] = password;
+ keywords[i] = "dbname";
+ values[i++] = cparams->dbname;
+ if (cparams->override_dbname)
+ {
+ keywords[i] = "dbname";
+ values[i++] = cparams->override_dbname;
+ }
+ keywords[i] = "fallback_application_name";
+ values[i++] = progname;
+ keywords[i] = NULL;
+ values[i++] = NULL;
+ Assert(i <= lengthof(keywords));
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+
+ if (!conn)
+ {
+ pg_log_error("could not connect to database %s: out of memory",
+ cparams->dbname);
+ exit(1);
+ }
+
+ /*
+ * No luck? Trying asking (again) for a password.
+ */
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ cparams->prompt_password != TRI_NO)
+ {
+ PQfinish(conn);
+ if (password)
+ free(password);
+ password = simple_prompt("Password: ", false);
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ /* check to see that the backend connection was successfully made */
+ if (PQstatus(conn) == CONNECTION_BAD)
+ {
+ if (fail_ok)
+ {
+ PQfinish(conn);
+ return NULL;
+ }
+ pg_log_error("%s", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* Start strict; callers may override this. */
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+ return conn;
+}
+
+/*
+ * Try to connect to the appropriate maintenance database.
+ *
+ * This differs from connectDatabase only in that it has a rule for
+ * inserting a default "dbname" if none was given (which is why cparams
+ * is not const). Note that cparams->dbname should typically come from
+ * a --maintenance-db command line parameter.
+ */
+PGconn *
+connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo)
+{
+ PGconn *conn;
+
+ /* If a maintenance database name was specified, just connect to it. */
+ if (cparams->dbname)
+ return connectDatabase(cparams, progname, echo, false, false);
+
+ /* Otherwise, try postgres first and then template1. */
+ cparams->dbname = "postgres";
+ conn = connectDatabase(cparams, progname, echo, true, false);
+ if (!conn)
+ {
+ cparams->dbname = "template1";
+ conn = connectDatabase(cparams, progname, echo, false, false);
+ }
+ return conn;
+}
+
+/*
+ * Disconnect the given connection, canceling any statement if one is active.
+ */
+void
+disconnectDatabase(PGconn *conn)
+{
+ char errbuf[256];
+
+ Assert(conn != NULL);
+
+ if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(conn)))
+ {
+ (void) PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(conn);
+}
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..97aca1f02b
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "fe_utils/option_utils.h"
+
+/*
+ * Provide strictly harmonized handling of --help and --version
+ * options.
+ */
+void
+handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname, help_handler hlp)
+{
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ hlp(get_progname(argv[0]));
+ exit(0);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
+ exit(0);
+ }
+ }
+}
diff --git a/src/bin/scripts/scripts_parallel.c b/src/fe_utils/parallel_slot.c
similarity index 80%
rename from src/bin/scripts/scripts_parallel.c
rename to src/fe_utils/parallel_slot.c
index 1f863a1bb4..3987a4702b 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/fe_utils/parallel_slot.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.c
- * Parallel support for bin/scripts/
+ * parallel_slot.c
+ * Parallel support for front-end parallel database connections
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * src/bin/scripts/scripts_parallel.c
+ * src/fe_utils/parallel_slot.c
*
*-------------------------------------------------------------------------
*/
@@ -22,13 +22,15 @@
#include <sys/select.h>
#endif
-#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
-#include "scripts_parallel.h"
+#include "fe_utils/parallel_slot.h"
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
+static bool processQueryResult(PGconn *conn, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,6 +40,57 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->isFree = true;
}
+/*
+ * Process (and delete) a query result. Returns true if there's no error,
+ * false otherwise -- but errors about trying to work on a missing relation
+ * are reported and subsequently ignored.
+ */
+static bool
+processQueryResult(PGconn *conn, PGresult *result)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(result) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(result);
+ return false;
+ }
+ }
+
+ PQclear(result);
+ return true;
+}
+
+/*
+ * Consume all the results generated for the given connection until
+ * nothing remains. If at least one error is encountered, return false.
+ * Note that this will block if the connection is busy.
+ */
+static bool
+consumeQueryResult(PGconn *conn)
+{
+ bool ok = true;
+ PGresult *result;
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ if (!processQueryResult(conn, result))
+ ok = false;
+ }
+ ResetCancelConn();
+ return ok;
+}
+
/*
* Wait until a file descriptor from the given set becomes readable.
*
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..a70ae3c082
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,92 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Run a query, return the results, exit program on failure.
+ */
+PGresult *
+executeQuery(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ return res;
+}
+
+
+/*
+ * As above for a SQL command (which returns nothing).
+ */
+void
+executeCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ PQclear(res);
+}
+
+
+/*
+ * As above for a SQL maintenance command (returns command success).
+ * Command is executed with a cancel handler set, so Ctrl-C can
+ * interrupt it.
+ */
+bool
+executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+ bool r;
+
+ if (echo)
+ printf("%s\n", query);
+
+ SetCancelConn(conn);
+ res = PQexec(conn, query);
+ ResetCancelConn();
+
+ r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
+
+ if (res)
+ PQclear(res);
+
+ return r;
+}
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 9a1ea9ab98..94941132ac 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -852,9 +852,9 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Convert shell-style 'pattern' into the regular expression(s) we want to
- * execute. Quoting/escaping into SQL literal format will be done below
- * using appendStringLiteralConn().
+ * Convert shell-style 'pattern' into the regular expression(s) we want
+ * to execute. Quoting/escaping into SQL literal format will be done
+ * below using appendStringLiteralConn().
*/
patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
pattern, force_escape);
@@ -968,8 +968,8 @@ patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
PQExpBuffer namebuf, const char *pattern, bool force_escape)
{
PQExpBufferData buf[3];
- PQExpBuffer curbuf;
- PQExpBuffer maxbuf;
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
int i;
bool inquotes;
const char *cp;
@@ -1025,12 +1025,11 @@ patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
appendPQExpBufferChar(curbuf, '.');
cp++;
}
-
/*
* When we find a dbname/schema/name separator, we treat it specially
- * only if the caller requested more patterns to be parsed than we
- * have already parsed from the pattern. Otherwise, dot characters
- * are not special.
+ * only if the caller requested more patterns to be parsed than we have
+ * already parsed from the pattern. Otherwise, dot characters are not
+ * special.
*/
else if (!inquotes && ch == '.' && curbuf < maxbuf)
{
diff --git a/src/include/fe_utils/connect_utils.h b/src/include/fe_utils/connect_utils.h
new file mode 100644
index 0000000000..8fde0ea2a0
--- /dev/null
+++ b/src/include/fe_utils/connect_utils.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to connect to and disconnect from databases.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/connect_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECT_UTILS_H
+#define CONNECT_UTILS_H
+
+#include "libpq-fe.h"
+
+enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+};
+
+/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
+typedef struct _connParams
+{
+ /* These fields record the actual command line parameters */
+ const char *dbname; /* this may be a connstring! */
+ const char *pghost;
+ const char *pgport;
+ const char *pguser;
+ enum trivalue prompt_password;
+ /* If not NULL, this overrides the dbname obtained from command line */
+ /* (but *only* the DB name, not anything else in the connstring) */
+ const char *override_dbname;
+} ConnParams;
+
+extern PGconn *connectDatabase(const ConnParams *cparams,
+ const char *progname,
+ bool echo, bool fail_ok,
+ bool allow_password_reuse);
+
+extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo);
+
+extern void disconnectDatabase(PGconn *conn);
+
+#endif /* CONNECT_UTILS_H */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..ef6eb24ae0
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "postgres_fe.h"
+
+typedef void (*help_handler) (const char *progname);
+
+extern void handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname,
+ help_handler hlp);
+
+#endif /* OPTION_UTILS_H */
diff --git a/src/bin/scripts/scripts_parallel.h b/src/include/fe_utils/parallel_slot.h
similarity index 82%
rename from src/bin/scripts/scripts_parallel.h
rename to src/include/fe_utils/parallel_slot.h
index f62692510a..99eeb3328d 100644
--- a/src/bin/scripts/scripts_parallel.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -1,21 +1,20 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.h
+ * parallel_slot.h
* Parallel support for bin/scripts/
*
* Copyright (c) 2003-2021, PostgreSQL Global Development Group
*
- * src/bin/scripts/scripts_parallel.h
+ * src/include/fe_utils/parallel_slot.h
*
*-------------------------------------------------------------------------
*/
-#ifndef SCRIPTS_PARALLEL_H
-#define SCRIPTS_PARALLEL_H
+#ifndef PARALLEL_SLOT_H
+#define PARALLEL_SLOT_H
-#include "common.h"
+#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
-
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
@@ -33,4 +32,4 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
-#endif /* SCRIPTS_PARALLEL_H */
+#endif /* PARALLEL_SLOT_H */
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..1f5812bbf6
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "postgres_fe.h"
+
+#include "libpq-fe.h"
+
+extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
+
+extern void executeCommand(PGconn *conn, const char *query, bool echo);
+
+extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
+ bool echo);
+
+#endif /* QUERY_UTILS_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90328db04e..941d168e19 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,7 +147,7 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c mbprint.c print.c psqlscan.l
+ archive.c cancel.c conditional.c mbprint.c option_utils.c print.c psqlscan.l
psqlscan.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..4d0d09a5dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
Attachment: v35-0003-Parameterizing-parallel-slot-result-handling.patch (application/octet-stream)
From 4a55fefe59975cfc62c50b318bfc664d216d36f2 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 31 Jan 2021 13:09:52 -0800
Subject: [PATCH v35 3/5] Parameterizing parallel slot result handling
The function consumeQueryResult was being used to handle all results
returned by queries executed through the parallel slot interface,
but this hardcodes knowledge about the expectations of reindexdb and
vacuumdb such as the expected result status being PGRES_COMMAND_OK
(as opposed to, say, PGRES_TUPLES_OK).
Reworking the slot interface to optionally include a PGresultHandler
and related fields per slot. The idea is that a caller who executes
a command or query through the slot can set the handler to be called
when the query completes.
The old logic of consumeQueryResults is moved into a new callback
function, TableCommandSlotHandler(), which gets registered as the
slot handler explicitly from vacuumdb and reindexdb. This is
defined in fe_utils/parallel_slot.c rather than somewhere in
src/bin/scripts where its only callers reside, partly to keep it
close to the rest of the shared parallel slot handling code and
partly in anticipation that other utility programs will eventually
want to use it also.
Adding a default handler which is used to handle results for slots
which have no handler explicitly registered. The default simply
checks the status of the result and makes a judgement about whether
the status is ok, similarly to psql's AcceptResult(). I also
considered whether to just have a missing handler always be an
error, but decided against requiring users of the parallel slot
infrastructure to pedantically specify the default handler. Both
designs seem reasonable, but the tie-breaker for me is that edge
cases that do not come up in testing will be better handled in
production with this design than with pedantically erroring out.
The expectation of this commit is that pg_amcheck will have handlers
for table and index checks which will process the PGresults of calls
to the amcheck functions. This commit sets up the infrastructure
necessary to support those handlers being different from the one
used by vacuumdb and reindexdb.
---
src/bin/scripts/reindexdb.c | 1 +
src/bin/scripts/vacuumdb.c | 1 +
src/fe_utils/parallel_slot.c | 142 +++++++++++++++++++++------
src/include/fe_utils/parallel_slot.h | 32 ++++++
4 files changed, 148 insertions(+), 28 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 7781fb1151..29394d4a4a 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -466,6 +466,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler, NULL);
run_reindex_command(free_slot->connection, process_type, objname,
echo, verbose, concurrently, true);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index ed320817bc..1158f7b776 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -713,6 +713,7 @@ vacuum_one_database(const ConnParams *cparams,
* Execute the vacuum. All errors are handled in processQueryResult
* through ParallelSlotsGetIdle.
*/
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler, NULL);
run_vacuum_command(free_slot->connection, sql.data,
echo, tabname);
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index 3987a4702b..8e0c65988d 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -30,7 +30,7 @@
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
-static bool processQueryResult(PGconn *conn, PGresult *result);
+static bool handleOneQueryResult(ParallelSlot *slot, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,53 +38,46 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->connection = conn;
/* Initially assume connection is idle */
slot->isFree = true;
+ ParallelSlotClearHandler(slot);
}
/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
+ * Invoke the slot's handler for a single query result, or fall back to the
+ * default handler if none is defined for the slot. Returns true if the
+ * handler reports that there's no error, false otherwise.
*/
static bool
-processQueryResult(PGconn *conn, PGresult *result)
+handleOneQueryResult(ParallelSlot *slot, PGresult *result)
{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+ PGresultHandler handler = slot->handler;
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
+ if (!handler)
+ handler = DefaultSlotHandler;
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
+ /* On failure, the handler should return NULL after freeing the result */
+ if (!handler(result, slot->connection, slot->handler_context))
+ return false;
+ /* Ok, we have to free it ourselves */
PQclear(result);
return true;
}
/*
- * Consume all the results generated for the given connection until
+ * Handle all the results generated for the given connection until
* nothing remains. If at least one error is encountered, return false.
* Note that this will block if the connection is busy.
*/
static bool
-consumeQueryResult(PGconn *conn)
+handleQueryResults(ParallelSlot *slot)
{
bool ok = true;
PGresult *result;
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
+ SetCancelConn(slot->connection);
+ while ((result = PQgetResult(slot->connection)) != NULL)
{
- if (!processQueryResult(conn, result))
+ if (!handleOneQueryResult(slot, result))
ok = false;
}
ResetCancelConn();
@@ -227,14 +220,15 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
if (result != NULL)
{
- /* Check and discard the command result */
- if (!processQueryResult(slots[i].connection, result))
+ /* Handle and discard the command result */
+ if (!handleOneQueryResult(slots + i, result))
return NULL;
}
else
{
/* This connection has become idle */
slots[i].isFree = true;
+ ParallelSlotClearHandler(slots + i);
if (firstFree < 0)
firstFree = i;
break;
@@ -329,9 +323,101 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
for (i = 0; i < numslots; i++)
{
- if (!consumeQueryResult((slots + i)->connection))
+ if (!handleQueryResults(slots + i))
return false;
}
return true;
}
+
+/*
+ * DefaultSlotHandler
+ *
+ * PGresultHandler for query results from slots with no handler registered.
+ * Success or failure is determined entirely by examining the status of the
+ * query result. This is very basic, but users who need better can register a
+ * custom handler.
+ *
+ * res: PGresult from the query executed on the slot's connection
+ * conn: connection belonging to the slot
+ * context: unused
+ */
+PGresult *
+DefaultSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ switch (PQresultStatus(res))
+ {
+ /* Success codes */
+ case PGRES_EMPTY_QUERY:
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_COPY_OUT:
+ case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
+ case PGRES_SINGLE_TUPLE:
+ /* Ok */
+ return res;
+
+ /* Error codes */
+ case PGRES_BAD_RESPONSE:
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ break;
+
+ /* Intentionally no default here */
+ }
+
+ /*
+ * Handle all error cases here, including anything not matched in the
+ * switch (though that should not happen). The 'query' argument may be
+ * NULL or garbage left over from a prior use of the slot. Don't include
+ * it in the error message!
+ */
+ pg_log_error("processing in database \"%s\" failed: %s", PQdb(conn),
+ PQerrorMessage(conn));
+ PQclear(res);
+ return NULL;
+}
+
+/*
+ * TableCommandSlotHandler
+ *
+ * PGresultHandler for results of commands (not queries) against tables.
+ *
+ * Requires that the result status is either PGRES_COMMAND_OK or an error about
+ * a missing table. This is useful for utilities that compile a list of tables
+ * to process and then run commands (vacuum, reindex, or whatever) against
+ * those tables, as there is a race condition between the time the list is
+ * compiled and the time the command attempts to open the table.
+ *
+ * For missing tables, logs an error but allows processing to continue.
+ *
+ * For all other errors, logs an error and terminates further processing.
+ *
+ * res: PGresult from the query executed on the slot's connection
+ * conn: connection belonging to the slot
+ * context: unused
+ */
+PGresult *
+TableCommandSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(res);
+ return NULL;
+ }
+ }
+
+ return res;
+}
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 99eeb3328d..524d62306d 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -15,12 +15,39 @@
#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
+typedef PGresult *(*PGresultHandler) (PGresult *res, PGconn *conn,
+ void *context);
+
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
bool isFree; /* Is it known to be idle? */
+
+ /*
+ * Prior to issuing a command or query on 'connection', a handler callback
+ * function may optionally be registered to be invoked to process the
+ * results, and context information may optionally be registered for use
+ * by the handler. If unset, these fields should be NULL.
+ */
+ PGresultHandler handler;
+ void *handler_context;
} ParallelSlot;
+static inline void
+ParallelSlotSetHandler(ParallelSlot *slot, PGresultHandler handler,
+ void *context)
+{
+ slot->handler = handler;
+ slot->handler_context = context;
+}
+
+static inline void
+ParallelSlotClearHandler(ParallelSlot *slot)
+{
+ slot->handler = NULL;
+ slot->handler_context = NULL;
+}
+
extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
@@ -31,5 +58,10 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
+extern PGresult *DefaultSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+extern PGresult *TableCommandSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
#endif /* PARALLEL_SLOT_H */
--
2.21.1 (Apple Git-122.3)
Attachment: v35-0004-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From cead80db53696c79bdfab8edd057b76087b058d2 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 31 Jan 2021 13:11:10 -0800
Subject: [PATCH v35 4/5] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1499 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.h | 132 ++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 78 +
contrib/pg_amcheck/t/003_check.pl | 428 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 1004 +++++++++++++
src/tools/msvc/Install.pm | 4 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 4 +
16 files changed, 3742 insertions(+), 5 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.h
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index cdc041c7db..bacefc70da 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..e919f6411c
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1499 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "libpq-fe.h"
+#include "pg_amcheck.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+#include "storage/block.h"
+
+/* Keep this in order by CheckType */
+static const CheckTypeFilter ctfilter[] = {
+ {
+ .relam = HEAP_TABLE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_RELATION) ","
+ CppAsString2(RELKIND_MATVIEW) ","
+ CppAsString2(RELKIND_TOASTVALUE),
+ .typname = "heap"
+ },
+ {
+ .relam = BTREE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_INDEX),
+ .typname = "btree index"
+ }
+};
+
+/* Query for determining if contrib's amcheck is installed */
+static const char *amcheck_sql =
+"SELECT 1"
+"\nFROM pg_catalog.pg_extension"
+"\nWHERE extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+
+int
+main(int argc, char *argv[])
+{
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"exclude-indexes", no_argument, NULL, 2},
+ {"exclude-toast", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-dependents", no_argument, NULL, 10},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ const char *progname;
+ int optindex;
+ int c;
+
+ const char *maintenance_db = NULL;
+ const char *connect_db = NULL;
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ ConnParams cparams;
+
+ amcheckOptions checkopts = {
+ .alldb = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .dependents = true,
+ .no_indexes = false,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .exclude_toast = false,
+ .reconcile_toast = true,
+ .skip = "none",
+ .jobs = -1,
+ .startblock = -1,
+ .endblock = -1
+ };
+
+ amcheckObjects objects = {
+ .databases = {NULL, NULL},
+ .schemas = {NULL, NULL},
+ .tables = {NULL, NULL},
+ .indexes = {NULL, NULL},
+ .exclude_databases = {NULL, NULL},
+ .exclude_schemas = {NULL, NULL},
+ .exclude_tables = {NULL, NULL},
+ .exclude_indexes = {NULL, NULL}
+ };
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ checkopts.alldb = true;
+ break;
+ case 'd':
+ simple_string_list_append(&objects.databases, optarg);
+ break;
+ case 'D':
+ simple_string_list_append(&objects.exclude_databases, optarg);
+ break;
+ case 'e':
+ checkopts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ checkopts.heapallindexed = true;
+ break;
+ case 'i':
+ simple_string_list_append(&objects.indexes, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 'j':
+ checkopts.jobs = atoi(optarg);
+ if (checkopts.jobs <= 0)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ checkopts.parent_check = true;
+ break;
+ case 'q':
+ checkopts.quiet = true;
+ break;
+ case 'r':
+ simple_string_list_append(&objects.indexes, optarg);
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'R':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 's':
+ simple_string_list_append(&objects.schemas, optarg);
+ break;
+ case 'S':
+ simple_string_list_append(&objects.exclude_schemas, optarg);
+ break;
+ case 't':
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'T':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ checkopts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ maintenance_db = pg_strdup(optarg);
+ break;
+ case 2:
+ checkopts.no_indexes = true;
+ break;
+ case 3:
+ checkopts.exclude_toast = true;
+ break;
+ case 4:
+ checkopts.reconcile_toast = false;
+ break;
+ case 5:
+ checkopts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ checkopts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ checkopts.skip = "all frozen";
+ else
+ {
+ fprintf(stderr, _("invalid skip option\n"));
+ exit(1);
+ }
+ break;
+ case 7:
+ checkopts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation starting block argument contains garbage characters"));
+ exit(1);
+ }
+ if (checkopts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation starting block argument out of bounds"));
+ exit(1);
+ }
+ break;
+ case 8:
+ checkopts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation ending block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.endblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation ending block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 9:
+ checkopts.rootdescend = true;
+ checkopts.parent_check = true;
+ break;
+ case 10:
+ checkopts.dependents = false;
+ break;
+ default:
+ fprintf(stderr,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ if (checkopts.endblock >= 0 && checkopts.endblock < checkopts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (connect_db == NULL)
+ connect_db = argv[optind];
+ simple_string_list_append(&objects.databases, argv[optind]);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (maintenance_db)
+ cparams.dbname = maintenance_db;
+ else if (connect_db != NULL)
+ cparams.dbname = connect_db;
+ else if (objects.databases.head != NULL)
+ cparams.dbname = objects.databases.head->val;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ if (objects.databases.head == NULL)
+ simple_string_list_append(&objects.databases, default_db);
+
+ cparams.dbname = default_db;
+ }
+
+ check_each_database(&cparams, &objects, &checkopts, progname);
+
+ exit(0);
+}
+
+/*
+ * check_each_database
+ *
+ * Connects to the initial database and resolves a list of all databases that
+ * should be checked per the user supplied options. Sequentially checks each
+ * database in the list.
+ *
+ * The user supplied options may include zero databases, or only one database,
+ * in which case we could skip the step of resolving a list of databases, but
+ * it seems not worth optimizing, especially considering that there are
+ * multiple ways in which no databases or just one database might be specified,
+ * including a pattern that happens to match no entries or to match only one
+ * entry in pg_database.
+ *
+ * cparams: parameters for the initial database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_each_database(ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PGconn *conn;
+ PGresult *databases;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ SimpleStringList dbregex = {NULL, NULL};
+ SimpleStringList exclude = {NULL, NULL};
+
+ /*
+ * Get a list of all database SQL regexes to use for selecting database
+ * names. We assemble these regexes from fully-qualified relation
+ * patterns and database patterns. This process may result in the same
+ * database regex in the list multiple times, but the query against
+ * pg_database will deduplicate, so we don't care.
+ */
+ get_db_regexes_from_fqrps(&dbregex, &objects->tables);
+ get_db_regexes_from_fqrps(&dbregex, &objects->indexes);
+ get_db_regexes_from_patterns(&dbregex, &objects->databases);
+
+ /*
+ * Assemble SQL regexes for databases to be excluded. Note that excluded
+ * relations are not considered here, as excluding relation x.y.z does not
+ * imply excluding database x. Excluding x.*.* would imply excluding
+ * database x, but we do not check for that here.
+ */
+ get_db_regexes_from_patterns(&exclude, &objects->exclude_databases);
+
+ conn = connectMaintenanceDatabase(cparams, progname, checkopts->echo);
+
+ initPQExpBuffer(&sql);
+ dbname_select(conn, &sql, &dbregex, checkopts->alldb);
+ appendPQExpBufferStr(&sql, "\nEXCEPT");
+ dbname_select(conn, &sql, &exclude, false);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ databases = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(databases) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+ PQfinish(conn);
+
+ ntups = PQntuples(databases);
+ if (ntups == 0 && !checkopts->quiet)
+ printf(_("%s: no databases to check\n"), progname);
+
+ for (i = 0; i < ntups; i++)
+ {
+ cparams->override_dbname = PQgetvalue(databases, i, 0);
+ check_one_database(cparams, objects, checkopts, progname);
+ }
+
+ PQclear(databases);
+}
+
+/*
+ * string_in_list
+ *
+ * Returns whether a given string is in the list of strings.
+ */
+static bool
+string_in_list(const SimpleStringList *list, const char *str)
+{
+ const SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next)
+ if (strcmp(cell->val, str) == 0)
+ return true;
+ return false;
+}
+
+/*
+ * check_one_database
+ *
+ * Connects to the next database and checks all relations that match the
+ * supplied object lists.  Patterns in the object lists are matched against
+ * the relations that exist in this database.
+ *
+ * cparams: parameters for this next database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_one_database(const ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PQExpBufferData sql;
+ PGconn *conn;
+ PGresult *result;
+ ParallelSlot *slots;
+ int ntups;
+ int i;
+ int parallel_workers;
+ bool inclusive;
+ bool failed = false;
+
+ conn = connectDatabase(cparams, progname, checkopts->echo, false, true);
+
+ if (!checkopts->quiet)
+ {
+ printf(_("%s: checking database \"%s\"\n"),
+ progname, PQdb(conn));
+ fflush(stdout);
+ }
+
+ /*
+ * Verify that amcheck is installed in this database.  User error
+ * could result in a database not having amcheck that should have it, but
+ * we also could be iterating over multiple databases where not all of
+ * them have amcheck installed (for example, 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+ pg_log_error("%s: database \"%s\": %s",
+ progname, PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("%s: query was: %s", progname, amcheck_sql);
+ PQclear(result);
+ PQfinish(conn);
+ return;
+ }
+ ntups = PQntuples(result);
+ PQclear(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ if (!checkopts->quiet &&
+ (checkopts->verbose ||
+ string_in_list(&objects->databases, PQdb(conn))))
+ {
+ printf(_("%s: skipping database \"%s\": amcheck is not installed\n"),
+ progname, PQdb(conn));
+ }
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * If we were given neither tables nor indexes to check, then we select all
+ * targets not excluded. Otherwise, we select only the targets that we
+ * were given.
+ */
+ inclusive = objects->tables.head == NULL &&
+ objects->indexes.head == NULL;
+
+ initPQExpBuffer(&sql);
+ target_select(conn, &sql, objects, checkopts, progname, inclusive);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ result = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+
+ /*
+ * If no rows are returned, there are no matching relations, so we are
+ * done.
+ */
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ PQclear(result);
+ PQfinish(conn);
+ return;
+ }
+
+ /*
+ * Ensure parallel_workers is sane. If there are more connections than
+ * relations to be checked, we don't need to use them all.
+ */
+ parallel_workers = checkopts->jobs;
+ if (parallel_workers > ntups)
+ parallel_workers = ntups;
+ if (parallel_workers <= 0)
+ parallel_workers = 1;
+
+ /*
+ * Set up the database connections.  We reuse the connection we already
+ * have for the first slot. If not in parallel mode, the first slot in
+ * the array contains the connection.
+ */
+ slots = ParallelSlotsSetup(cparams, progname, checkopts->echo, conn,
+ parallel_workers);
+
+ initPQExpBuffer(&sql);
+
+ /*
+ * Loop over all objects to be checked, and execute amcheck checking
+ * commands for each. We do not wait for the checks to complete, nor do
+ * we handle the results of those checks in the loop. We register
+ * handlers for doing all that.
+ */
+ for (i = 0; i < ntups; i++)
+ {
+ ParallelSlot *free_slot;
+
+ CheckType checktype = atoi(PQgetvalue(result, i, 0));
+ Oid reloid = atooid(PQgetvalue(result, i, 1));
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /* Execute the amcheck command for the given relation type. */
+ switch (checktype)
+ {
+ /* heapam types */
+ case CT_TABLE:
+ prepare_table_command(&sql, checkopts, reloid);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* btreeam types */
+ case CT_BTREE:
+ prepare_btree_command(&sql, checkopts, reloid);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* intentionally no default here */
+ }
+ }
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (!ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+finish:
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+
+ termPQExpBuffer(&sql);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * checkopts: user supplied program options
+ * reloid: oid of the table to be checked
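+ *
+ * For illustration only, with a hypothetical relation oid 16385 and
+ * placeholder option values, the generated command looks roughly like:
+ *
+ *     SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg
+ *     FROM public.verify_heapam(
+ *     relation := 16385,
+ *     on_error_stop := false,
+ *     check_toast := true,
+ *     skip := 'none'
+ *     ) v,
+ *     pg_catalog.pg_class c
+ *     JOIN pg_catalog.pg_namespace n
+ *     ON c.relnamespace OPERATOR(pg_catalog.=) n.oid
+ *     WHERE c.oid OPERATOR(pg_catalog.=) 16385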
+ */
+static void
+prepare_table_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM public.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ reloid,
+ checkopts->on_error_stop ? "true" : "false",
+ checkopts->reconcile_toast ? "true" : "false",
+ checkopts->skip);
+ if (checkopts->startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", checkopts->startblock);
+ if (checkopts->endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", checkopts->endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the index checking command will be written
+ * checkopts: user supplied program options
+ * reloid: oid of the index to be checked
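+ *
+ * For illustration only, with a hypothetical index oid 16388 and
+ * parent_check enabled, the generated command looks roughly like:
+ *
+ *     SELECT public.bt_index_parent_check(
+ *     index := '16388'::regclass,
+ *     heapallindexed := false,
+ *     rootdescend := false)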
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid)
+{
+ resetPQExpBuffer(sql);
+ if (checkopts->parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT public.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"),
+ (checkopts->rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT public.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * conn: connection to the server associated with the slot to use
+ * sql: query to send
+ * checkopts: user supplied program options
+ * reloid: oid of the object being checked, for error reporting
+ * typ: type of object being checked, for error reporting
+ */
+static void
+run_command(PGconn *conn, const char *sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *typ)
+{
+ bool status;
+
+ if (checkopts->echo)
+ printf("%s\n", sql);
+
+ status = PQsendQuery(conn, sql) == 1;
+
+ if (!status)
+ {
+ pg_log_error("check of %s with id %u in database \"%s\" failed: %s",
+ typ, reloid, PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("command was: %s", sql);
+ }
+}
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
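+ *
+ * For illustration only, a fully-specified corruption report (with
+ * hypothetical values) prints as:
+ *
+ *     relation public.mytable, block 17, offset 12, attribute 2
+ *      <message returned by verify_heapam>
+ *
+ * with trailing fields omitted from the first line as they become null.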
+ */
+static PGresult *
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ int ntups = PQntuples(res);
+
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s, block %s, offset %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s, block %s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s\n %s\n",
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s\n", PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+ else
+ {
+ printf("%s\n", PQerrorMessage(conn));
+ printf("query was: %s\n", (const char *) context);
+ }
+
+ return res;
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user.  The result
+ * set from the btree checking command is expected to be empty; if the command
+ * instead fails with an error, the useful information about the corruption is
+ * found in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static PGresult *
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ printf("%s\n", PQerrorMessage(conn));
+ return res;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf(_("%s checks objects in a PostgreSQL database for corruption.\n\n"), progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
+ printf(_("\nTarget Options:\n"));
+ printf(_(" -a, --all check all databases\n"));
+ printf(_(" -d, --dbname=DBNAME check specific database(s)\n"));
+ printf(_(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n"));
+ printf(_(" -i, --index=INDEX check specific index(es)\n"));
+ printf(_(" -I, --exclude-index=INDEX do NOT check specific index(es)\n"));
+ printf(_(" -r, --relation=RELNAME check specific relation(s)\n"));
+ printf(_(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n"));
+ printf(_(" -s, --schema=SCHEMA check specific schema(s)\n"));
+ printf(_(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n"));
+ printf(_(" -t, --table=TABLE check specific table(s)\n"));
+ printf(_(" -T, --exclude-table=TABLE do NOT check specific table(s)\n"));
+ printf(_(" --exclude-indexes do NOT perform any index checking\n"));
+ printf(_(" --exclude-toast do NOT check any toast tables or indexes\n"));
+ printf(_(" --no-dependents do NOT automatically check dependent objects\n"));
+ printf(_("\nIndex Checking Options:\n"));
+ printf(_(" -H, --heapallindexed check all heap tuples are found within indexes\n"));
+ printf(_(" -P, --parent-check check parent/child relationships during index checking\n"));
+ printf(_(" --rootdescend search from root page to refind tuples at the leaf level\n"));
+ printf(_("\nTable Checking Options:\n"));
+ printf(_(" --exclude-toast-pointers do NOT check relation toast pointers against toast\n"));
+ printf(_(" --on-error-stop stop checking a relation at end of first corrupt page\n"));
+ printf(_(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n"));
+ printf(_(" --startblock begin checking table(s) at the given starting block number\n"));
+ printf(_(" --endblock check table(s) only up to the given ending block number\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port\n"));
+ printf(_(" -U, --username=USERNAME user name to connect as\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt\n"));
+ printf(_(" --maintenance-db=DBNAME alternate maintenance database\n"));
+ printf(_("\nOther Options:\n"));
+ printf(_(" -e, --echo show the commands being sent to the server\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to the server\n"));
+ printf(_(" -q, --quiet don't write any messages\n"));
+ printf(_(" -v, --verbose write a lot of output\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+
+ printf(_("\nRead the description of the amcheck contrib module for details.\n"));
+ printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+ printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * get_db_regexes_from_fqrps
+ *
+ * For each pattern in the patterns list, if it is in fully-qualified
+ * database.schema.name format (fully-qualified relation pattern (fqrp)), parse
+ * the database portion of the pattern, convert it to SQL regex format, and
+ * append it to the databases list. Patterns that are not fully-qualified are
+ * skipped over. No deduplication of regexes is performed.
+ *
+ * regexes: list to which parsed and converted database regexes are appended
+ * patterns: list of all patterns to parse
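+ *
+ * For illustration only (the exact regex form is whatever patternToSQLRegex
+ * produces): a hypothetical pattern "mydb.myschema.mytable" contributes a
+ * regex matching the database name "mydb" to the regexes list, while an
+ * unqualified pattern such as "mytable" contributes nothing.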
+ */
+static void
+get_db_regexes_from_fqrps(SimpleStringList *regexes,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /* parse the pattern as dbname.schema.relname, if possible */
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ /* add the database name (or pattern), if any, to the list */
+ if (dbnamebuf.data[0])
+ simple_string_list_append(regexes, dbnamebuf.data);
+
+ /* we do not use the schema or relname portions */
+
+ /* we may have dirtied the buffers */
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+ termPQExpBuffer(&dbnamebuf);
+ termPQExpBuffer(&schemabuf);
+ termPQExpBuffer(&namebuf);
+}
+
+/*
+ * get_db_regexes_from_patterns
+ *
+ * Convert each unqualified pattern in the list to SQL regex format and append
+ * it to the regexes list.
+ *
+ * regexes: list to which converted regexes are appended
+ * patterns: list of patterns to be converted
+ */
+static void
+get_db_regexes_from_patterns(SimpleStringList *regexes,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData buf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&buf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, NULL, NULL, &buf, cell->val, false);
+ if (buf.data[0])
+ simple_string_list_append(regexes, buf.data);
+ resetPQExpBuffer(&buf);
+ }
+ termPQExpBuffer(&buf);
+}
+
+/*
+ * dbname_select
+ *
+ * Appends a statement which selects the names of all databases matching the
+ * given SQL regular expressions.
+ *
+ * conn: connection to the initial database
+ * sql: buffer into which the constructed sql statement is appended
+ * regexes: list of database name regular expressions to match
+ * alldb: when true, select all databases which allow connections
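+ *
+ * For illustration only, with a single hypothetical regex '^(mydb)$' and
+ * alldb false, the appended statement looks roughly like:
+ *
+ *     SELECT datname::TEXT AS datname
+ *     FROM pg_database
+ *     WHERE datallowconn
+ *     AND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[
+ *     '^(mydb)$'::TEXT COLLATE pg_catalog.default
+ *     ]::TEXT[])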
+ */
+static void
+dbname_select(PGconn *conn, PQExpBuffer sql, const SimpleStringList *regexes,
+ bool alldb)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+
+ if (alldb)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_database"
+ "\nWHERE datallowconn");
+ return;
+ }
+ else if (regexes->head == NULL)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT ''::TEXT AS datname"
+ "\nWHERE false");
+ return;
+ }
+
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_database"
+ "\nWHERE datallowconn"
+ "\nAND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = regexes->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, cell->val, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * schema_select
+ *
+ * Appends a statement which selects all schemas matching the given patterns.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * fieldname: alias to use for the oid field within the created SELECT
+ * statement
+ * patterns: list of schema name patterns to match
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all non-system schemas
+ */
+static void
+schema_select(PGconn *conn, PQExpBuffer sql, const char *fieldname,
+ const SimpleStringList *patterns, bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBuffer(sql, "\nSELECT 0::pg_catalog.oid AS %s WHERE false", fieldname);
+ else
+ appendPQExpBuffer(sql, "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_catalog')"
+ "\nAND oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_toast')",
+ fieldname);
+ return;
+ }
+
+ appendPQExpBuffer(sql, "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE nspname OPERATOR(pg_catalog.~) ANY(ARRAY[\n",
+ fieldname);
+ for (cell = patterns->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ PQExpBufferData regexbuf;
+
+ initPQExpBuffer(®exbuf);
+ patternToSQLRegex(encoding, NULL, NULL, ®exbuf, cell->val, false);
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, regexbuf.data, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ termPQExpBuffer(®exbuf);
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * schema_cte
+ *
+ * Appends a Common Table Expression (CTE) which selects all schemas to be
+ * checked, with the CTE and oid field named as requested. The CTE will select
+ * all schemas matching the include list except any schemas matching the
+ * exclude list.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * ctename: name of the schema CTE to be created
+ * fieldname: name of the oid field within the schema CTE to be created
+ * include: list of schema name patterns for inclusion
+ * exclude: list of schema name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether to use all schemas in
+ * the database in lieu of the include list.
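+ *
+ * The appended CTE has the shape (schema selection queries elided):
+ *
+ *     ctename (fieldname) AS (
+ *     <SELECT of included schema oids>
+ *     EXCEPT
+ *     <SELECT of excluded schema oids>
+ *     )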
+ */
+static void
+schema_cte(PGconn *conn, PQExpBuffer sql, const char *ctename,
+ const char *fieldname, const SimpleStringList *include,
+ const SimpleStringList *exclude, bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ schema_select(conn, sql, fieldname, include, inclusive);
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ schema_select(conn, sql, fieldname, exclude, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * append_ctfilter_quals
+ *
+ * Appends quals to a buffer that restrict the rows selected from pg_class to
+ * only those which match the given checktype. No initial "WHERE" or "AND" is
+ * appended, nor do we surround our appended clauses in parens. The caller is
+ * assumed to take care of such matters.
+ *
+ * sql: buffer into which the constructed sql quals are appended
+ * relname: name (or alias) of pg_class in the surrounding query
+ * checktype: struct containing filter info
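+ *
+ * For illustration only, with hypothetical filter values (a btree relam oid
+ * of 403 and a relkind list of 'i'), the appended quals look roughly like:
+ *
+ *     c.relam OPERATOR(pg_catalog.=) 403
+ *     AND c.relkind OPERATOR(pg_catalog.=) ANY(ARRAY['i'])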
+ */
+static void
+append_ctfilter_quals(PQExpBuffer sql, const char *relname, CheckType checktype)
+{
+ appendPQExpBuffer(sql,
+ "%s.relam OPERATOR(pg_catalog.=) %u"
+ "\nAND %s.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])",
+ relname, ctfilter[checktype].relam,
+ relname, ctfilter[checktype].relkinds);
+}
+
+/*
+ * relation_select
+ *
+ * Appends a statement which selects the oid of all relations matching the
+ * given parameters. Expects a mixture of qualified and unqualified relation
+ * name patterns.
+ *
+ * For unqualified relation patterns, selects relations that match the relation
+ * name portion of the pattern which are in namespaces that are in the given
+ * namespace CTE.
+ *
+ * For qualified relation patterns, ignores the given namespace CTE and selects
+ * relations that match the relation name portion of the pattern which are in
+ * namespaces that match the schema portion of the pattern.
+ *
+ * For fully qualified relation patterns (database.schema.name), the pattern
+ * will be ignored unless the database portion of the pattern matches the name
+ * of the current database, as retrieved from conn.
+ *
+ * Only relations of the specified checktype will be selected.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * fieldname: alias to use for the oid field within the created SELECT
+ * statement
+ * patterns: list of (possibly qualified) relation name patterns to match
+ * checktype: the type of relation to select
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all relations of the given type
+ */
+static void
+relation_select(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *fieldname,
+ const SimpleStringList *patterns, CheckType checktype,
+ bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma = "";
+ const char *qor = "";
+ PQExpBufferData qualified;
+ PQExpBufferData unqualified;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBuffer(sql,
+ "\nSELECT 0::pg_catalog.oid AS %s WHERE false",
+ fieldname);
+ else
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s n"
+ "\nON n.%s OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE ",
+ fieldname, schemacte, schemafield);
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+ return;
+ }
+
+ /*
+ * We have to distinguish between schema-qualified and unqualified
+ * relation patterns. The unqualified patterns need to be restricted by
+ * the list of schemas returned by the schema CTE, but not so for the
+ * qualified patterns.
+ *
+ * We treat fully-qualified relation patterns (database.schema.relation)
+ * like schema-qualified patterns except that we also require the database
+ * portion to match the current database name.
+ */
+ initPQExpBuffer(&qualified);
+ initPQExpBuffer(&unqualified);
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ if (schemabuf.data[0])
+ {
+ /* Qualified relation pattern */
+ appendPQExpBuffer(&qualified, "%s\n(", qor);
+
+ if (dbnamebuf.data[0])
+ {
+ /*
+ * Fully-qualified relation pattern. Require the database
+ * name of our connection to match the database portion of the
+ * relation pattern.
+ */
+ appendPQExpBufferStr(&qualified, "\n");
+ appendStringLiteralConn(&qualified, PQdb(conn), conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, dbnamebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND");
+ }
+
+ /*
+ * Require the namespace name to match the schema portion of the
+ * relation pattern and the relation name to match the relname
+ * portion of the relation pattern.
+ */
+ appendPQExpBufferStr(&qualified,
+ "\nn.nspname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, schemabuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND"
+ "\nc.relname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, namebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default)");
+ qor = "\nOR";
+ }
+ else
+ {
+ /* Unqualified relation pattern */
+ appendPQExpBufferStr(&unqualified, comma);
+ appendStringLiteralConn(&unqualified, namebuf.data, conn);
+ appendPQExpBufferStr(&unqualified,
+ "::TEXT COLLATE pg_catalog.default");
+ comma = "\n, ";
+ }
+
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+
+ if (qualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE (",
+ fieldname);
+ appendPQExpBufferStr(sql, qualified.data);
+ appendPQExpBufferStr(sql, ")\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ if (unqualified.data[0])
+ appendPQExpBufferStr(sql, "\nUNION ALL");
+ }
+ if (unqualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s ls"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) ls.%s"
+ "\nWHERE c.relname OPERATOR(pg_catalog.~) ANY(ARRAY[",
+ fieldname, schemacte, schemafield);
+ appendPQExpBufferStr(sql, unqualified.data);
+ appendPQExpBufferStr(sql, "\n]::TEXT[])\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+}
+
+/*
+ * table_cte
+ *
+ * Appends to the buffer 'sql' a Common Table Expression (CTE) which selects
+ * all table relations matching the given filters.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the table CTE to be created
+ * fieldname: name of the oid field within the table CTE to be created
+ * include: list of table name patterns for inclusion
+ * exclude: list of table name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether the select statement
+ * should match all relations
+ * toast: whether to also select the associated toast tables
+ */
+static void
+table_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename, const char *fieldname,
+ const SimpleStringList *include, const SimpleStringList *exclude,
+ bool inclusive, bool toast)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+
+ if (toast)
+ {
+ /*
+ * Compute the primary tables, then union on all associated toast
+ * tables. We depend on left to right evaluation of the UNION before
+ * the EXCEPT which gets added below. UNION and EXCEPT have equal
+ * precedence, so be careful if you rearrange this query.
+ */
+ appendPQExpBufferStr(sql, "\nWITH primary_table AS (");
+ relation_select(conn, sql, schemacte, schemafield, fieldname, include,
+ CT_TABLE, inclusive);
+ appendPQExpBuffer(sql, "\n)"
+ "\nSELECT %s"
+ "\nFROM primary_table"
+ "\nUNION"
+ "\nSELECT c.reltoastrelid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN primary_table pt"
+ "\nON pt.%s OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE c.reltoastrelid OPERATOR(pg_catalog.!=) 0",
+ fieldname, fieldname, fieldname);
+ }
+ else
+ relation_select(conn, sql, schemacte, schemafield, fieldname, include,
+ CT_TABLE, inclusive);
+
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ relation_select(conn, sql, schemacte, schemafield, fieldname, exclude,
+ CT_TABLE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
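The left-to-right evaluation that table_cte relies on is standard SQL set-operation behavior, not anything PostgreSQL-specific. As a rough illustration (using SQLite through Python's sqlite3 module as a stand-in for a live server), the implied parenthesization can be observed directly:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# UNION and EXCEPT share the same precedence and evaluate left to right,
# so this parses as (SELECT 1 UNION SELECT 2) EXCEPT SELECT 1.
rows = con.execute("SELECT 1 UNION SELECT 2 EXCEPT SELECT 1").fetchall()
print(rows)  # [(2,)]

# Had EXCEPT bound tighter than UNION, the query would instead mean
# SELECT 1 UNION (SELECT 2 EXCEPT SELECT 1) and return both rows.
```

This is why the function can append the EXCEPT clause unconditionally after the primary/toast UNION without wrapping the union in extra parentheses.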
+
+/*
+ * exclude_index_cte
+ * Appends a CTE which selects all indexes to be excluded
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * patterns: list of index name patterns to match
+ */
+static void
+exclude_index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename,
+ const char *fieldname, const SimpleStringList *patterns)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ relation_select(conn, sql, schemacte, schemafield, fieldname, patterns,
+ CT_BTREE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * index_cte
+ * Appends a CTE which selects all indexes to be checked
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * schemafield: name of the oid field within the schema CTE
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * excludecte: name of the CTE which contains all indexes to be excluded
+ * tablescte: optional; if automatically including indexes for checked tables,
+ * the name of the CTE which contains all tables to be checked
+ * tablesfield: if tablescte is not NULL, the name of the oid field in the
+ * tables CTE
+ * patterns: list of index name patterns to match
+ * inclusive: when 'patterns' is an empty list, whether the select statement
+ * should match all relations
+ */
+static void
+index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *schemafield, const char *ctename, const char *fieldname,
+ const char *excludecte, const char *tablescte,
+ const char *tablesfield, const SimpleStringList *patterns,
+ bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ appendPQExpBuffer(sql, "\nSELECT %s FROM (", fieldname);
+ relation_select(conn, sql, schemacte, schemafield, fieldname, patterns,
+ CT_BTREE, inclusive);
+ if (tablescte)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION"
+ "\nSELECT i.indexrelid AS %s"
+ "\nFROM pg_catalog.pg_index i"
+ "\nJOIN %s t ON t.%s OPERATOR(pg_catalog.=) i.indrelid",
+ fieldname, tablescte, tablesfield);
+ }
+ appendPQExpBuffer(sql,
+ "\n) AS included_indexes"
+ "\nEXCEPT"
+ "\nSELECT %s FROM %s",
+ fieldname, excludecte);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * target_select
+ *
+ * Construct a query that will return a list of all tables and indexes in
+ * the database matching the user specified options, sorted by size. We
+ * want the largest tables and indexes first, so that the parallel
+ * processing of the larger database objects gets started sooner.
+ *
+ * If 'inclusive' is true, include all tables and indexes not otherwise
+ * excluded; if false, include only tables and indexes explicitly included.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql select statement is appended
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ * inclusive: when list of objects to include is empty, whether the select
+ * statement should match all objects not otherwise excluded
+ */
+static void
+target_select(PGconn *conn, PQExpBuffer sql, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname,
+ bool inclusive)
+{
+ appendPQExpBufferStr(sql, "WITH");
+ schema_cte(conn, sql, "namespaces", "nspoid", &objects->schemas,
+ &objects->exclude_schemas, inclusive);
+ appendPQExpBufferStr(sql, ",");
+ table_cte(conn, sql, "namespaces", "nspoid", "tables", "tbloid",
+ &objects->tables, &objects->exclude_tables, inclusive,
+ !checkopts->exclude_toast);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBufferStr(sql, ",");
+ exclude_index_cte(conn, sql, "namespaces", "nspoid",
+ "excluded_indexes", "idxoid",
+ &objects->exclude_indexes);
+ appendPQExpBufferStr(sql, ",");
+ if (checkopts->dependents)
+ index_cte(conn, sql, "namespaces", "nspoid", "indexes", "idxoid",
+ "excluded_indexes", "tables", "tbloid",
+ &objects->indexes, inclusive);
+ else
+ index_cte(conn, sql, "namespaces", "nspoid", "indexes", "idxoid",
+ "excluded_indexes", NULL, NULL, &objects->indexes,
+ inclusive);
+ }
+ appendPQExpBuffer(sql,
+ "\nSELECT checktype, oid FROM ("
+ "\nSELECT %u AS checktype, tables.tbloid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN tables"
+ "\nON tables.tbloid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_TABLE);
+ append_ctfilter_quals(sql, "c", CT_TABLE);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION ALL"
+ "\nSELECT %u AS checktype, indexes.idxoid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN indexes"
+ "\nON indexes.idxoid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_BTREE);
+ append_ctfilter_quals(sql, "c", CT_BTREE);
+ }
+ appendPQExpBufferStr(sql,
+ "\n) AS ss"
+ "\nORDER BY relpages DESC, checktype, oid");
+}
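The size-descending sort in target_select is the classic longest-job-first heuristic. A minimal sketch of why it helps, with made-up page counts and a greedy least-loaded-worker assignment (an illustration only, not pg_amcheck's actual dispatch logic):

```python
import heapq

def makespan(jobs, workers):
    # Greedy list scheduling: each job goes to the least-loaded worker;
    # the makespan is the finish time of the busiest worker.
    loads = [0] * workers
    heapq.heapify(loads)
    for pages in jobs:
        heapq.heappush(loads, heapq.heappop(loads) + pages)
    return max(loads)

relpages = [1] * 9 + [10]   # nine small relations plus one large one
print(makespan(relpages, 2))                        # 14: big job starts last
print(makespan(sorted(relpages, reverse=True), 2))  # 10: big job starts first
```

Dispatching the largest relations first keeps a straggler from starting near the end of the run and dominating total elapsed time.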
diff --git a/contrib/pg_amcheck/pg_amcheck.h b/contrib/pg_amcheck/pg_amcheck.h
new file mode 100644
index 0000000000..454824d29e
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.h
@@ -0,0 +1,132 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.h
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2020-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_AMCHECK_H
+#define PG_AMCHECK_H
+
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+
+/* amcheck options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool dependents;
+ bool no_indexes;
+ bool exclude_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ const char *skip;
+ int jobs; /* >= 0 indicates user specified the parallel
+ * degree, otherwise -1 */
+ long startblock;
+ long endblock;
+} amcheckOptions;
+
+/* names of database objects to include or exclude controlled by user flags */
+typedef struct amcheckObjects
+{
+ SimpleStringList databases;
+ SimpleStringList schemas;
+ SimpleStringList tables;
+ SimpleStringList indexes;
+ SimpleStringList exclude_databases;
+ SimpleStringList exclude_schemas;
+ SimpleStringList exclude_tables;
+ SimpleStringList exclude_indexes;
+} amcheckObjects;
+
+/*
+ * We cannot launch the same amcheck function for all checked objects. For
+ * btree indexes, we must use either bt_index_check() or
+ * bt_index_parent_check(). For heap relations, we must use verify_heapam().
+ * We silently ignore all other object types.
+ *
+ * The following CheckType enum and corresponding ct_filter array track which
+ * kinds of relations get which treatment.
+ */
+typedef enum
+{
+ CT_TABLE = 0,
+ CT_BTREE
+} CheckType;
+
+/*
+ * This struct is used for filtering relations in pg_catalog.pg_class to just
+ * those of a given CheckType. The relam field should equal pg_class.relam,
+ * and the pg_class.relkind should be contained in the relkinds comma separated
+ * list.
+ *
+ * The 'typname' field is not strictly for filtering, but for printing messages
+ * about relations that matched the filter.
+ */
+typedef struct
+{
+ Oid relam;
+ const char *relkinds;
+ const char *typname;
+} CheckTypeFilter;
+
+/* Constants taken from pg_catalog/pg_am.dat */
+#define HEAP_TABLE_AM_OID 2
+#define BTREE_AM_OID 403
+
+static void check_each_database(ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+
+static void check_one_database(const ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+static void prepare_table_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid);
+
+static void prepare_btree_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid);
+static void run_command(PGconn *conn, const char *sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *typ);
+
+static PGresult *VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static PGresult *VerifyBtreeSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static void help(const char *progname);
+
+
+static void get_db_regexes_from_fqrps(SimpleStringList *regexes,
+ const SimpleStringList *patterns);
+
+static void get_db_regexes_from_patterns(SimpleStringList *regexes,
+ const SimpleStringList *patterns);
+
+static void dbname_select(PGconn *conn, PQExpBuffer sql,
+ const SimpleStringList *regexes, bool alldb);
+
+static void target_select(PGconn *conn, PQExpBuffer sql,
+ const amcheckObjects *objects,
+ const amcheckOptions *options, const char *progname,
+ bool inclusive);
+
+#endif /* PG_AMCHECK_H */
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..b52039c79b
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,78 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+#########################################
+# Test checking a database without amcheck installed, by name. We should see a
+# message about missing amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by name without amcheck installed');
+
+#########################################
+# Test checking a database without amcheck installed, selected only indirectly
+# via a dbname pattern. In verbose mode, we should see a message about missing
+# amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, '-v', '-d', '*', 'postgres' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by dbname implication without amcheck installed');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ 'checking a non-existent schema');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ 'checking a non-existent table');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ 'checking a non-existent index');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ 'no matching schemas');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ 'no matching tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index' ],
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..583660f3df
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,428 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel && $rel ne '';
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+ or die "open '$relpath' failed: $!";
+ binmode $fh;
+ sysseek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500)
+ or die "syswrite '$relpath' failed: $!";
+ close($fh);
+ $node->start;
+}
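The corruption technique here is language-neutral: seek past the fixed-size page header and overwrite a run of bytes with a fixed pattern. The same idea can be sketched in Python against a scratch file (a zeroed stand-in page, not a live cluster file):

```python
import os
import tempfile

# Create a scratch "page" and corrupt 500 bytes starting at offset 32,
# mirroring what corrupt_first_page() does to the first heap page.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 8192)       # one zeroed 8 kB page
os.lseek(fd, 32, os.SEEK_SET)      # skip the 24-byte page header and a bit
os.write(fd, b"\x77" * 500)        # splat a recognizable pattern
os.lseek(fd, 0, os.SEEK_SET)
page = os.read(fd, 8192)
os.close(fd)
os.unlink(path)

print(page[31], page[32], page[531], page[532])  # 0 119 119 0
```

Note the use of unbuffered OS-level seek and write, which avoids the pitfall of mixing a buffered seek with an unbuffered write on the same handle.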
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends
+ $node->safe_psql($dbname, q(CREATE EXTENSION amcheck));
+
+ # Create sequences, tables, views, and indexes in five separate
+ # schemas. The schemas are all identical to start, but we will
+ # corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt the toast table associated with table t2 in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ @cmd, 'db1' ],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_ok(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_ok(
+ [ @cmd, '--all' ],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-i', 'i*.idx', '-i', 'idx.i*' ],
+ 'pg_amcheck all indexes with qualified names matching /i*.idx/ or /idx.i*/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-t', 's*.t1', '-t', 'foo*.bar*' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/ or /foo*.bar*/');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/,
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# But checking schema s1 across all databases should show corruption
+# messages for tables in db2
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across all databases but excluding indexes');
+
+# Checking across a list of databases should also work
+$node->command_like(
+ [ @cmd, '-d', 'db2', '-d', 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across db1 and db2 but excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of zero.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's4' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_like(
+ [ @cmd, '--exclude-toast', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's5' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..cd21874735
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
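+
+# As a cross-check of the two constants above (a sanity assertion added for
+# illustration), packing 28 zero values with HEAPTUPLE_PACK_CODE must yield
+# exactly HEAPTUPLE_PACK_LENGTH bytes:
+die "pack code and length constants disagree"
+  if length(pack(HEAPTUPLE_PACK_CODE, (0) x 28)) != HEAPTUPLE_PACK_LENGTH;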
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
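+
+# For example, corrupting one tuple with the helpers above takes the form
+# (hypothetical $fh and $offset shown for illustration):
+#
+#   my $tup = read_tuple($fh, $offset);
+#   $tup->{t_xmin} = 3;               # set xmin to a pre-normal xid
+#   write_tuple($fh, $offset, $tup);  # write the damaged tuple back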
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table, excluding indexes,
+# without error.
+$node->command_ok(['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation public\.test\s+/ms;
+}
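+
+# For example, header(0, 2, undef) returns a pattern matching corruption
+# reports that begin with "relation public.test, block 0, offset 2", while
+# header(0, 2, 1) additionally requires ", attribute 1".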
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting the circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future.
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax to a transaction ID beyond the next valid xid
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..379225cbf8
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..2b2c73ca8b
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,1004 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or more
+ databases, with options to select which schemas, tables and indexes to check,
+ which kinds of checking to perform, and whether to perform the checks in
+ parallel and, if so, how many parallel connections to establish and use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Usage</title>
+
+ <refsect2>
+ <title>Parallelism Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=20 --all</literal></term>
+ <listitem>
+ <para>
+ Check all databases one after another, but for each database checked,
+ use up to 20 simultaneous connections to check relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=8 mydb yourdb</literal></term>
+ <listitem>
+ <para>
+ Check databases <literal>mydb</literal> and <literal>yourdb</literal>
+ one after another, using up to 8 simultaneous connections to check
+ relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Checking Option Specification</title>
+
+ <para>
+ If no checking options are specified, by default all table relation checks
+ and default level btree index checks are performed. A variety of options
+ exist to change the set of checks performed on whichever relations are
+ being checked. They are briefly illustrated in the following examples;
+ see their full descriptions below.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --parent-check --heapallindexed</literal></term>
+ <listitem>
+ <para>
+ For each btree index checked, performs more extensive checks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --exclude-toast-pointers</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not check toast pointers against
+ the toast relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --on-error-stop</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not continue checking pages after
+ the first page where corruption is encountered.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --skip="all-frozen"</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, skips over blocks marked as all
+ frozen. Note that "all-visible" may also be specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --startblock=3000 --endblock=4000</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, check only blocks in the given block
+ range.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Relation Specification</title>
+
+ <para>
+ If no relations are explicitly listed, by default all relations will be
+ checked, but there are options to specify which relations to check.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable -r yourtable</literal></term>
+ <listitem>
+ <para>
+ If one or more relations are explicitly given, they are interpreted as
+ an exhaustive list of all relations to be checked, with one caveat:
+ for all such relations, associated toast relations and indexes are by
+ default included in the list of relations to check.
+ </para>
+ <para>
+ Assuming <literal>mytable</literal> is an ordinary table, and that it
+ is indexed by <literal>mytable_idx</literal> and has an associated
+ toast table <literal>pg_toast_12345</literal>, checking will be
+ performed on <literal>mytable</literal>,
+ <literal>mytable_idx</literal>, and <literal>pg_toast_12345</literal>.
+ </para>
+ <para>
+ Likewise for <literal>yourtable</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable --no-dependents</literal></term>
+ <listitem>
+ <para>
+ This restricts the list of relations checked to just
+ <literal>mytable</literal>, without pulling in the corresponding
+ indexes or toast, but see also
+ <option>--exclude-toast-pointers</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t mytable -i myindex</literal></term>
+ <listitem>
+ <para>
+ The <option>-r</option> (<option>--relation</option>) option will match
+ any relation, but <option>-t</option> (<option>--table</option>) and
+ <option>-i</option> (<option>--index</option>) may be used to avoid
+ matching objects of the other type.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -R="mytemp*"</literal></term>
+ <listitem>
+ <para>
+ Relations may be included (<option>-r</option>) or excluded
+ (<option>-R</option>) using shell-style patterns.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Table and index inclusion and exclusion patterns may be used
+ equivalently with <option>-t</option>, <option>-T</option>,
+ <option>-i</option> and <option>-I</option>. The above example checks
+ all tables and indexes starting with <literal>my</literal> except for
+ indexes starting with <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -R="india" -T="laos" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Unlike specifying one or more <option>--relation</option> options, which
+ disables the default behavior of checking all relations, specifying one or
+ more of <option>-R</option>, <option>-T</option> or <option>-I</option> does not.
+ The above command will check all relations except any relation named
+ <literal>india</literal>, any table named <literal>laos</literal>, and any
+ index named <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Schema Specification</title>
+
+ <para>
+ If no schemas are explicitly listed, by default all schemas except
+ <literal>pg_catalog</literal> and <literal>pg_toast</literal> will be
+ checked.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -s s1 -s s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ If one or more schemas are listed with <option>-s</option>, unqualified
+ relation names will be checked only in the given schemas. The above
+ command will check tables <literal>s1.mytable</literal> and
+ <literal>s2.mytable</literal> but not tables named
+ <literal>mytable</literal> in other schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ As with relations, schemas may be excluded. The above command will
+ check any table named <literal>mytable</literal> not in schemas
+ <literal>s1</literal> and <literal>s2</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable -t s1.stuff</literal></term>
+ <listitem>
+ <para>
+ Relations may be included or excluded with a schema-qualified name
+ without interference from the <option>-s</option> or
+ <option>-S</option> options. Even though schema <literal>s1</literal>
+ has been excluded, the table <literal>s1.stuff</literal> will be
+ checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Specification</title>
+
+ <para>
+ If no databases are explicitly listed, the database to check is obtained
+ from environment variables in the usual way. Otherwise, when one or more
+ databases are explicitly given, they are interpreted as an exhaustive list
+ of all databases to be checked. This list may contain patterns, but
+ because any such patterns must be resolved against a list of all databases
+ to find the matching database names, at least one database must be
+ specified as a literal name rather than a pattern, and it must appear in a
+ position where <application>pg_amcheck</application> expects to find it.
+ </para>
+ <para>
+ For example:
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --all --maintenance-db=foo</literal></term>
+ <listitem>
+ <para>
+ If the <option>--maintenance-db</option> option is given, it will be
+ used to look up the matching databases, though it will not itself be
+ added to the list of databases for checking.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck foo bar baz</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more plain database name arguments not preceded by
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ one will be used for this purpose, and it will also be included in the
+ list of databases to check.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -d foo -d bar baz</literal></term>
+ <listitem>
+ <para>
+ If a mixture of plain database names and databases preceded with
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ plain database name will be used for this purpose. In the above
+ example, <literal>baz</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --dbname=foo --dbname="bar*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more databases are given with the
+ <option>-d</option> or <option>--dbname</option> option, the first one
+ will be used and must be a literal database name. In this example,
+ <literal>foo</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --relation="accounts_*.*.*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, the environment will be consulted for the database to be
+ used. In the example above, the default database will be queried to
+ find all databases with names that begin with
+ <literal>accounts_</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ As discussed above for schema-qualified relations, a database-qualified
+ relation name or pattern may also be given.
+<programlisting>
+pg_amcheck mydb \
+ --schema="t*" \
+ --exclude-schema="tmp*" \
+ --relation=baz \
+ --relation=bar.baz \
+ --relation=foo.bar.baz \
+ --relation="f*".a.b \
+ --exclude-relation=foo.a.b
+</programlisting>
+ will check relations in database <literal>mydb</literal> using the schema
+ resolution rules discussed above, but additionally will check all relations
+ named <literal>a.b</literal> in all databases with names starting with
+ <literal>f</literal> except database <literal>foo</literal>.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+ </para>
+
+ <refsect2>
+ <title>Help and Version Information Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not suppress output produced as a result of the
+ <option>-e</option> (<option>--echo</option>) option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increase the logging verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Connection and Concurrent Connection Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist, <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ When used in conjunction with the <option>-a</option>
+ <option>--all</option> option, the total number of objects to check,
+ and correspondingly the number of concurrent connections to use, is
+ recalculated per database. If the number of objects to check differs
+ from one database to the next and is less than the concurrency level
+ specified, the number of concurrent connections open to the server
+ will fluctuate to meet the needs of each database processed.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Index Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly
+ selects <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing
+ a new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Table Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer
+ is slow and may be improved in a future release. Some users may wish
+ to disable this check to save time.
+ </para>
+ <para>
+ Note that, despite their similar names, this option is unrelated to the
+ <option>--exclude-toast</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page.
+ This option only has meaning relative to table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>all-frozen</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>all-visible</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Corruption Checking Target Options</title>
+
+ <para>
+ Objects to be checked may span schemas in more than one database. Options
+ for restricting the list of databases, schemas, tables and indexes are
+ described below. In each place where a name may be specified, a
+ <link linkend="app-psql-patterns"><replaceable class="parameter">pattern</replaceable></link>
+ may also be used.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects in
+ the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>--maintenance-db</option> argument is given nor is any
+ database name given as a command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-db</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-db</option>, the database will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-db=america</literal></member>
+ <member><literal>--exclude-db="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ If the relation is not schema qualified, database and schema
+ inclusion/exclusion lists will determine in which databases or schemas
+ matching relations will be checked.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_idx</literal></member>
+ <member><literal>--relation="llc.accounts_idx"</literal></member>
+ <member><literal>--relation="asia|africa.corp|llc.accounts_idx"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ The first example, <literal>--relation=accounts_idx</literal>, checks
+ relations named <literal>accounts_idx</literal> in all selected schemas
+ and databases.
+ </para>
+ <para>
+ The second example, <literal>--relation="llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schema
+ <literal>llc</literal> in all selected databases.
+ </para>
+ <para>
+ The third example,
+ <literal>--relation="asia|africa.corp|llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in
+ schemas <literal>corp</literal> and <literal>llc</literal> in databases
+ <literal>asia</literal> and <literal>africa</literal>.
+ </para>
+ <para>
+ Note that if a database is implicated in a relation pattern, such as
+ <literal>asia</literal> and <literal>africa</literal> in the third
+ example above, the database need not be otherwise given in the command
+ arguments for the relation to be checked. As an extreme example of
+ this:
+ <simplelist>
+ <member><literal>pg_amcheck --relation="*.*.*" mydb</literal></member>
+ </simplelist>
+ will check all relations in all databases. The <literal>mydb</literal>
+ argument only serves to tell <application>pg_amcheck</application> the
+ name of the database to use for querying the list of all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-dependents</option></term>
+ <listitem>
+ <para>
+ When calculating the list of objects to be checked, do not automatically
+ expand the list to include associated indexes and toast tables of
+ elements otherwise in the list.
+ </para>
+ <para>
+ By default, for each main table relation checked, any associated toast
+ table and all associated indexes are also checked, unless explicitly
+ excluded.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..6eba8e1870 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,11 +18,11 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
- 'libpgtypes', 'libpq', 'pg_basebackup', 'pg_config',
+ 'libpgtypes', 'libpq', 'pg_amcheck', 'pg_basebackup', 'pg_config',
'pg_dump', 'pg_dumpall', 'pg_isready', 'pg_receivewal',
'pg_recvlogical', 'pg_restore', 'psql', 'reindexdb',
'vacuumdb', @client_contribs);
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 941d168e19..6e37653e1c 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4d0d09a5dd..26920cc512 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -336,6 +336,8 @@ CheckPointStmt
CheckpointStatsData
CheckpointerRequest
CheckpointerShmemStruct
+CheckType
+CheckTypeFilter
Chromosome
CkptSortItem
CkptTsStatus
@@ -2847,6 +2849,8 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckObjects
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
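To tie the option documentation above together, here are a few illustrative command lines. The database, schema, and table names are invented for the example; the switches are the ones documented in the patch, combined in plausible (but untested) ways:

```shell
# Check every object in every database, using up to four concurrent
# connections, and skip pages already marked all-frozen:
pg_amcheck --all --jobs=4 --skip=all-frozen

# Check all objects in schema "corp" of databases matching the pattern
# "asia|africa", excluding the database "asia_archive":
pg_amcheck -d "asia|africa" -D asia_archive -s corp

# Check only the table "accounts" (its toast table and indexes are
# included by default), verifying that every heap tuple has a matching
# index tuple, while skipping the slower toast-pointer lookups:
pg_amcheck -d mydb -t accounts --heapallindexed --exclude-toast-pointers

# Restrict table checking to blocks 0 through 999, and stop checking a
# table after its first corrupt page:
pg_amcheck -d mydb -t accounts --startblock=0 --endblock=999 --on-error-stop
```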
v35-0005-Extending-PostgresNode-to-test-corruption.patchapplication/octet-stream; name=v35-0005-Extending-PostgresNode-to-test-corruption.patch; x-unix-mode=0644Download
From d583153cd4c605e3623279f449bc96d67ff87d50 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Sun, 31 Jan 2021 13:12:05 -0800
Subject: [PATCH v35 5/5] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 131 +++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 263 ++++++++++++++++++
5 files changed, 494 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..6591b812d3
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,131 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 27;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options.
+#
+$node->start;
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ qr/^$/,
+ 'pg_amcheck reverted toast table');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ qr/^$/,
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/ERROR: could not read block/,
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 59921b46cf..6698a132de 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This partially reverts
+# the update, leaving the table corrupt.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..92293613cc 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,269 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+These routines are able to corrupt a PostgreSQL node.
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relname>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relation.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files entirely with the
+files saved in B<$snapname>. Unlike with defined byte counts, this means a file
+may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
 File::Copy::copy($srcpart, "$dstdir/$part")
 or die "could not copy $srcpart to $dstdir/$part: $!";
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
 my $buffer;
 
 # Use sysseek rather than seek here: mixing buffered seek with
 # unbuffered sysread/syswrite gives unpredictable results in perl
 if ($bytes < 0)
 {
 $bytes *= -1; # Easier to work with a positive count
 my $srcsize = (stat($srcfh))[7];
 my $offset = $srcsize - $bytes;
 sysseek($srcfh, $offset, 0);
 sysseek($dstfh, $offset, 0);
 }
 else
 {
 sysseek($srcfh, 0, 0);
 sysseek($dstfh, 0, 0);
 }
 sysread($srcfh, $buffer, $bytes);
 syswrite($dstfh, $buffer, $bytes);
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Jan 31, 2021, at 4:05 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
Attached is patch set 35.
I found some things to improve in the v35 patch set. Please find attached the v36 patch set, which differs from v35 in the following ways:
0001 -- no changes
0002 -- fixing omissions in @pgfeutilsfiles in file src/tools/msvc/Mkvcbuild.pm
0003 -- no changes
0004:
-- Fixes handling of amcheck contrib module installed in non-default schema.
-- Adds database name to corruption messages to make identifying the relation being complained about unambiguous in multi-database checks
-- Fixes an instance where pg_amcheck was querying pg_database without schema-qualifying it
-- Simplifies some functions in pg_amcheck.c
-- Updates a comment to reflect the renaming of a variable that the comment mentioned by name
0005 -- fixes the =pod documentation added in PostgresNode.pm. The =pod was grammatically correct as far as I can tell, but rendered strangely in perldoc.
Attachments:
v36-0001-Refactoring-processSQLNamePattern.patchapplication/octet-stream; name=v36-0001-Refactoring-processSQLNamePattern.patch; x-unix-mode=0644Download
From 5375f952570ff75a56e916de13e63c43c3ba2604 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:29:11 -0800
Subject: [PATCH v36 1/5] Refactoring processSQLNamePattern.
Factor out the logic that transforms shell-style patterns into SQL
regular expression format from inside processSQLNamePattern into a
separate new function, "patternToSQLRegex". The interface and
semantics of processSQLNamePattern are unchanged.
The motivation for the refactoring is that processSQLNamePattern
mixes the job of transforming the pattern with the job of
constructing a where-clause based on a single pattern, which makes
the code hard to reuse from other places.
The new helper function patternToSQLRegex can parse patterns of the
form "database.schema.relation", "schema.relation", and "relation".
The three-part form is unused in this commit, as the pre-existing
processSQLNamePattern interface ignores the dbname portion and
there are not yet any other callers. The three-part form will be
used by pg_amcheck, not yet committed, to allow specifying on the
command line the inclusion and exclusion of relations spanning
multiple databases.
---
src/fe_utils/string_utils.c | 260 +++++++++++++++++-----------
src/include/fe_utils/string_utils.h | 4 +
2 files changed, 167 insertions(+), 97 deletions(-)
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index a1a9d691d5..9a1ea9ab98 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -831,10 +831,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
PQExpBufferData schemabuf;
PQExpBufferData namebuf;
- int encoding = PQclientEncoding(conn);
- bool inquotes;
- const char *cp;
- int i;
bool added_clause = false;
#define WHEREAND() \
@@ -856,98 +852,12 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Parse the pattern, converting quotes and lower-casing unquoted letters.
- * Also, adjust shell-style wildcard characters into regexp notation.
- *
- * We surround the pattern with "^(...)$" to force it to match the whole
- * string, as per SQL practice. We have to have parens in case the string
- * contains "|", else the "^" and "$" will be bound into the first and
- * last alternatives which is not what we want.
- *
- * Note: the result of this pass is the actual regexp pattern(s) we want
- * to execute. Quoting/escaping into SQL literal format will be done
- * below using appendStringLiteralConn().
+ * Convert shell-style 'pattern' into the regular expression(s) we want to
+ * execute. Quoting/escaping into SQL literal format will be done below
+ * using appendStringLiteralConn().
*/
- appendPQExpBufferStr(&namebuf, "^(");
-
- inquotes = false;
- cp = pattern;
-
- while (*cp)
- {
- char ch = *cp;
-
- if (ch == '"')
- {
- if (inquotes && cp[1] == '"')
- {
- /* emit one quote, stay in inquotes mode */
- appendPQExpBufferChar(&namebuf, '"');
- cp++;
- }
- else
- inquotes = !inquotes;
- cp++;
- }
- else if (!inquotes && isupper((unsigned char) ch))
- {
- appendPQExpBufferChar(&namebuf,
- pg_tolower((unsigned char) ch));
- cp++;
- }
- else if (!inquotes && ch == '*')
- {
- appendPQExpBufferStr(&namebuf, ".*");
- cp++;
- }
- else if (!inquotes && ch == '?')
- {
- appendPQExpBufferChar(&namebuf, '.');
- cp++;
- }
- else if (!inquotes && ch == '.')
- {
- /* Found schema/name separator, move current pattern to schema */
- resetPQExpBuffer(&schemabuf);
- appendPQExpBufferStr(&schemabuf, namebuf.data);
- resetPQExpBuffer(&namebuf);
- appendPQExpBufferStr(&namebuf, "^(");
- cp++;
- }
- else if (ch == '$')
- {
- /*
- * Dollar is always quoted, whether inside quotes or not. The
- * reason is that it's allowed in SQL identifiers, so there's a
- * significant use-case for treating it literally, while because
- * we anchor the pattern automatically there is no use-case for
- * having it possess its regexp meaning.
- */
- appendPQExpBufferStr(&namebuf, "\\$");
- cp++;
- }
- else
- {
- /*
- * Ordinary data character, transfer to pattern
- *
- * Inside double quotes, or at all times if force_escape is true,
- * quote regexp special characters with a backslash to avoid
- * regexp errors. Outside quotes, however, let them pass through
- * as-is; this lets knowledgeable users build regexp expressions
- * that are more powerful than shell-style patterns.
- */
- if ((inquotes || force_escape) &&
- strchr("|*+?()[]{}.^$\\", ch))
- appendPQExpBufferChar(&namebuf, '\\');
- i = PQmblen(cp, encoding);
- while (i-- && *cp)
- {
- appendPQExpBufferChar(&namebuf, *cp);
- cp++;
- }
- }
- }
+ patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
+ pattern, force_escape);
/*
* Now decide what we need to emit. We may run under a hostile
@@ -964,7 +874,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a name pattern, so constrain the namevar(s) */
- appendPQExpBufferStr(&namebuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(namebuf.data, "^(.*)$") != 0)
{
@@ -999,7 +908,6 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
{
/* We have a schema pattern, so constrain the schemavar */
- appendPQExpBufferStr(&schemabuf, ")$");
/* Optimize away a "*" pattern */
if (strcmp(schemabuf.data, "^(.*)$") != 0 && schemavar)
{
@@ -1027,3 +935,161 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
return added_clause;
#undef WHEREAND
}
+
+/*
+ * Transform a possibly qualified shell-style object name pattern into up to
+ * three SQL-style regular expressions, converting quotes, lower-casing
+ * unquoted letters, and adjusting shell-style wildcard characters into regexp
+ * notation.
+ *
+ * If the dbnamebuf and schemabuf arguments are non-NULL, and the pattern
+ * contains two or more dbname/schema/name separators, we parse the portions of
+ * the pattern prior to the first and second separators into dbnamebuf and
+ * schemabuf, and the rest into namebuf. (Additional dots in the name portion
+ * are not treated as special.)
+ *
+ * If dbnamebuf is NULL and schemabuf is non-NULL, and the pattern contains at
+ * least one separator, we parse the first portion into schemabuf and the rest
+ * into namebuf.
+ *
+ * Otherwise, we parse the entire pattern into namebuf.
+ *
+ * We surround the regexps with "^(...)$" to force them to match whole strings,
+ * as per SQL practice. We have to have parens in case strings contain "|",
+ * else the "^" and "$" will be bound into the first and last alternatives
+ * which is not what we want.
+ *
+ * The regexps we parse into the buffers are appended to the data (if any)
+ * already present. If we parse fewer fields than the number of buffers we
+ * were given, the extra buffers are unaltered.
+ */
+void
+patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
+ PQExpBuffer namebuf, const char *pattern, bool force_escape)
+{
+ PQExpBufferData buf[3];
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
+ int i;
+ bool inquotes;
+ const char *cp;
+
+ Assert(pattern != NULL);
+ Assert(namebuf != NULL);
+
+ /* callers should never expect "dbname.relname" format */
+ Assert(dbnamebuf == NULL || schemabuf != NULL);
+
+ inquotes = false;
+ cp = pattern;
+
+ if (dbnamebuf != NULL)
+ maxbuf = &buf[2];
+ else if (schemabuf != NULL)
+ maxbuf = &buf[1];
+ else
+ maxbuf = &buf[0];
+
+ curbuf = &buf[0];
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ while (*cp)
+ {
+ char ch = *cp;
+
+ if (ch == '"')
+ {
+ if (inquotes && cp[1] == '"')
+ {
+ /* emit one quote, stay in inquotes mode */
+ appendPQExpBufferChar(curbuf, '"');
+ cp++;
+ }
+ else
+ inquotes = !inquotes;
+ cp++;
+ }
+ else if (!inquotes && isupper((unsigned char) ch))
+ {
+ appendPQExpBufferChar(curbuf,
+ pg_tolower((unsigned char) ch));
+ cp++;
+ }
+ else if (!inquotes && ch == '*')
+ {
+ appendPQExpBufferStr(curbuf, ".*");
+ cp++;
+ }
+ else if (!inquotes && ch == '?')
+ {
+ appendPQExpBufferChar(curbuf, '.');
+ cp++;
+ }
+
+ /*
+ * When we find a dbname/schema/name separator, we treat it specially
+ * only if the caller requested more patterns to be parsed than we
+ * have already parsed from the pattern. Otherwise, dot characters
+ * are not special.
+ */
+ else if (!inquotes && ch == '.' && curbuf < maxbuf)
+ {
+ appendPQExpBufferStr(curbuf, ")$");
+ curbuf++;
+ initPQExpBuffer(curbuf);
+ appendPQExpBufferStr(curbuf, "^(");
+ cp++;
+ }
+ else if (ch == '$')
+ {
+ /*
+ * Dollar is always quoted, whether inside quotes or not. The
+ * reason is that it's allowed in SQL identifiers, so there's a
+ * significant use-case for treating it literally, while because
+ * we anchor the pattern automatically there is no use-case for
+ * having it possess its regexp meaning.
+ */
+ appendPQExpBufferStr(curbuf, "\\$");
+ cp++;
+ }
+ else
+ {
+ /*
+ * Ordinary data character, transfer to pattern
+ *
+ * Inside double quotes, or at all times if force_escape is true,
+ * quote regexp special characters with a backslash to avoid
+ * regexp errors. Outside quotes, however, let them pass through
+ * as-is; this lets knowledgeable users build regexp expressions
+ * that are more powerful than shell-style patterns.
+ */
+ if ((inquotes || force_escape) &&
+ strchr("|*+?()[]{}.^$\\", ch))
+ appendPQExpBufferChar(curbuf, '\\');
+ i = PQmblen(cp, encoding);
+ while (i-- && *cp)
+ {
+ appendPQExpBufferChar(curbuf, *cp);
+ cp++;
+ }
+ }
+ }
+ appendPQExpBufferStr(curbuf, ")$");
+
+ appendPQExpBufferStr(namebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(schemabuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+
+ if (curbuf > buf)
+ {
+ curbuf--;
+ appendPQExpBufferStr(dbnamebuf, curbuf->data);
+ termPQExpBuffer(curbuf);
+ }
+ }
+}
diff --git a/src/include/fe_utils/string_utils.h b/src/include/fe_utils/string_utils.h
index c290c302f5..caafb97d29 100644
--- a/src/include/fe_utils/string_utils.h
+++ b/src/include/fe_utils/string_utils.h
@@ -56,4 +56,8 @@ extern bool processSQLNamePattern(PGconn *conn, PQExpBuffer buf,
const char *schemavar, const char *namevar,
const char *altnamevar, const char *visibilityrule);
+extern void patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf,
+ PQExpBuffer schemabuf, PQExpBuffer namebuf,
+ const char *pattern, bool force_escape);
+
#endif /* STRING_UTILS_H */
--
2.21.1 (Apple Git-122.3)
v36-0002-Moving-code-from-src-bin-scripts-to-fe_utils.patchapplication/octet-stream; name=v36-0002-Moving-code-from-src-bin-scripts-to-fe_utils.patch; x-unix-mode=0644Download
From 3c671447d64d68bd2e378aebc04db1e6dae8206c Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:30:53 -0800
Subject: [PATCH v36 2/5] Moving code from src/bin/scripts to fe_utils
To make the code usable from contrib/pg_amcheck, move
scripts_parallel.[ch] and handle_help_version_opts() into
fe_utils.
Move supporting code from src/bin/scripts/common.c into fe_utils,
and update the programs in src/bin/scripts to use the new location.
---
src/bin/scripts/Makefile | 6 +-
src/bin/scripts/clusterdb.c | 2 +
src/bin/scripts/common.c | 318 +-----------------
src/bin/scripts/common.h | 49 +--
src/bin/scripts/createdb.c | 1 +
src/bin/scripts/createuser.c | 1 +
src/bin/scripts/dropdb.c | 1 +
src/bin/scripts/dropuser.c | 1 +
src/bin/scripts/nls.mk | 2 +-
src/bin/scripts/pg_isready.c | 1 +
src/bin/scripts/reindexdb.c | 4 +-
src/bin/scripts/vacuumdb.c | 4 +-
src/fe_utils/Makefile | 4 +
src/fe_utils/connect_utils.c | 170 ++++++++++
src/fe_utils/option_utils.c | 38 +++
.../parallel_slot.c} | 63 +++-
src/fe_utils/query_utils.c | 92 +++++
src/fe_utils/string_utils.c | 17 +-
src/include/fe_utils/connect_utils.h | 48 +++
src/include/fe_utils/option_utils.h | 23 ++
.../fe_utils/parallel_slot.h} | 13 +-
src/include/fe_utils/query_utils.h | 26 ++
src/tools/msvc/Mkvcbuild.pm | 4 +-
src/tools/pgindent/typedefs.list | 1 +
24 files changed, 495 insertions(+), 394 deletions(-)
create mode 100644 src/fe_utils/connect_utils.c
create mode 100644 src/fe_utils/option_utils.c
rename src/{bin/scripts/scripts_parallel.c => fe_utils/parallel_slot.c} (80%)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/connect_utils.h
create mode 100644 src/include/fe_utils/option_utils.h
rename src/{bin/scripts/scripts_parallel.h => include/fe_utils/parallel_slot.h} (82%)
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index a02e4e430c..b8d7cf2f2d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,8 +28,8 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-reindexdb: reindexdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
install: all installdirs
@@ -50,7 +50,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
- rm -f common.o scripts_parallel.o $(WIN32RES)
+ rm -f common.o $(WIN32RES)
rm -rf tmp_check
check:
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 7d25bb31d4..fc771eed77 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -13,6 +13,8 @@
#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 21ef297e6e..c86c19eae2 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -22,325 +22,9 @@
#include "common/logging.h"
#include "common/string.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
-#define ERRCODE_UNDEFINED_TABLE "42P01"
-
-/*
- * Provide strictly harmonized handling of --help and --version
- * options.
- */
-void
-handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname, help_handler hlp)
-{
- if (argc > 1)
- {
- if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
- {
- hlp(get_progname(argv[0]));
- exit(0);
- }
- if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
- {
- printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
- exit(0);
- }
- }
-}
-
-
-/*
- * Make a database connection with the given parameters.
- *
- * An interactive password prompt is automatically issued if needed and
- * allowed by cparams->prompt_password.
- *
- * If allow_password_reuse is true, we will try to re-use any password
- * given during previous calls to this routine. (Callers should not pass
- * allow_password_reuse=true unless reconnecting to the same database+user
- * as before, else we might create password exposure hazards.)
- */
-PGconn *
-connectDatabase(const ConnParams *cparams, const char *progname,
- bool echo, bool fail_ok, bool allow_password_reuse)
-{
- PGconn *conn;
- bool new_pass;
- static char *password = NULL;
-
- /* Callers must supply at least dbname; other params can be NULL */
- Assert(cparams->dbname);
-
- if (!allow_password_reuse && password)
- {
- free(password);
- password = NULL;
- }
-
- if (cparams->prompt_password == TRI_YES && password == NULL)
- password = simple_prompt("Password: ", false);
-
- /*
- * Start the connection. Loop until we have a password if requested by
- * backend.
- */
- do
- {
- const char *keywords[8];
- const char *values[8];
- int i = 0;
-
- /*
- * If dbname is a connstring, its entries can override the other
- * values obtained from cparams; but in turn, override_dbname can
- * override the dbname component of it.
- */
- keywords[i] = "host";
- values[i++] = cparams->pghost;
- keywords[i] = "port";
- values[i++] = cparams->pgport;
- keywords[i] = "user";
- values[i++] = cparams->pguser;
- keywords[i] = "password";
- values[i++] = password;
- keywords[i] = "dbname";
- values[i++] = cparams->dbname;
- if (cparams->override_dbname)
- {
- keywords[i] = "dbname";
- values[i++] = cparams->override_dbname;
- }
- keywords[i] = "fallback_application_name";
- values[i++] = progname;
- keywords[i] = NULL;
- values[i++] = NULL;
- Assert(i <= lengthof(keywords));
-
- new_pass = false;
- conn = PQconnectdbParams(keywords, values, true);
-
- if (!conn)
- {
- pg_log_error("could not connect to database %s: out of memory",
- cparams->dbname);
- exit(1);
- }
-
- /*
- * No luck? Trying asking (again) for a password.
- */
- if (PQstatus(conn) == CONNECTION_BAD &&
- PQconnectionNeedsPassword(conn) &&
- cparams->prompt_password != TRI_NO)
- {
- PQfinish(conn);
- if (password)
- free(password);
- password = simple_prompt("Password: ", false);
- new_pass = true;
- }
- } while (new_pass);
-
- /* check to see that the backend connection was successfully made */
- if (PQstatus(conn) == CONNECTION_BAD)
- {
- if (fail_ok)
- {
- PQfinish(conn);
- return NULL;
- }
- pg_log_error("%s", PQerrorMessage(conn));
- exit(1);
- }
-
- /* Start strict; callers may override this. */
- PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
- return conn;
-}
-
-/*
- * Try to connect to the appropriate maintenance database.
- *
- * This differs from connectDatabase only in that it has a rule for
- * inserting a default "dbname" if none was given (which is why cparams
- * is not const). Note that cparams->dbname should typically come from
- * a --maintenance-db command line parameter.
- */
-PGconn *
-connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo)
-{
- PGconn *conn;
-
- /* If a maintenance database name was specified, just connect to it. */
- if (cparams->dbname)
- return connectDatabase(cparams, progname, echo, false, false);
-
- /* Otherwise, try postgres first and then template1. */
- cparams->dbname = "postgres";
- conn = connectDatabase(cparams, progname, echo, true, false);
- if (!conn)
- {
- cparams->dbname = "template1";
- conn = connectDatabase(cparams, progname, echo, false, false);
- }
- return conn;
-}
-
-/*
- * Disconnect the given connection, canceling any statement if one is active.
- */
-void
-disconnectDatabase(PGconn *conn)
-{
- char errbuf[256];
-
- Assert(conn != NULL);
-
- if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
- {
- PGcancel *cancel;
-
- if ((cancel = PQgetCancel(conn)))
- {
- (void) PQcancel(cancel, errbuf, sizeof(errbuf));
- PQfreeCancel(cancel);
- }
- }
-
- PQfinish(conn);
-}
-
-/*
- * Run a query, return the results, exit program on failure.
- */
-PGresult *
-executeQuery(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- return res;
-}
-
-
-/*
- * As above for a SQL command (which returns nothing).
- */
-void
-executeCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- PQclear(res);
-}
-
-
-/*
- * As above for a SQL maintenance command (returns command success).
- * Command is executed with a cancel handler set, so Ctrl-C can
- * interrupt it.
- */
-bool
-executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
- bool r;
-
- if (echo)
- printf("%s\n", query);
-
- SetCancelConn(conn);
- res = PQexec(conn, query);
- ResetCancelConn();
-
- r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
-
- if (res)
- PQclear(res);
-
- return r;
-}
-
-/*
- * Consume all the results generated for the given connection until
- * nothing remains. If at least one error is encountered, return false.
- * Note that this will block if the connection is busy.
- */
-bool
-consumeQueryResult(PGconn *conn)
-{
- bool ok = true;
- PGresult *result;
-
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
- {
- if (!processQueryResult(conn, result))
- ok = false;
- }
- ResetCancelConn();
- return ok;
-}
-
-/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
- */
-bool
-processQueryResult(PGconn *conn, PGresult *result)
-{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
-
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
-
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
-
- PQclear(result);
- return true;
-}
-
-
/*
* Split TABLE[(COLUMNS)] into TABLE and [(COLUMNS)] portions. When you
* finish using them, pg_free(*table). *columns is a pointer into "spec",
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 5630975712..ddd8f35274 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -10,58 +10,11 @@
#define COMMON_H
#include "common/username.h"
+#include "fe_utils/connect_utils.h"
#include "getopt_long.h" /* pgrminclude ignore */
#include "libpq-fe.h"
#include "pqexpbuffer.h" /* pgrminclude ignore */
-enum trivalue
-{
- TRI_DEFAULT,
- TRI_NO,
- TRI_YES
-};
-
-/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
-typedef struct _connParams
-{
- /* These fields record the actual command line parameters */
- const char *dbname; /* this may be a connstring! */
- const char *pghost;
- const char *pgport;
- const char *pguser;
- enum trivalue prompt_password;
- /* If not NULL, this overrides the dbname obtained from command line */
- /* (but *only* the DB name, not anything else in the connstring) */
- const char *override_dbname;
-} ConnParams;
-
-typedef void (*help_handler) (const char *progname);
-
-extern void handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname,
- help_handler hlp);
-
-extern PGconn *connectDatabase(const ConnParams *cparams,
- const char *progname,
- bool echo, bool fail_ok,
- bool allow_password_reuse);
-
-extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo);
-
-extern void disconnectDatabase(PGconn *conn);
-
-extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
-
-extern void executeCommand(PGconn *conn, const char *query, bool echo);
-
-extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
- bool echo);
-
-extern bool consumeQueryResult(PGconn *conn);
-
-extern bool processQueryResult(PGconn *conn, PGresult *result);
-
extern void splitTableColumnsSpec(const char *spec, int encoding,
char **table, const char **columns);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index abf21d4942..041454f075 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 47b0e28bc6..ef7e0e549f 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
index ba0dcdecb9..b154ed1bb6 100644
--- a/src/bin/scripts/dropdb.c
+++ b/src/bin/scripts/dropdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropuser.c b/src/bin/scripts/dropuser.c
index ff5b455ae5..61b8557bc7 100644
--- a/src/bin/scripts/dropuser.c
+++ b/src/bin/scripts/dropuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/nls.mk b/src/bin/scripts/nls.mk
index 5d5dd11b7b..7fc716092e 100644
--- a/src/bin/scripts/nls.mk
+++ b/src/bin/scripts/nls.mk
@@ -7,7 +7,7 @@ GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
clusterdb.c vacuumdb.c reindexdb.c \
pg_isready.c \
common.c \
- scripts_parallel.c \
+ ../../fe_utils/parallel_slot.c \
../../fe_utils/cancel.c ../../fe_utils/print.c \
../../common/fe_memutils.c ../../common/username.c
GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) simple_prompt yesno_prompt
diff --git a/src/bin/scripts/pg_isready.c b/src/bin/scripts/pg_isready.c
index ceb8a09b4c..fc6f7b0a93 100644
--- a/src/bin/scripts/pg_isready.c
+++ b/src/bin/scripts/pg_isready.c
@@ -12,6 +12,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#define DEFAULT_CONNECT_TIMEOUT "3"
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index dece8200fa..7781fb1151 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,9 +16,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
typedef enum ReindexType
{
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 8246327770..ed320817bc 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,9 +18,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
/* vacuum options controlled by user flags */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..456c441a33 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,9 +23,13 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ connect_utils.o \
mbprint.o \
+ option_utils.o \
+ parallel_slot.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
new file mode 100644
index 0000000000..7475e2f366
--- /dev/null
+++ b/src/fe_utils/connect_utils.c
@@ -0,0 +1,170 @@
+#include "postgres_fe.h"
+
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Make a database connection with the given parameters.
+ *
+ * An interactive password prompt is automatically issued if needed and
+ * allowed by cparams->prompt_password.
+ *
+ * If allow_password_reuse is true, we will try to re-use any password
+ * given during previous calls to this routine. (Callers should not pass
+ * allow_password_reuse=true unless reconnecting to the same database+user
+ * as before, else we might create password exposure hazards.)
+ */
+PGconn *
+connectDatabase(const ConnParams *cparams, const char *progname,
+ bool echo, bool fail_ok, bool allow_password_reuse)
+{
+ PGconn *conn;
+ bool new_pass;
+ static char *password = NULL;
+
+ /* Callers must supply at least dbname; other params can be NULL */
+ Assert(cparams->dbname);
+
+ if (!allow_password_reuse && password)
+ {
+ free(password);
+ password = NULL;
+ }
+
+ if (cparams->prompt_password == TRI_YES && password == NULL)
+ password = simple_prompt("Password: ", false);
+
+ /*
+ * Start the connection. Loop until we have a password if requested by
+ * backend.
+ */
+ do
+ {
+ const char *keywords[8];
+ const char *values[8];
+ int i = 0;
+
+ /*
+ * If dbname is a connstring, its entries can override the other
+ * values obtained from cparams; but in turn, override_dbname can
+ * override the dbname component of it.
+ */
+ keywords[i] = "host";
+ values[i++] = cparams->pghost;
+ keywords[i] = "port";
+ values[i++] = cparams->pgport;
+ keywords[i] = "user";
+ values[i++] = cparams->pguser;
+ keywords[i] = "password";
+ values[i++] = password;
+ keywords[i] = "dbname";
+ values[i++] = cparams->dbname;
+ if (cparams->override_dbname)
+ {
+ keywords[i] = "dbname";
+ values[i++] = cparams->override_dbname;
+ }
+ keywords[i] = "fallback_application_name";
+ values[i++] = progname;
+ keywords[i] = NULL;
+ values[i++] = NULL;
+ Assert(i <= lengthof(keywords));
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+
+ if (!conn)
+ {
+ pg_log_error("could not connect to database %s: out of memory",
+ cparams->dbname);
+ exit(1);
+ }
+
+ /*
+ * No luck? Try asking (again) for a password.
+ */
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ cparams->prompt_password != TRI_NO)
+ {
+ PQfinish(conn);
+ if (password)
+ free(password);
+ password = simple_prompt("Password: ", false);
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ /* check to see that the backend connection was successfully made */
+ if (PQstatus(conn) == CONNECTION_BAD)
+ {
+ if (fail_ok)
+ {
+ PQfinish(conn);
+ return NULL;
+ }
+ pg_log_error("%s", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* Start strict; callers may override this. */
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+ return conn;
+}
+
+/*
+ * Try to connect to the appropriate maintenance database.
+ *
+ * This differs from connectDatabase only in that it has a rule for
+ * inserting a default "dbname" if none was given (which is why cparams
+ * is not const). Note that cparams->dbname should typically come from
+ * a --maintenance-db command line parameter.
+ */
+PGconn *
+connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo)
+{
+ PGconn *conn;
+
+ /* If a maintenance database name was specified, just connect to it. */
+ if (cparams->dbname)
+ return connectDatabase(cparams, progname, echo, false, false);
+
+ /* Otherwise, try postgres first and then template1. */
+ cparams->dbname = "postgres";
+ conn = connectDatabase(cparams, progname, echo, true, false);
+ if (!conn)
+ {
+ cparams->dbname = "template1";
+ conn = connectDatabase(cparams, progname, echo, false, false);
+ }
+ return conn;
+}
+
+/*
+ * Disconnect the given connection, canceling any statement if one is active.
+ */
+void
+disconnectDatabase(PGconn *conn)
+{
+ char errbuf[256];
+
+ Assert(conn != NULL);
+
+ if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(conn)))
+ {
+ (void) PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(conn);
+}
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..97aca1f02b
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "fe_utils/option_utils.h"
+
+/*
+ * Provide strictly harmonized handling of --help and --version
+ * options.
+ */
+void
+handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname, help_handler hlp)
+{
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ hlp(get_progname(argv[0]));
+ exit(0);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
+ exit(0);
+ }
+ }
+}
diff --git a/src/bin/scripts/scripts_parallel.c b/src/fe_utils/parallel_slot.c
similarity index 80%
rename from src/bin/scripts/scripts_parallel.c
rename to src/fe_utils/parallel_slot.c
index 1f863a1bb4..3987a4702b 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/fe_utils/parallel_slot.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.c
- * Parallel support for bin/scripts/
+ * parallel_slot.c
+ * Parallel support for front-end parallel database connections
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * src/bin/scripts/scripts_parallel.c
+ * src/fe_utils/parallel_slot.c
*
*-------------------------------------------------------------------------
*/
@@ -22,13 +22,15 @@
#include <sys/select.h>
#endif
-#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
-#include "scripts_parallel.h"
+#include "fe_utils/parallel_slot.h"
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
+static bool processQueryResult(PGconn *conn, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,6 +40,57 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->isFree = true;
}
+/*
+ * Process (and delete) a query result. Returns true if there's no error,
+ * false otherwise -- but errors about trying to work on a missing relation
+ * are reported and subsequently ignored.
+ */
+static bool
+processQueryResult(PGconn *conn, PGresult *result)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(result) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(result);
+ return false;
+ }
+ }
+
+ PQclear(result);
+ return true;
+}
+
+/*
+ * Consume all the results generated for the given connection until
+ * nothing remains. If at least one error is encountered, return false.
+ * Note that this will block if the connection is busy.
+ */
+static bool
+consumeQueryResult(PGconn *conn)
+{
+ bool ok = true;
+ PGresult *result;
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ if (!processQueryResult(conn, result))
+ ok = false;
+ }
+ ResetCancelConn();
+ return ok;
+}
+
/*
* Wait until a file descriptor from the given set becomes readable.
*
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..a70ae3c082
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,92 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Run a query, return the results, exit program on failure.
+ */
+PGresult *
+executeQuery(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ return res;
+}
+
+
+/*
+ * As above for a SQL command (which returns nothing).
+ */
+void
+executeCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ PQclear(res);
+}
+
+
+/*
+ * As above for a SQL maintenance command (returns command success).
+ * Command is executed with a cancel handler set, so Ctrl-C can
+ * interrupt it.
+ */
+bool
+executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+ bool r;
+
+ if (echo)
+ printf("%s\n", query);
+
+ SetCancelConn(conn);
+ res = PQexec(conn, query);
+ ResetCancelConn();
+
+ r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
+
+ if (res)
+ PQclear(res);
+
+ return r;
+}
diff --git a/src/fe_utils/string_utils.c b/src/fe_utils/string_utils.c
index 9a1ea9ab98..94941132ac 100644
--- a/src/fe_utils/string_utils.c
+++ b/src/fe_utils/string_utils.c
@@ -852,9 +852,9 @@ processSQLNamePattern(PGconn *conn, PQExpBuffer buf, const char *pattern,
initPQExpBuffer(&namebuf);
/*
- * Convert shell-style 'pattern' into the regular expression(s) we want to
- * execute. Quoting/escaping into SQL literal format will be done below
- * using appendStringLiteralConn().
+ * Convert shell-style 'pattern' into the regular expression(s) we want
+ * to execute. Quoting/escaping into SQL literal format will be done
+ * below using appendStringLiteralConn().
*/
patternToSQLRegex(PQclientEncoding(conn), NULL, &schemabuf, &namebuf,
pattern, force_escape);
@@ -968,8 +968,8 @@ patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
PQExpBuffer namebuf, const char *pattern, bool force_escape)
{
PQExpBufferData buf[3];
- PQExpBuffer curbuf;
- PQExpBuffer maxbuf;
+ PQExpBuffer curbuf;
+ PQExpBuffer maxbuf;
int i;
bool inquotes;
const char *cp;
@@ -1025,12 +1025,11 @@ patternToSQLRegex(int encoding, PQExpBuffer dbnamebuf, PQExpBuffer schemabuf,
appendPQExpBufferChar(curbuf, '.');
cp++;
}
-
/*
* When we find a dbname/schema/name separator, we treat it specially
- * only if the caller requested more patterns to be parsed than we
- * have already parsed from the pattern. Otherwise, dot characters
- * are not special.
+ * only if the caller requested more patterns to be parsed than we have
+ * already parsed from the pattern. Otherwise, dot characters are not
+ * special.
*/
else if (!inquotes && ch == '.' && curbuf < maxbuf)
{
diff --git a/src/include/fe_utils/connect_utils.h b/src/include/fe_utils/connect_utils.h
new file mode 100644
index 0000000000..8fde0ea2a0
--- /dev/null
+++ b/src/include/fe_utils/connect_utils.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to connect to and disconnect from databases.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/connect_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECT_UTILS_H
+#define CONNECT_UTILS_H
+
+#include "libpq-fe.h"
+
+enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+};
+
+/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
+typedef struct _connParams
+{
+ /* These fields record the actual command line parameters */
+ const char *dbname; /* this may be a connstring! */
+ const char *pghost;
+ const char *pgport;
+ const char *pguser;
+ enum trivalue prompt_password;
+ /* If not NULL, this overrides the dbname obtained from command line */
+ /* (but *only* the DB name, not anything else in the connstring) */
+ const char *override_dbname;
+} ConnParams;
+
+extern PGconn *connectDatabase(const ConnParams *cparams,
+ const char *progname,
+ bool echo, bool fail_ok,
+ bool allow_password_reuse);
+
+extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo);
+
+extern void disconnectDatabase(PGconn *conn);
+
+#endif /* CONNECT_UTILS_H */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..ef6eb24ae0
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "postgres_fe.h"
+
+typedef void (*help_handler) (const char *progname);
+
+extern void handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname,
+ help_handler hlp);
+
+#endif /* OPTION_UTILS_H */
diff --git a/src/bin/scripts/scripts_parallel.h b/src/include/fe_utils/parallel_slot.h
similarity index 82%
rename from src/bin/scripts/scripts_parallel.h
rename to src/include/fe_utils/parallel_slot.h
index f62692510a..99eeb3328d 100644
--- a/src/bin/scripts/scripts_parallel.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -1,21 +1,20 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.h
+ * parallel_slot.h
* Parallel support for bin/scripts/
*
* Copyright (c) 2003-2021, PostgreSQL Global Development Group
*
- * src/bin/scripts/scripts_parallel.h
+ * src/include/fe_utils/parallel_slot.h
*
*-------------------------------------------------------------------------
*/
-#ifndef SCRIPTS_PARALLEL_H
-#define SCRIPTS_PARALLEL_H
+#ifndef PARALLEL_SLOT_H
+#define PARALLEL_SLOT_H
-#include "common.h"
+#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
-
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
@@ -33,4 +32,4 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
-#endif /* SCRIPTS_PARALLEL_H */
+#endif /* PARALLEL_SLOT_H */
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..1f5812bbf6
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query a database.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "postgres_fe.h"
+
+#include "libpq-fe.h"
+
+extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
+
+extern void executeCommand(PGconn *conn, const char *query, bool echo);
+
+extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
+ bool echo);
+
+#endif /* QUERY_UTILS_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90328db04e..7be6e6c9e5 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,8 +147,8 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c mbprint.c print.c psqlscan.l
- psqlscan.c simple_list.c string_utils.c recovery_gen.c);
+ archive.c cancel.c conditional.c connect_utils.c mbprint.c option_utils.c parallel_slot.c print.c psqlscan.l
+ psqlscan.c query_utils.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
$libpgport->AddDefine('FRONTEND');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..4d0d09a5dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
Attachment: v36-0003-Parameterizing-parallel-slot-result-handling.patch (application/octet-stream)
From 968f9bcb440580f912d34a74ed2928e8658fb710 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:35:56 -0800
Subject: [PATCH v36 3/5] Parameterizing parallel slot result handling
The function consumeQueryResult was being used to handle all results
returned by queries executed through the parallel slot interface,
but this hardcodes knowledge about the expectations of reindexdb and
vacuumdb such as the expected result status being PGRES_COMMAND_OK
(as opposed to, say, PGRES_TUPLES_OK).
Reworking the slot interface to optionally include a PGresultHandler
and related fields per slot. The idea is that a caller who executes
a command or query through the slot can set the handler to be called
when the query completes.
The old logic of consumeQueryResult is moved into a new callback
function, TableCommandSlotHandler(), which gets registered as the
slot handler explicitly from vacuumdb and reindexdb. This is
defined in fe_utils/parallel_slot.c rather than somewhere in
src/bin/scripts where its only callers reside, partly to keep it
close to the rest of the shared parallel slot handling code and
partly in anticipation that other utility programs will eventually
want to use it also.
Adding a default handler which is used to handle results for slots
which have no handler explicitly registered. The default simply
checks the status of the result and makes a judgement about whether
the status is ok, similarly to psql's AcceptResult(). I also
considered whether to just have a missing handler always be an
error, but decided against requiring users of the parallel slot
infrastructure to pedantically specify the default handler. Both
designs seem reasonable, but the tie-breaker for me is that edge
cases that do not come up in testing will be better handled in
production with this design than with pedantically erroring out.
The expectation of this commit is that pg_amcheck will have handlers
for table and index checks which will process the PGresults of calls
to the amcheck functions. This commit sets up the infrastructure
necessary to support those handlers being different from the one
used by vacuumdb and reindexdb.
---
src/bin/scripts/reindexdb.c | 1 +
src/bin/scripts/vacuumdb.c | 1 +
src/fe_utils/parallel_slot.c | 142 +++++++++++++++++++++------
src/include/fe_utils/parallel_slot.h | 32 ++++++
4 files changed, 148 insertions(+), 28 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 7781fb1151..29394d4a4a 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -466,6 +466,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler, NULL);
run_reindex_command(free_slot->connection, process_type, objname,
echo, verbose, concurrently, true);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index ed320817bc..1158f7b776 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -713,6 +713,7 @@ vacuum_one_database(const ConnParams *cparams,
* Execute the vacuum. All errors are handled in processQueryResult
* through ParallelSlotsGetIdle.
*/
+ ParallelSlotSetHandler(free_slot, TableCommandSlotHandler, NULL);
run_vacuum_command(free_slot->connection, sql.data,
echo, tabname);
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index 3987a4702b..8e0c65988d 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -30,7 +30,7 @@
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
-static bool processQueryResult(PGconn *conn, PGresult *result);
+static bool handleOneQueryResult(ParallelSlot *slot, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,53 +38,46 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->connection = conn;
/* Initially assume connection is idle */
slot->isFree = true;
+ ParallelSlotClearHandler(slot);
}
/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
+ * Invoke the slot's handler for a single query result, or fall back to the
+ * default handler if none is defined for the slot. Returns true if the
+ * handler reports that there's no error, false otherwise.
*/
static bool
-processQueryResult(PGconn *conn, PGresult *result)
+handleOneQueryResult(ParallelSlot *slot, PGresult *result)
{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+ PGresultHandler handler = slot->handler;
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
+ if (!handler)
+ handler = DefaultSlotHandler;
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
+ /* On failure, the handler should return NULL after freeing the result */
+ if (!handler(result, slot->connection, slot->handler_context))
+ return false;
+ /* Ok, we have to free it ourself */
PQclear(result);
return true;
}
/*
- * Consume all the results generated for the given connection until
+ * Handle all the results generated for the given connection until
* nothing remains. If at least one error is encountered, return false.
* Note that this will block if the connection is busy.
*/
static bool
-consumeQueryResult(PGconn *conn)
+handleQueryResults(ParallelSlot *slot)
{
bool ok = true;
PGresult *result;
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
+ SetCancelConn(slot->connection);
+ while ((result = PQgetResult(slot->connection)) != NULL)
{
- if (!processQueryResult(conn, result))
+ if (!handleOneQueryResult(slot, result))
ok = false;
}
ResetCancelConn();
@@ -227,14 +220,15 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
if (result != NULL)
{
- /* Check and discard the command result */
- if (!processQueryResult(slots[i].connection, result))
+ /* Handle and discard the command result */
+ if (!handleOneQueryResult(slots + i, result))
return NULL;
}
else
{
/* This connection has become idle */
slots[i].isFree = true;
+ ParallelSlotClearHandler(slots + i);
if (firstFree < 0)
firstFree = i;
break;
@@ -329,9 +323,101 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
for (i = 0; i < numslots; i++)
{
- if (!consumeQueryResult((slots + i)->connection))
+ if (!handleQueryResults(slots + i))
return false;
}
return true;
}
+
+/*
+ * DefaultSlotHandler
+ *
+ * PGresultHandler for query results from slots with no handler registered.
+ * Success or failure is determined entirely by examining the status of the
+ * query result. This is very basic, but users who need better can register a
+ * custom handler.
+ *
+ * res: PGresult from the query executed on the slot's connection
+ * conn: connection belonging to the slot
+ * context: unused
+ */
+PGresult *
+DefaultSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ switch (PQresultStatus(res))
+ {
+ /* Success codes */
+ case PGRES_EMPTY_QUERY:
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_COPY_OUT:
+ case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
+ case PGRES_SINGLE_TUPLE:
+ /* Ok */
+ return res;
+
+ /* Error codes */
+ case PGRES_BAD_RESPONSE:
+ case PGRES_NONFATAL_ERROR:
+ case PGRES_FATAL_ERROR:
+ break;
+
+ /* Intentionally no default here */
+ }
+
+ /*
+ * Handle all error cases here, including anything not matched in the
+ * switch (though that should not happen). The 'query' argument may be
+ * NULL or garbage left over from a prior usage of the slot. Don't include
+ * it in the error message!
+ */
+ pg_log_error("processing in database \"%s\" failed: %s", PQdb(conn),
+ PQerrorMessage(conn));
+ PQclear(res);
+ return NULL;
+}
+
+/*
+ * TableCommandSlotHandler
+ *
+ * PGresultHandler for results of commands (not queries) against tables.
+ *
+ * Requires that the result status is either PGRES_COMMAND_OK or an error about
+ * a missing table. This is useful for utilities that compile a list of tables
+ * to process and then run commands (vacuum, reindex, or whatever) against
+ * those tables, as there is a race condition between the time the list is
+ * compiled and the time the command attempts to open the table.
+ *
+ * For missing tables, logs an error but allows processing to continue.
+ *
+ * For all other errors, logs an error and terminates further processing.
+ *
+ * res: PGresult from the query executed on the slot's connection
+ * conn: connection belonging to the slot
+ * context: unused
+ */
+PGresult *
+TableCommandSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(res);
+ return NULL;
+ }
+ }
+
+ return res;
+}
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 99eeb3328d..524d62306d 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -15,12 +15,39 @@
#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
+typedef PGresult *(*PGresultHandler) (PGresult *res, PGconn *conn,
+ void *context);
+
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
bool isFree; /* Is it known to be idle? */
+
+ /*
+ * Prior to issuing a command or query on 'connection', a handler callback
+ * function may optionally be registered to be invoked to process the
+ * results, and context information may optionally be registered for use
+ * by the handler. If unset, these fields should be NULL.
+ */
+ PGresultHandler handler;
+ void *handler_context;
} ParallelSlot;
+static inline void
+ParallelSlotSetHandler(ParallelSlot *slot, PGresultHandler handler,
+ void *context)
+{
+ slot->handler = handler;
+ slot->handler_context = context;
+}
+
+static inline void
+ParallelSlotClearHandler(ParallelSlot *slot)
+{
+ slot->handler = NULL;
+ slot->handler_context = NULL;
+}
+
extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
@@ -31,5 +58,10 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
+extern PGresult *DefaultSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+extern PGresult *TableCommandSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
#endif /* PARALLEL_SLOT_H */
--
2.21.1 (Apple Git-122.3)
Attachment: v36-0004-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 1fbb9903eaef4d5e30f986f77226fdd7fdf7c680 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:36:59 -0800
Subject: [PATCH v36 4/5] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1519 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.h | 135 ++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 78 +
contrib/pg_amcheck/t/003_check.pl | 475 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 1004 +++++++++++++
src/tools/msvc/Install.pm | 2 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 4 +
16 files changed, 3811 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.h
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..a72dcf7304 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..4209b5ec50
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1519 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "libpq-fe.h"
+#include "pg_amcheck.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+#include "storage/block.h"
+
+/* Keep this in order by CheckType */
+static const CheckTypeFilter ctfilter[] = {
+ {
+ .relam = HEAP_TABLE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_RELATION) ","
+ CppAsString2(RELKIND_MATVIEW) ","
+ CppAsString2(RELKIND_TOASTVALUE),
+ .typname = "heap"
+ },
+ {
+ .relam = BTREE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_INDEX),
+ .typname = "btree index"
+ }
+};
+
+/*
+ * Query for determining if contrib's amcheck is installed. If so, selects the
+ * namespace name where amcheck's functions can be found.
+ */
+static const char *amcheck_sql =
+"SELECT n.nspname, x.extversion"
+"\nFROM pg_catalog.pg_extension x"
+"\nJOIN pg_catalog.pg_namespace n"
+"\nON x.extnamespace OPERATOR(pg_catalog.=) n.oid"
+"\nWHERE x.extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+int
+main(int argc, char *argv[])
+{
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"exclude-indexes", no_argument, NULL, 2},
+ {"exclude-toast", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-dependents", no_argument, NULL, 10},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ const char *progname;
+ int optindex;
+ int c;
+
+ const char *maintenance_db = NULL;
+ const char *connect_db = NULL;
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ ConnParams cparams;
+
+ amcheckOptions checkopts = {
+ .alldb = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .dependents = true,
+ .no_indexes = false,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .exclude_toast = false,
+ .reconcile_toast = true,
+ .skip = "none",
+ .jobs = -1,
+ .startblock = -1,
+ .endblock = -1
+ };
+
+ amcheckObjects objects = {
+ .databases = {NULL, NULL},
+ .schemas = {NULL, NULL},
+ .tables = {NULL, NULL},
+ .indexes = {NULL, NULL},
+ .exclude_databases = {NULL, NULL},
+ .exclude_schemas = {NULL, NULL},
+ .exclude_tables = {NULL, NULL},
+ .exclude_indexes = {NULL, NULL}
+ };
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ checkopts.alldb = true;
+ break;
+ case 'd':
+ simple_string_list_append(&objects.databases, optarg);
+ break;
+ case 'D':
+ simple_string_list_append(&objects.exclude_databases, optarg);
+ break;
+ case 'e':
+ checkopts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ checkopts.heapallindexed = true;
+ break;
+ case 'i':
+ simple_string_list_append(&objects.indexes, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 'j':
+ checkopts.jobs = atoi(optarg);
+ if (checkopts.jobs <= 0)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ checkopts.parent_check = true;
+ break;
+ case 'q':
+ checkopts.quiet = true;
+ break;
+ case 'r':
+ simple_string_list_append(&objects.indexes, optarg);
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'R':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 's':
+ simple_string_list_append(&objects.schemas, optarg);
+ break;
+ case 'S':
+ simple_string_list_append(&objects.exclude_schemas, optarg);
+ break;
+ case 't':
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'T':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ checkopts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ maintenance_db = pg_strdup(optarg);
+ break;
+ case 2:
+ checkopts.no_indexes = true;
+ break;
+ case 3:
+ checkopts.exclude_toast = true;
+ break;
+ case 4:
+ checkopts.reconcile_toast = false;
+ break;
+ case 5:
+ checkopts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ checkopts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ checkopts.skip = "all frozen";
+ else
+ {
+					fprintf(stderr, _("invalid skip option: must be \"all-visible\" or \"all-frozen\"\n"));
+ exit(1);
+ }
+ break;
+ case 7:
+ checkopts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+							_("relation starting block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+							_("relation starting block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 8:
+ checkopts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+							_("relation ending block argument contains garbage characters\n"));
+ exit(1);
+ }
+				if (checkopts.endblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+							_("relation ending block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 9:
+ checkopts.rootdescend = true;
+ checkopts.parent_check = true;
+ break;
+ case 10:
+ checkopts.dependents = false;
+ break;
+ default:
+ fprintf(stderr,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ if (checkopts.endblock >= 0 && checkopts.endblock < checkopts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (connect_db == NULL)
+ connect_db = argv[optind];
+ simple_string_list_append(&objects.databases, argv[optind]);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (maintenance_db)
+ cparams.dbname = maintenance_db;
+ else if (connect_db != NULL)
+ cparams.dbname = connect_db;
+ else if (objects.databases.head != NULL)
+ cparams.dbname = objects.databases.head->val;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ if (objects.databases.head == NULL)
+ simple_string_list_append(&objects.databases, default_db);
+
+ cparams.dbname = default_db;
+ }
+
+ check_each_database(&cparams, &objects, &checkopts, progname);
+
+ exit(0);
+}
+
+/*
+ * check_each_database
+ *
+ * Connects to the initial database and resolves a list of all databases that
+ * should be checked per the user supplied options. Sequentially checks each
+ * database in the list.
+ *
+ * The user supplied options may include zero databases, or only one database,
+ * in which case we could skip the step of resolving a list of databases, but
+ * it seems not worth optimizing, especially considering that there are
+ * multiple ways in which no databases or just one database might be specified,
+ * including a pattern that happens to match no entries or to match only one
+ * entry in pg_database.
+ *
+ * cparams: parameters for the initial database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_each_database(ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PGconn *conn;
+ PGresult *databases;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ SimpleStringList dbregex = {NULL, NULL};
+ SimpleStringList exclude = {NULL, NULL};
+
+ /*
+ * Get a list of all database SQL regexes to use for selecting database
+ * names. We assemble these regexes from fully-qualified relation
+ * patterns and database patterns. This process may result in the same
+	 * database regex appearing in the list multiple times, but the query
+	 * against pg_database will deduplicate, so we don't care.
+ */
+ get_db_regexes_from_fqrps(&dbregex, &objects->tables);
+ get_db_regexes_from_fqrps(&dbregex, &objects->indexes);
+ get_db_regexes_from_patterns(&dbregex, &objects->databases);
+
+ /*
+ * Assemble SQL regexes for databases to be excluded. Note that excluded
+ * relations are not considered here, as excluding relation x.y.z does not
+ * imply excluding database x. Excluding x.*.* would imply excluding
+ * database x, but we do not check for that here.
+ */
+ get_db_regexes_from_patterns(&exclude, &objects->exclude_databases);
+
+ conn = connectMaintenanceDatabase(cparams, progname, checkopts->echo);
+
+ initPQExpBuffer(&sql);
+ dbname_select(conn, &sql, &dbregex, checkopts->alldb);
+ appendPQExpBufferStr(&sql, "\nEXCEPT");
+ dbname_select(conn, &sql, &exclude, false);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ databases = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(databases) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+ PQfinish(conn);
+
+ ntups = PQntuples(databases);
+ if (ntups == 0 && !checkopts->quiet)
+ printf(_("%s: no databases to check\n"), progname);
+
+ for (i = 0; i < ntups; i++)
+ {
+ cparams->override_dbname = PQgetvalue(databases, i, 0);
+ check_one_database(cparams, objects, checkopts, progname);
+ }
+
+ PQclear(databases);
+}
+
+/*
+ * string_in_list
+ *
+ * Returns whether a given string is in the list of strings.
+ */
+static bool
+string_in_list(const SimpleStringList *list, const char *str)
+{
+ const SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next)
+ if (strcmp(cell->val, str) == 0)
+ return true;
+ return false;
+}
+
+/*
+ * check_one_database
+ *
+ * Connects to the next database and checks all relations that match the
+ * supplied object lists.  Patterns in the object lists are matched against
+ * the relations that exist in that database.
+ *
+ * cparams: parameters for this next database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_one_database(const ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PQExpBufferData sql;
+ PGconn *conn;
+ PGresult *result;
+ ParallelSlot *slots;
+ int ntups;
+ int i;
+ int parallel_workers;
+ bool inclusive;
+ bool failed = false;
+ char *amcheck_schema = NULL;
+
+ conn = connectDatabase(cparams, progname, checkopts->echo, false, true);
+
+ if (!checkopts->quiet)
+ {
+ printf(_("%s: checking database \"%s\"\n"),
+ progname, PQdb(conn));
+ fflush(stdout);
+ }
+
+ /*
+	 * Verify that amcheck is installed in this database.  User error could
+	 * result in a database that should have amcheck not having it, but we
+	 * could also be iterating over multiple databases where not all of them
+	 * have amcheck installed (for example, 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+		pg_log_error("database \"%s\": %s",
+					 PQdb(conn), PQerrorMessage(conn));
+		pg_log_error("query was: %s", amcheck_sql);
+ PQclear(result);
+ PQfinish(conn);
+ return;
+ }
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ if (!checkopts->quiet &&
+ (checkopts->verbose ||
+ string_in_list(&objects->databases, PQdb(conn))))
+ {
+			printf(_("%s: skipping database \"%s\": amcheck is not installed\n"),
+ progname, PQdb(conn));
+ }
+ PQfinish(conn);
+ return;
+ }
+ amcheck_schema = PQgetvalue(result, 0, 0);
+ if (checkopts->verbose)
+		printf(_("%s: in database \"%s\": using amcheck version \"%s\" in schema \"%s\"\n"),
+ progname, PQdb(conn), PQgetvalue(result, 0, 1), amcheck_schema);
+ amcheck_schema = PQescapeIdentifier(conn, amcheck_schema, strlen(amcheck_schema));
+ PQclear(result);
+
+ /*
+	 * If we were given neither tables nor indexes to check, then we select all
+ * targets not excluded. Otherwise, we select only the targets that we
+ * were given.
+ */
+ inclusive = objects->tables.head == NULL &&
+ objects->indexes.head == NULL;
+
+ initPQExpBuffer(&sql);
+ target_select(conn, &sql, objects, checkopts, progname, inclusive);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ result = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+
+ /*
+ * If no rows are returned, there are no matching relations, so we are
+ * done.
+ */
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ PQclear(result);
+ PQfinish(conn);
+ PQfreemem(amcheck_schema);
+ return;
+ }
+
+ /*
+ * Ensure parallel_workers is sane. If there are more connections than
+ * relations to be checked, we don't need to use them all.
+ */
+ parallel_workers = checkopts->jobs;
+ if (parallel_workers > ntups)
+ parallel_workers = ntups;
+ if (parallel_workers <= 0)
+ parallel_workers = 1;
+
+ /*
+	 * Set up the database connections.  We reuse the connection we already
+ * have for the first slot. If not in parallel mode, the first slot in
+ * the array contains the connection.
+ */
+ slots = ParallelSlotsSetup(cparams, progname, checkopts->echo, conn,
+ parallel_workers);
+
+ initPQExpBuffer(&sql);
+
+ /*
+ * Loop over all objects to be checked, and execute amcheck checking
+ * commands for each. We do not wait for the checks to complete, nor do
+ * we handle the results of those checks in the loop. We register
+ * handlers for doing all that.
+ */
+ for (i = 0; i < ntups; i++)
+ {
+ ParallelSlot *free_slot;
+
+ CheckType checktype = atoi(PQgetvalue(result, i, 0));
+ Oid reloid = atooid(PQgetvalue(result, i, 1));
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /* Execute the amcheck command for the given relation type. */
+ switch (checktype)
+ {
+ /* heapam types */
+ case CT_TABLE:
+ prepare_table_command(&sql, checkopts, reloid, amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* btreeam types */
+ case CT_BTREE:
+ prepare_btree_command(&sql, checkopts, reloid, amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* intentionally no default here */
+ }
+ }
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (!ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+finish:
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+
+ termPQExpBuffer(&sql);
+
+ if (amcheck_schema != NULL)
+ PQfreemem(amcheck_schema);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * checkopts: user supplied program options
+ * reloid: oid of the table to be checked
+ */
+static void
+prepare_table_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM %s.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ amcheck_schema,
+ reloid,
+ checkopts->on_error_stop ? "true" : "false",
+ checkopts->reconcile_toast ? "true" : "false",
+ checkopts->skip);
+ if (checkopts->startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", checkopts->startblock);
+ if (checkopts->endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", checkopts->endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the index checking command will be written
+ * checkopts: user supplied program options
+ * reloid: oid of the index to be checked
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ if (checkopts->parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ amcheck_schema,
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"),
+ (checkopts->rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ amcheck_schema,
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * conn: connection to the server associated with the slot to use
+ * sql: query to send
+ * checkopts: user supplied program options
+ * reloid: oid of the object being checked, for error reporting
+ * typ: type of object being checked, for error reporting
+ */
+static void
+run_command(PGconn *conn, const char *sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *typ)
+{
+ bool status;
+
+ if (checkopts->echo)
+ printf("%s\n", sql);
+
+ status = PQsendQuery(conn, sql) == 1;
+
+ if (!status)
+ {
+ pg_log_error("check of %s with id %u in database \"%s\" failed: %s",
+ typ, reloid, PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("command was: %s", sql);
+ }
+}
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
+ */
+static PGresult *
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ int ntups = PQntuples(res);
+
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s.%s, block %s, offset %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s.%s, block %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s.%s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s.%s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+	else
+ {
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ printf(_("query was: %s\n"), (const char *) context);
+ }
+
+ return res;
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user.  The result
+ * set from the btree checking command is expected to be empty; if the command
+ * instead fails with an error, the useful information about the corruption
+ * is found in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static PGresult *
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ return res;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf(_("%s checks objects in a PostgreSQL database for corruption.\n\n"), progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
+ printf(_("\nTarget Options:\n"));
+ printf(_(" -a, --all check all databases\n"));
+ printf(_(" -d, --dbname=DBNAME check specific database(s)\n"));
+ printf(_(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n"));
+ printf(_(" -i, --index=INDEX check specific index(es)\n"));
+ printf(_(" -I, --exclude-index=INDEX do NOT check specific index(es)\n"));
+ printf(_(" -r, --relation=RELNAME check specific relation(s)\n"));
+ printf(_(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n"));
+ printf(_(" -s, --schema=SCHEMA check specific schema(s)\n"));
+ printf(_(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n"));
+ printf(_(" -t, --table=TABLE check specific table(s)\n"));
+ printf(_(" -T, --exclude-table=TABLE do NOT check specific table(s)\n"));
+ printf(_(" --exclude-indexes do NOT perform any index checking\n"));
+ printf(_(" --exclude-toast do NOT check any toast tables or indexes\n"));
+ printf(_(" --no-dependents do NOT automatically check dependent objects\n"));
+ printf(_("\nIndex Checking Options:\n"));
+ printf(_(" -H, --heapallindexed check all heap tuples are found within indexes\n"));
+ printf(_(" -P, --parent-check check parent/child relationships during index checking\n"));
+ printf(_(" --rootdescend search from root page to refind tuples at the leaf level\n"));
+ printf(_("\nTable Checking Options:\n"));
+ printf(_(" --exclude-toast-pointers do NOT check relation toast pointers against toast\n"));
+ printf(_(" --on-error-stop stop checking a relation at end of first corrupt page\n"));
+ printf(_(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n"));
+ printf(_(" --startblock begin checking table(s) at the given starting block number\n"));
+ printf(_(" --endblock check table(s) only up to the given ending block number\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port\n"));
+ printf(_(" -U, --username=USERNAME user name to connect as\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt\n"));
+ printf(_(" --maintenance-db=DBNAME alternate maintenance database\n"));
+ printf(_("\nOther Options:\n"));
+ printf(_(" -e, --echo show the commands being sent to the server\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to the server\n"));
+ printf(_(" -q, --quiet don't write any messages\n"));
+ printf(_(" -v, --verbose write a lot of output\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+
+ printf(_("\nRead the description of the amcheck contrib module for details.\n"));
+ printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+ printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * get_db_regexes_from_fqrps
+ *
+ * For each pattern in the patterns list, if it is in fully-qualified
+ * database.schema.name format (a fully-qualified relation pattern, or fqrp), parse
+ * the database portion of the pattern, convert it to SQL regex format, and
+ * append it to the databases list. Patterns that are not fully-qualified are
+ * skipped over. No deduplication of regexes is performed.
+ *
+ * regexes: list to which parsed and converted database regexes are appended
+ * patterns: list of all patterns to parse
+ */
+static void
+get_db_regexes_from_fqrps(SimpleStringList *regexes,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /* parse the pattern as dbname.schema.relname, if possible */
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ /* add the database name (or pattern), if any, to the list */
+ if (dbnamebuf.data[0])
+ simple_string_list_append(regexes, dbnamebuf.data);
+
+ /* we do not use the schema or relname portions */
+
+ /* we may have dirtied the buffers */
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+ termPQExpBuffer(&dbnamebuf);
+ termPQExpBuffer(&schemabuf);
+ termPQExpBuffer(&namebuf);
+}
+
+/*
+ * get_db_regexes_from_patterns
+ *
+ * Convert each unqualified pattern in the list to SQL regex format and append
+ * it to the regexes list.
+ *
+ * regexes: list to which converted regexes are appended
+ * patterns: list of patterns to be converted
+ */
+static void
+get_db_regexes_from_patterns(SimpleStringList *regexes,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData buf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&buf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, NULL, NULL, &buf, cell->val, false);
+ if (buf.data[0])
+ simple_string_list_append(regexes, buf.data);
+ resetPQExpBuffer(&buf);
+ }
+ termPQExpBuffer(&buf);
+}
+
+/*
+ * dbname_select
+ *
+ * Appends a statement which selects the names of all databases matching the
+ * given SQL regular expressions.
+ *
+ * conn: connection to the initial database
+ * sql: buffer into which the constructed sql statement is appended
+ * regexes: list of database name regular expressions to match
+ * alldb: when true, select all databases which allow connections
+ */
+static void
+dbname_select(PGconn *conn, PQExpBuffer sql, const SimpleStringList *regexes,
+ bool alldb)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+
+ if (alldb)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_catalog.pg_database"
+ "\nWHERE datallowconn");
+ return;
+ }
+ else if (regexes->head == NULL)
+ {
+ appendPQExpBufferStr(sql, "\nSELECT ''::TEXT AS datname"
+ "\nWHERE false");
+ return;
+ }
+
+ appendPQExpBufferStr(sql, "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_catalog.pg_database"
+ "\nWHERE datallowconn"
+ "\nAND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = regexes->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, cell->val, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
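To make the three branches of `dbname_select` easier to eyeball, here is a Python sketch of the same string assembly. Literal escaping is elided (the C code goes through `appendStringLiteralConn`), so the quoting here is illustrative only.

```python
def dbname_select(regexes, alldb):
    """Mirror of the query text built by dbname_select (simplified quoting)."""
    if alldb:
        return ("\nSELECT datname::TEXT AS datname"
                "\nFROM pg_catalog.pg_database"
                "\nWHERE datallowconn")
    if not regexes:
        # No patterns and not --all: select nothing.
        return "\nSELECT ''::TEXT AS datname\nWHERE false"
    arr = ",\n".join("'%s'::TEXT COLLATE pg_catalog.default" % r
                     for r in regexes)
    return ("\nSELECT datname::TEXT AS datname"
            "\nFROM pg_catalog.pg_database"
            "\nWHERE datallowconn"
            "\nAND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[\n"
            + arr + "\n]::TEXT[])")

print(dbname_select(["^(db1)$", "^(db2)$"], False))
```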
+
+/*
+ * schema_select
+ *
+ * Appends a statement which selects all schemas matching the given patterns
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * patterns: list of schema name patterns to match
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all non-system schemas
+ */
+static void
+schema_select(PGconn *conn, PQExpBuffer sql, const SimpleStringList *patterns,
+ bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBufferStr(sql,
+ "\nSELECT 0::pg_catalog.oid AS nspoid WHERE false");
+ else
+ appendPQExpBufferStr(sql,
+ "\nSELECT oid AS nspoid"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_catalog')"
+ "\nAND oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_toast')");
+ return;
+ }
+
+ appendPQExpBufferStr(sql,
+ "\nSELECT oid AS nspoid"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE nspname OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = patterns->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ PQExpBufferData regexbuf;
+
+ initPQExpBuffer(&regexbuf);
+ patternToSQLRegex(encoding, NULL, NULL, &regexbuf, cell->val, false);
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, regexbuf.data, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ termPQExpBuffer(&regexbuf);
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * schema_cte
+ *
+ * Appends a Common Table Expression (CTE) which selects all schemas to be
+ * checked, with the CTE and oid field named as requested. The CTE will select
+ * all schemas matching the include list except any schemas matching the
+ * exclude list.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * ctename: name of the schema CTE to be created
+ * fieldname: name of the oid field within the schema CTE to be created
+ * include: list of schema name patterns for inclusion
+ * exclude: list of schema name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether to use all schemas in
+ * the database in lieu of the include list.
+ */
+static void
+schema_cte(PGconn *conn, PQExpBuffer sql, const char *ctename,
+ const SimpleStringList *include, const SimpleStringList *exclude,
+ bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (nspoid) AS (", ctename);
+ schema_select(conn, sql, include, inclusive);
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ schema_select(conn, sql, exclude, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * append_ctfilter_quals
+ *
+ * Appends quals to a buffer that restrict the rows selected from pg_class to
+ * only those which match the given checktype. No initial "WHERE" or "AND" is
+ * appended, nor do we surround our appended clauses in parens. The caller is
+ * assumed to take care of such matters.
+ *
+ * sql: buffer into which the constructed sql quals are appended
+ * relname: name (or alias) of pg_class in the surrounding query
+ * checktype: struct containing filter info
+ */
+static void
+append_ctfilter_quals(PQExpBuffer sql, const char *relname, CheckType checktype)
+{
+ appendPQExpBuffer(sql,
+ "%s.relam OPERATOR(pg_catalog.=) %u"
+ "\nAND %s.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])",
+ relname, ctfilter[checktype].relam,
+ relname, ctfilter[checktype].relkinds);
+}
+
+/*
+ * relation_select
+ *
+ * Appends a statement which selects the oid of all relations matching the
+ * given parameters. Expects a mixture of qualified and unqualified relation
+ * name patterns.
+ *
+ * For unqualified relation patterns, selects relations that match the relation
+ * name portion of the pattern which are in namespaces that are in the given
+ * namespace CTE.
+ *
+ * For qualified relation patterns, ignores the given namespace CTE and selects
+ * relations that match the relation name portion of the pattern which are in
+ * namespaces that match the schema portion of the pattern.
+ *
+ * For fully qualified relation patterns (database.schema.name), the pattern
+ * will be ignored unless the database portion of the pattern matches the name
+ * of the current database, as retrieved from conn.
+ *
+ * Only relations of the specified checktype will be selected.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * fieldname: alias to use for the oid field within the created SELECT
+ * statement
+ * patterns: list of (possibly qualified) relation name patterns to match
+ * checktype: the type of relation to select
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all relations of the given type
+ */
+static void
+relation_select(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *fieldname,
+ const SimpleStringList *patterns, CheckType checktype,
+ bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma = "";
+ const char *qor = "";
+ PQExpBufferData qualified;
+ PQExpBufferData unqualified;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBuffer(sql,
+ "\nSELECT 0::pg_catalog.oid AS %s WHERE false",
+ fieldname);
+ else
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s n"
+ "\nON n.nspoid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE ",
+ fieldname, schemacte);
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+ return;
+ }
+
+ /*
+ * We have to distinguish between schema-qualified and unqualified
+ * relation patterns. The unqualified patterns need to be restricted by
+ * the list of schemas returned by the schema CTE, but not so for the
+ * qualified patterns.
+ *
+ * We treat fully-qualified relation patterns (database.schema.relation)
+ * like schema-qualified patterns except that we also require the database
+ * portion to match the current database name.
+ */
+ initPQExpBuffer(&qualified);
+ initPQExpBuffer(&unqualified);
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ if (schemabuf.data[0])
+ {
+ /* Qualified relation pattern */
+ appendPQExpBuffer(&qualified, "%s\n(", qor);
+
+ if (dbnamebuf.data[0])
+ {
+ /*
+ * Fully-qualified relation pattern. Require the database
+ * name of our connection to match the database portion of the
+ * relation pattern.
+ */
+ appendPQExpBufferStr(&qualified, "\n");
+ appendStringLiteralConn(&qualified, PQdb(conn), conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, dbnamebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND");
+ }
+
+ /*
+ * Require the namespace name to match the schema portion of the
+ * relation pattern and the relation name to match the relname
+ * portion of the relation pattern.
+ */
+ appendPQExpBufferStr(&qualified,
+ "\nn.nspname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, schemabuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND"
+ "\nc.relname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, namebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default)");
+ qor = "\nOR";
+ }
+ else
+ {
+ /* Unqualified relation pattern */
+ appendPQExpBufferStr(&unqualified, comma);
+ appendStringLiteralConn(&unqualified, namebuf.data, conn);
+ appendPQExpBufferStr(&unqualified,
+ "::TEXT COLLATE pg_catalog.default");
+ comma = "\n, ";
+ }
+
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+
+ if (qualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE (",
+ fieldname);
+ appendPQExpBufferStr(sql, qualified.data);
+ appendPQExpBufferStr(sql, ")\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ if (unqualified.data[0])
+ appendPQExpBufferStr(sql, "\nUNION ALL");
+ }
+ if (unqualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s ls"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) ls.nspoid"
+ "\nWHERE c.relname OPERATOR(pg_catalog.~) ANY(ARRAY[",
+ fieldname, schemacte);
+ appendPQExpBufferStr(sql, unqualified.data);
+ appendPQExpBufferStr(sql, "\n]::TEXT[])\nAND ");
+ append_ctfilter_quals(sql, "c", checktype);
+ }
+}
+
+/*
+ * table_cte
+ *
+ * Appends to the buffer 'sql' a Common Table Expression (CTE) which selects
+ * all table relations matching the given filters.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * ctename: name of the table CTE to be created
+ * fieldname: name of the oid field within the table CTE to be created
+ * include: list of table name patterns for inclusion
+ * exclude: list of table name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether the select statement
+ * should match all relations
+ * toast: whether to also select the associated toast tables
+ */
+static void
+table_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *ctename, const char *fieldname,
+ const SimpleStringList *include, const SimpleStringList *exclude,
+ bool inclusive, bool toast)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+
+ if (toast)
+ {
+ /*
+ * Compute the primary tables, then union on all associated toast
+ * tables. We depend on left to right evaluation of the UNION before
+ * the EXCEPT which gets added below. UNION and EXCEPT have equal
+ * precedence, so be careful if you rearrange this query.
+ */
+ appendPQExpBufferStr(sql, "\nWITH primary_table AS (");
+ relation_select(conn, sql, schemacte, fieldname, include,
+ CT_TABLE, inclusive);
+ appendPQExpBuffer(sql, "\n)"
+ "\nSELECT %s"
+ "\nFROM primary_table"
+ "\nUNION"
+ "\nSELECT c.reltoastrelid AS %s"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN primary_table pt"
+ "\nON pt.%s OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE c.reltoastrelid OPERATOR(pg_catalog.!=) 0",
+ fieldname, fieldname, fieldname);
+ }
+ else
+ relation_select(conn, sql, schemacte, fieldname, include,
+ CT_TABLE, inclusive);
+
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ relation_select(conn, sql, schemacte, fieldname, exclude,
+ CT_TABLE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
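The precedence caveat in the comment above is worth spelling out, since it is exactly why the UNION with the toast tables must come before the EXCEPT. With Python sets standing in for the row sets (illustrative oids only):

```python
primary = {1, 2, 3}     # tables matched by the include patterns
toast = {10, 20}        # their associated toast tables
excluded = {2, 20}      # relations matched by the exclude patterns

# SQL evaluates "a UNION b EXCEPT c" left to right, i.e. (a UNION b) EXCEPT c,
# so excluded toast tables are correctly removed:
left_to_right = (primary | toast) - excluded
assert left_to_right == {1, 3, 10}

# If EXCEPT bound tighter than UNION, excluded toast tables would survive:
wrong = primary | (toast - excluded)
assert wrong == {1, 2, 3, 10}
```

Hence the warning: rearranging the query so the EXCEPT binds to only part of the union would silently re-include excluded relations.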
+
+/*
+ * exclude_index_cte
+ * Appends a CTE which selects all indexes to be excluded
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * patterns: list of index name patterns to match
+ */
+static void
+exclude_index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *ctename,
+ const char *fieldname, const SimpleStringList *patterns)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ relation_select(conn, sql, schemacte, fieldname, patterns,
+ CT_BTREE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * index_cte
+ * Appends a CTE which selects all indexes to be checked
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * ctename: name of the index CTE to be created
+ * fieldname: name of the oid field within the index CTE to be created
+ * excludecte: name of the CTE which contains all indexes to be excluded
+ * tablescte: optional; if automatically including indexes for checked tables,
+ * the name of the CTE which contains all tables to be checked
+ * tablesfield: if tablescte is not NULL, the name of the oid field in the
+ * tables CTE
+ * patterns: list of index name patterns to match
+ * inclusive: when 'patterns' is an empty list, whether the select statement
+ * should match all relations
+ */
+static void
+index_cte(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *ctename, const char *fieldname,
+ const char *excludecte, const char *tablescte,
+ const char *tablesfield, const SimpleStringList *patterns,
+ bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (%s) AS (", ctename, fieldname);
+ appendPQExpBuffer(sql, "\nSELECT %s FROM (", fieldname);
+ relation_select(conn, sql, schemacte, fieldname, patterns,
+ CT_BTREE, inclusive);
+ if (tablescte)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION"
+ "\nSELECT i.indexrelid AS %s"
+ "\nFROM pg_catalog.pg_index i"
+ "\nJOIN %s t ON t.%s OPERATOR(pg_catalog.=) i.indrelid",
+ fieldname, tablescte, tablesfield);
+ }
+ appendPQExpBuffer(sql,
+ "\n) AS included_indexes"
+ "\nEXCEPT"
+ "\nSELECT %s FROM %s",
+ fieldname, excludecte);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * target_select
+ *
+ * Construct a query that will return a list of all tables and indexes in
+ * the database matching the user specified options, sorted by size. We
+ * want the largest tables and indexes first, so that the parallel
+ * processing of the larger database objects gets started sooner.
+ *
+ * If 'inclusive' is true, include all tables and indexes not otherwise
+ * excluded; if false, include only tables and indexes explicitly included.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql select statement is appended
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ * inclusive: when list of objects to include is empty, whether the select
+ * statement should match all objects not otherwise excluded
+ */
+static void
+target_select(PGconn *conn, PQExpBuffer sql, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname,
+ bool inclusive)
+{
+ appendPQExpBufferStr(sql, "WITH");
+ schema_cte(conn, sql, "namespaces", &objects->schemas,
+ &objects->exclude_schemas, inclusive);
+ appendPQExpBufferStr(sql, ",");
+ table_cte(conn, sql, "namespaces", "tables", "tbloid",
+ &objects->tables, &objects->exclude_tables, inclusive,
+ !checkopts->exclude_toast);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBufferStr(sql, ",");
+ exclude_index_cte(conn, sql, "namespaces",
+ "excluded_indexes", "idxoid",
+ &objects->exclude_indexes);
+ appendPQExpBufferStr(sql, ",");
+ if (checkopts->dependents)
+ index_cte(conn, sql, "namespaces", "indexes", "idxoid",
+ "excluded_indexes", "tables", "tbloid",
+ &objects->indexes, inclusive);
+ else
+ index_cte(conn, sql, "namespaces", "indexes", "idxoid",
+ "excluded_indexes", NULL, NULL, &objects->indexes,
+ inclusive);
+ }
+ appendPQExpBuffer(sql,
+ "\nSELECT checktype, oid FROM ("
+ "\nSELECT %u AS checktype, tables.tbloid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN tables"
+ "\nON tables.tbloid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_TABLE);
+ append_ctfilter_quals(sql, "c", CT_TABLE);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION ALL"
+ "\nSELECT %u AS checktype, indexes.idxoid AS oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN indexes"
+ "\nON indexes.idxoid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_BTREE);
+ append_ctfilter_quals(sql, "c", CT_BTREE);
+ }
+ appendPQExpBufferStr(sql,
+ "\n) AS ss"
+ "\nORDER BY relpages DESC, checktype, oid");
+}
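The `ORDER BY relpages DESC` at the end implements a simple longest-job-first schedule: with a fixed pool of parallel connections, dispatching the biggest relations first minimizes the chance that one huge table is still being checked after every other worker has gone idle. A toy illustration (names and page counts are invented):

```python
# Sort check targets so the largest relations are dispatched first,
# mirroring "ORDER BY relpages DESC, checktype, oid".
targets = [("t_small", 10), ("t_big", 90000), ("t_mid", 500)]
targets.sort(key=lambda t: t[1], reverse=True)
assert [name for name, _ in targets] == ["t_big", "t_mid", "t_small"]
```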
diff --git a/contrib/pg_amcheck/pg_amcheck.h b/contrib/pg_amcheck/pg_amcheck.h
new file mode 100644
index 0000000000..b5ca276033
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.h
@@ -0,0 +1,135 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.h
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2020-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_AMCHECK_H
+#define PG_AMCHECK_H
+
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+
+/* amcheck options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool dependents;
+ bool no_indexes;
+ bool exclude_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ const char *skip;
+ int jobs; /* >= 0 indicates user specified the parallel
+ * degree, otherwise -1 */
+ long startblock;
+ long endblock;
+} amcheckOptions;
+
+/* names of database objects to include or exclude controlled by user flags */
+typedef struct amcheckObjects
+{
+ SimpleStringList databases;
+ SimpleStringList schemas;
+ SimpleStringList tables;
+ SimpleStringList indexes;
+ SimpleStringList exclude_databases;
+ SimpleStringList exclude_schemas;
+ SimpleStringList exclude_tables;
+ SimpleStringList exclude_indexes;
+} amcheckObjects;
+
+/*
+ * We cannot launch the same amcheck function for all checked objects. For
+ * btree indexes, we must use either bt_index_check() or
+ * bt_index_parent_check(). For heap relations, we must use verify_heapam().
+ * We silently ignore all other object types.
+ *
+ * The following CheckType enum and corresponding ctfilter array track
+ * which kinds of relations get which treatment.
+ */
+typedef enum
+{
+ CT_TABLE = 0,
+ CT_BTREE
+} CheckType;
+
+/*
+ * This struct is used for filtering relations in pg_catalog.pg_class to just
+ * those of a given CheckType. The relam field should equal pg_class.relam,
+ * and the pg_class.relkind should be contained in the relkinds comma separated
+ * list.
+ *
+ * The 'typname' field is not strictly for filtering, but for printing messages
+ * about relations that matched the filter.
+ */
+typedef struct
+{
+ Oid relam;
+ const char *relkinds;
+ const char *typname;
+} CheckTypeFilter;
+
+/* Constants taken from pg_catalog/pg_am.dat */
+#define HEAP_TABLE_AM_OID 2
+#define BTREE_AM_OID 403
+
+static void check_each_database(ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+
+static void check_one_database(const ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+static void prepare_table_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *nspname);
+
+static void prepare_btree_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *nspname);
+
+static void run_command(PGconn *conn, const char *sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *typ);
+
+static PGresult *VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static PGresult *VerifyBtreeSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static void help(const char *progname);
+
+
+static void get_db_regexes_from_fqrps(SimpleStringList *regexes,
+ const SimpleStringList *patterns);
+
+static void get_db_regexes_from_patterns(SimpleStringList *regexes,
+ const SimpleStringList *patterns);
+
+static void dbname_select(PGconn *conn, PQExpBuffer sql,
+ const SimpleStringList *regexes, bool alldb);
+
+static void target_select(PGconn *conn, PQExpBuffer sql,
+ const amcheckObjects *objects,
+ const amcheckOptions *options, const char *progname,
+ bool inclusive);
+
+#endif /* PG_AMCHECK_H */
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..b52039c79b
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,78 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+#########################################
+# Test checking a database without amcheck installed, by name. We should see a
+# message about missing amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by name without amcheck installed');
+
+#########################################
+# Test checking a database without amcheck installed, by only indirectly using
+# a dbname pattern. In verbose mode, we should see a message about missing
+# amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, '-v', '-d', '*', 'postgres' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by dbname implication without amcheck installed');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ 'checking a non-existent schema');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ 'checking a non-existent table');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ 'checking a non-existent index');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ 'no matching schemas');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ 'no matching tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index' ],
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..957094fcdd
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,475 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel;
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+ or die "could not open $relpath: $!";
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77\x77\x77\x77");
+ close($fh);
+ $node->start;
+}
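For anyone adapting this corruption helper to another harness: the essential trick is just overwriting bytes at a fixed offset past the page header while the server is stopped. A self-contained Python analogue (the offset 32 follows the Perl helper above; the file here is a throwaway stand-in, not a real relation file):

```python
import os
import tempfile

def corrupt_at(path, offset, garbage):
    """Overwrite len(garbage) bytes at the given offset, in place."""
    with open(path, 'r+b') as fh:
        fh.seek(offset)
        fh.write(garbage)

# Demonstrate on a scratch file of zero bytes.
fd, path = tempfile.mkstemp()
os.write(fd, b'\x00' * 100)
os.close(fd)
corrupt_at(path, 32, b'\x77' * 4)
with open(path, 'rb') as fh:
    data = fh.read()
assert data[32:36] == b'\x77\x77\x77\x77'
os.remove(path)
```

Note the double-quoted `"\x77..."` in the Perl helper matters for the same reason `b'\x77'` does here: a single-quoted Perl string would write the literal characters backslash-x-7-7 rather than four 0x77 bytes.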
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends. Put the
+ # extension in an unexpected location to test that pg_amcheck finds it
+ # correctly. Create tables with names that look like pg_catalog names to
+ # check that pg_amcheck does not get confused by them. Create functions in
+ # schema public that look like amcheck functions to check that pg_amcheck
+ # does not use them.
+ $node->safe_psql($dbname, q(
+ CREATE SCHEMA amcheck_schema;
+ CREATE EXTENSION amcheck WITH SCHEMA amcheck_schema;
+ CREATE TABLE amcheck_schema.pg_database (junk text);
+ CREATE TABLE amcheck_schema.pg_namespace (junk text);
+ CREATE TABLE amcheck_schema.pg_class (junk text);
+ CREATE TABLE amcheck_schema.pg_operator (junk text);
+ CREATE TABLE amcheck_schema.pg_proc (junk text);
+ CREATE TABLE amcheck_schema.pg_tablespace (junk text);
+
+ CREATE FUNCTION public.bt_index_check(index regclass,
+ heapallindexed boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.bt_index_parent_check(index regclass,
+ heapallindexed boolean default false,
+ rootdescend boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_parent_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ RETURNS SETOF record AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong verify_heapam!';
+ END;
+ $$ LANGUAGE plpgsql;
+ ));
+
+ # Create schemas, tables and indexes in five separate
+ # schemas. The schemas are all identical to start, but
+ # we will corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt toast table, partitions, and materialized views in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt.  A nonzero exit status would mean
+# that the pg_amcheck command itself failed, for example because a connection
+# to the database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ @cmd, 'db1' ],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_ok(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_ok(
+ [ @cmd, '--all' ],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-i', 'i*.idx', '-i', 'idx.i*' ],
+ 'pg_amcheck all indexes with qualified names matching /i*.idx/ or /idx.i*/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-t', 's*.t1', '-t', 'foo*.bar*' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/ or /foo*.bar*/');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/,
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# But checking schema s1 across all databases should show corruption
+# messages for tables in db2
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across all databases but excluding indexes');
+
+# Checking across a list of databases should also work
+$node->command_like(
+ [ @cmd, '-d', 'db2', '-d', 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across db1 and db2 but excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of zero.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's4' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_like(
+ [ @cmd, '--exclude-toast', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's5' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..7e71d612fc
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# to insert its rows, in a way that gives predictable sizes and locations
+# within the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
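As an aside, the 58-byte total can be cross-checked against the layout diagram above with Python's struct module. This is only an illustrative sketch outside the patch; the '<' prefix suppresses alignment padding, matching the byte-for-byte layout the Perl pack code assumes:

```python
import struct

# Little-endian, no alignment padding ('<'), mirroring the Perl pack code
# LLLSSSSSCCqCcccccccSSSSSSSSS used by the test (Perl L/S/C/c/q correspond
# to struct L/H/B/b/q with standard sizes).
FMT = '<LLLHHHHHBBqBbbbbbbb9H'

# 3*4 (xmin/xmax/field3) + 5*2 (ctid fields and infomasks) + 2*1
# (t_hoff/t_bits) + 8 ('a') + 1 + 7 ('b' header and body) + 9*2
# ('c' toast pointer) = 58
assert struct.calcsize(FMT) == 58
```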
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
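The corruption loop below clears or sets these infomask bits directly in the on-disk tuple headers. A minimal sketch of the bit arithmetic involved (illustrative only, not part of the patch):

```python
# Values mirror the #defines from access/htup_details.h used by the test.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
HEAP_XMAX_LOCK_ONLY = 0x0080
HEAP_KEYS_UPDATED   = 0x2000

# Start from a tuple whose xmin hint bits are both set.
t_infomask = HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID

# Clearing both hint bits, as the offnum == 1 case does, forces
# verify_heapam to re-examine xmin rather than trust the hints.
t_infomask &= ~(HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
assert t_infomask == 0

# Setting contradictory bits, as the offnum == 14 case does, creates a
# reportable inconsistency: "locked only" yet "key columns updated".
t_infomask |= HEAP_XMAX_LOCK_ONLY
t_infomask2 = HEAP_KEYS_UPDATED
assert t_infomask & HEAP_XMAX_LOCK_ONLY
assert t_infomask2 & HEAP_KEYS_UPDATED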
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation postgres\.public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation postgres\.public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
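The "interpreted as in the future" behavior relied on above (offnums 3 and 4) follows from 64-bit full transaction IDs, in which the 32-bit epoch occupies the high bits. A hedged sketch of that comparison (the next-XID value here is hypothetical, chosen only to represent a freshly initialized cluster):

```python
def full_xid(epoch, xid):
    # 64-bit "full" transaction ID: epoch in the high 32 bits, xid in the low.
    return (epoch << 32) | xid

# In a new cluster the epoch is 0 and the next XID is small, so a corrupt
# xmin near 2^32 compares as being in the future.
next_full_xid = full_xid(0, 1000)        # hypothetical next XID
corrupt_xmin  = full_xid(0, 4026531839)  # value written at offnum 3
assert corrupt_xmin >= next_full_xid
```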
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..379225cbf8
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..2b2c73ca8b
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,1004 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or more
+ databases, with options to select which schemas, tables and indexes to check,
+ which kinds of checking to perform, and whether to perform the checks in
+ parallel and, if so, how many parallel connections to establish and use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Usage</title>
+
+ <refsect2>
+ <title>Parallelism Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=20 --all</literal></term>
+ <listitem>
+ <para>
+ Check all databases one after another, but for each database checked,
+ use up to 20 simultaneous connections to check relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=8 mydb yourdb</literal></term>
+ <listitem>
+ <para>
+ Check databases <literal>mydb</literal> and <literal>yourdb</literal>
+ one after another, using up to 8 simultaneous connections to check
+ relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Checking Option Specification</title>
+
+ <para>
+ If no checking options are specified, by default all table relation checks
+ and default-level btree index checks are performed. A variety of options
+ exist to change the set of checks performed on whichever relations are
+ being checked. They are briefly illustrated in the following examples;
+ see their full descriptions below.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --parent-check --heapallindexed</literal></term>
+ <listitem>
+ <para>
+ For each btree index checked, perform more extensive checks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --exclude-toast-pointers</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not check toast pointers against
+ the toast relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --on-error-stop</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not continue checking pages after
+ the first page where corruption is encountered.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --skip="all-frozen"</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, skip over blocks marked as all
+ frozen. Note that <literal>all-visible</literal> may also be specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --startblock=3000 --endblock=4000</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, check only blocks in the given block
+ range.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Relation Specification</title>
+
+ <para>
+ If no relations are explicitly listed, by default all relations will be
+ checked, but there are options to specify which relations to check.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable -r yourtable</literal></term>
+ <listitem>
+ <para>
+ If one or more relations are explicitly given, they are interpreted as
+ an exhaustive list of all relations to be checked, with one caveat:
+ for all such relations, associated toast relations and indexes are by
+ default included in the list of relations to check.
+ </para>
+ <para>
+ Assuming <literal>mytable</literal> is an ordinary table, and that it
+ is indexed by <literal>mytable_idx</literal> and has an associated
+ toast table <literal>pg_toast_12345</literal>, checking will be
+ performed on <literal>mytable</literal>,
+ <literal>mytable_idx</literal>, and <literal>pg_toast_12345</literal>.
+ </para>
+ <para>
+ Likewise for <literal>yourtable</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable --no-dependents</literal></term>
+ <listitem>
+ <para>
+ This restricts the list of relations checked to just
+ <literal>mytable</literal>, without pulling in the corresponding
+ indexes or toast, but see also
+ <option>--exclude-toast-pointers</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t mytable -i myindex</literal></term>
+ <listitem>
+ <para>
+ The <option>-r</option> (<option>--relation</option>) option will match
+ any relation, but <option>-t</option> (<option>--table</option>) and
+ <option>-i</option> (<option>--index</option>) may be used to avoid
+ matching objects of the other type.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -R="mytemp*"</literal></term>
+ <listitem>
+ <para>
+ Relations may be included (<option>-r</option>) or excluded
+ (<option>-R</option>) using shell-style patterns.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Table and index inclusion and exclusion patterns may be used
+ equivalently with <option>-t</option>, <option>-T</option>,
+ <option>-i</option> and <option>-I</option>. The above example checks
+ all tables and indexes starting with <literal>my</literal> except for
+ indexes starting with <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -R="india" -T="laos" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Unlike specifying one or more <option>--relation</option> options, which
+ disables the default behavior of checking all relations, specifying one or
+ more of <option>-R</option>, <option>-T</option> or <option>-I</option> does not.
+ The above command will check all relations except any relation named
+ <literal>india</literal>, any table named <literal>laos</literal>, and
+ any index named <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
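Combining the inclusion and exclusion switches above, a hypothetical session might look like the following. The database and relation names here are illustrative only, and the patterns are quoted so the shell does not expand them before pg_amcheck sees them:

```shell
# Check mytable together with its indexes and toast table (the default
# expansion), but exclude any index whose name starts with mytable_old.
pg_amcheck -d mydb -r mytable -I 'mytable_old*'

# Check only the table itself, without pulling in its indexes or toast.
pg_amcheck -d mydb -r mytable --no-dependents
```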
+
+ <refsect2>
+ <title>Schema Specification</title>
+
+ <para>
+ If no schemas are explicitly listed, by default all schemas except
+ <literal>pg_catalog</literal> and <literal>pg_toast</literal> will be
+ checked.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -s s1 -s s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ If one or more schemas are listed with <option>-s</option>, unqualified
+ relation names will be checked only in the given schemas. The above
+ command will check tables <literal>s1.mytable</literal> and
+ <literal>s2.mytable</literal> but not tables named
+ <literal>mytable</literal> in other schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ As with relations, schemas may be excluded. The above command will
+ check any table named <literal>mytable</literal> not in schemas
+ <literal>s1</literal> and <literal>s2</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable -t s1.stuff</literal></term>
+ <listitem>
+ <para>
+ Relations may be included or excluded with a schema-qualified name
+ without interference from the <option>-s</option> or
+ <option>-S</option> options. Even though schema <literal>s1</literal>
+ has been excluded, the table <literal>s1.stuff</literal> will be
+ checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Specification</title>
+
+ <para>
+ If no databases are explicitly listed, the database to check is obtained
+ from environment variables in the usual way. Otherwise, when one or more
+ databases are explicitly given, they are interpreted as an exhaustive list
+ of all databases to be checked. This list of databases to check may
+ contain patterns, but because any such patterns need to be reconciled
+ against a list of all databases to find the matching database names, at
+ least one database specified must be a literal database name and not merely
+ a pattern, and it must appear in a position on the command line where
+ <application>pg_amcheck</application> expects a database name, as the
+ examples below illustrate.
+ </para>
+ <para>
+ For example:
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --all --maintenance-db=foo</literal></term>
+ <listitem>
+ <para>
+ If the <option>--maintenance-db</option> option is given, it will be
+ used to look up the matching databases, though it will not itself be
+ added to the list of databases for checking.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck foo bar baz</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more plain database name arguments not preceded by
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ one will be used for this purpose, and it will also be included in the
+ list of databases to check.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -d foo -d bar baz</literal></term>
+ <listitem>
+ <para>
+ If a mixture of plain database names and databases preceded with
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ plain database name will be used for this purpose. In the above
+ example, <literal>baz</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --dbname=foo --dbname="bar*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more databases are given with the
+ <option>-d</option> or <option>--dbname</option> option, the first one
+ will be used and must be a literal database name. In this example,
+ <literal>foo</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --relation="accounts_*.*.*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, the environment will be consulted for the database to be
+ used. In the example above, the default database will be queried to
+ find all databases with names that begin with
+ <literal>accounts_</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ As discussed above for schema-qualified relations, a database-qualified
+ relation name or pattern may also be given.
+<programlisting>
+pg_amcheck mydb \
+ --schema="t*" \
+ --exclude-schema="tmp*" \
+ --relation=baz \
+ --relation=bar.baz \
+ --relation=foo.bar.baz \
+ --relation="f*".a.b \
+ --exclude-relation=foo.a.b
+</programlisting>
+ will check relations in database <literal>mydb</literal> using the schema
+ resolution rules discussed above, but additionally will check all relations
+ named <literal>a.b</literal> in all databases with names starting with
+ <literal>f</literal> except database <literal>foo</literal>.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+ </para>
+
+ <refsect2>
+ <title>Help and Version Information Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not suppress any output produced as a result of the
+ <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increase the message verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Connection and Concurrent Connection Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ When used in conjunction with the <option>-a</option>
+ <option>--all</option> option, the total number of objects to check,
+ and correspondingly the number of concurrent connections to use, is
+ recalculated per database. If the number of objects to check differs
+ from one database to the next and is less than the concurrency level
+ specified, the number of concurrent connections open to the server
+ will fluctuate to meet the needs of each database processed.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
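As a sketch of how the connection and concurrency options combine (the host, port, and user shown are illustrative):

```shell
# Check all databases, keeping at most eight connections open at once;
# per the --jobs description, pg_amcheck uses fewer connections for any
# database that has fewer than eight objects to check.
pg_amcheck --all --jobs=8 --host=db.example.com --port=5432 --username=postgres
```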
+
+ <refsect2>
+ <title>Options Controlling Index Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly
+ selects <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing
+ a new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
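For example, a more thorough (and more expensive) btree verification pass might be requested as follows; the database name and index pattern are illustrative:

```shell
# Cross-check heap and index contents, and re-find each leaf tuple by
# descending from the root page; --rootdescend implies --parent-check.
pg_amcheck -d mydb -i 'accounts_*' --heapallindexed --rootdescend
```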
+
+ <refsect2>
+ <title>Options Controlling Table Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer
+ is slow and may be improved in a future release. Some users may wish
+ to disable this check to save time.
+ </para>
+ <para>
+ Note that, despite their similar names, this option is unrelated to the
+ <option>--exclude-toast</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page, so
+ this option affects only the checking of table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>"all-frozen"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>"all-visible"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
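The table-checking options above can be combined to bound the cost of a scan; a hypothetical invocation (the database name is illustrative):

```shell
# Skip all-frozen pages, restrict the scan to blocks 3000 through 4000,
# and stop checking each table after its first corrupted page.
pg_amcheck -d mydb --skip=all-frozen \
    --startblock=3000 --endblock=4000 --on-error-stop
```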
+
+ <refsect2>
+ <title>Corruption Checking Target Options</title>
+
+ <para>
+ Objects to be checked may span schemas in more than one database. Options
+ for restricting the list of databases, schemas, tables and indexes are
+ described below. In each place where a name may be specified, a
+ <link linkend="app-psql-patterns"><replaceable class="parameter">pattern</replaceable></link>
+ may also be used.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects in
+ the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>--maintenance-db</option> argument is given and no database
+ name is given as a plain command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-db</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-db</option>, the database will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-db=america</literal></member>
+ <member><literal>--exclude-db="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ If the relation is not schema-qualified, database and schema
+ inclusion/exclusion lists will determine in which databases or schemas
+ matching relations will be checked.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_idx</literal></member>
+ <member><literal>--relation="llc.accounts_idx"</literal></member>
+ <member><literal>--relation="asia|africa.corp|llc.accounts_idx"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ The first example, <literal>--relation=accounts_idx</literal>, checks
+ relations named <literal>accounts_idx</literal> in all selected schemas
+ and databases.
+ </para>
+ <para>
+ The second example, <literal>--relation="llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schema
+ <literal>llc</literal> in all selected databases.
+ </para>
+ <para>
+ The third example,
+ <literal>--relation="asia|africa.corp|llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in
+ schemas <literal>corp</literal> and <literal>llc</literal> in databases
+ <literal>asia</literal> and <literal>africa</literal>.
+ </para>
+ <para>
+ Note that if a database is implicated in a relation pattern, such as
+ <literal>asia</literal> and <literal>africa</literal> in the third
+ example above, the database need not be otherwise given in the command
+ arguments for the relation to be checked. As an extreme example of
+ this:
+ <simplelist>
+ <member><literal>pg_amcheck --relation="*.*.*" mydb</literal></member>
+ </simplelist>
+ will check all relations in all databases. The <literal>mydb</literal>
+ argument only serves to tell <application>pg_amcheck</application> the
+ name of the database to use for querying the list of all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-dependents</option></term>
+ <listitem>
+ <para>
+ When calculating the list of objects to be checked, do not automatically
+ expand the list to include associated indexes and toast tables of
+ elements otherwise in the list.
+ </para>
+ <para>
+ By default, for each main table relation checked, any associated toast
+ table and all associated indexes are also checked, unless explicitly
+ excluded.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..49ad558b74 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,7 +18,7 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7be6e6c9e5..53fbfa012e 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4d0d09a5dd..26920cc512 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -336,6 +336,8 @@ CheckPointStmt
CheckpointStatsData
CheckpointerRequest
CheckpointerShmemStruct
+CheckType
+CheckTypeFilter
Chromosome
CkptSortItem
CkptTsStatus
@@ -2847,6 +2849,8 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckObjects
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
v36-0005-Extending-PostgresNode-to-test-corruption.patchapplication/octet-stream; name=v36-0005-Extending-PostgresNode-to-test-corruption.patch; x-unix-mode=0644Download
From 30808fc59c1a1eb6f9a238b70e103846de44b834 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:37:58 -0800
Subject: [PATCH v36 5/5] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 135 +++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 261 ++++++++++++++++++
5 files changed, 496 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..d997db5b63
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,135 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 27;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options. When
+# checking the toast pointers, we may get errors produced by verify_heapam, but
+# we may also get errors from failure to read toast blocks that are beyond the
+# end of the toast table, of the form /ERROR: could not read block/. To avoid
+# having a brittle test, we accept any error message.
+#
+$node->start;
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ qr/^$/,
+ 'pg_amcheck reverted toast table');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ qr/^$/,
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..c92d1702b4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This partially reverts
+# the update, leaving a corrupt mix of old and new data.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d470af93c5 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,267 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relname>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($self, $snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($self, $snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files entirely with
+the files saved in B<$snapname>; unlike with a defined byte count, this means
+a file may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
+ File::Copy::copy($srcpart, "$dstdir/$part");
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
+ my $buffer;
+ if ($bytes < 0)
+ {
+ $bytes *= -1; # Easier to use positive value
+ my $srcsize = (stat($srcfh))[7];
+ my $offset = $srcsize - $bytes;
+ sysseek($srcfh, $offset, 0);
+ sysseek($dstfh, $offset, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+ else
+ {
+ sysseek($srcfh, 0, 0);
+ sysseek($dstfh, 0, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Tue, Feb 2, 2021 at 6:10 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
0001 -- no changes
Committed.
0002 -- fixing omissions in @pgfeutilsfiles in file src/tools/msvc/Mkvcbuild.pm
Here are a few minor cosmetic issues with this patch:
- connect_utils.c lacks a file header comment.
- Some or perhaps all of the other file header comments need an update for 2021.
- There's bogus hunks in the diff for string_utils.c.
I think the rest of this looks good. I spent a long time puzzling over
whether consumeQueryResult() and processQueryResult() needed to be
moved, but then I realized that this patch actually makes them into
static functions inside parallel_slot.c, rather than public functions
as they were before. I like that. The only reason those functions need
to be moved at all is so that the scripts_parallel/parallel_slot stuff
can continue to do its thing, so this is actually a better way of
grouping things together than what we have now.
0003 -- no changes
I think it would be better if there were no handler by default, and
failing to set one leads to an assertion failure when we get to the
point where one would be called.
I don't think I understand the point of renaming processQueryResult
and consumeQueryResult. Isn't that just code churn for its own sake?
PGresultHandler seems too generic. How about ParallelSlotHandler or
ParallelSlotResultHandler?
I'm somewhat inclined to propose s/ParallelSlot/ConnectionSlot/g but I
guess it's better not to get sucked into renaming things.
It's a little strange that we end up with mutators to set the slot's
handler and handler context when we elsewhere feel free to monkey with
a slot's connection directly, but it's not a perfect world and I can't
think of anything I'd like better.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Feb 3, 2021, at 2:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 2, 2021 at 6:10 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
0001 -- no changes
Committed.
Thanks!
0002 -- fixing omissions in @pgfeutilsfiles in file src/tools/msvc/Mkvcbuild.pm
Numbered 0001 in this next patch set.
Here are a few minor cosmetic issues with this patch:
- connect_utils.c lacks a file header comment.
Fixed
- Some or perhaps all of the other file header comments need an update for 2021.
Fixed.
- There's bogus hunks in the diff for string_utils.c.
Removed.
I think the rest of this looks good. I spent a long time puzzling over
whether consumeQueryResult() and processQueryResult() needed to be
moved, but then I realized that this patch actually makes them into
static functions inside parallel_slot.c, rather than public functions
as they were before. I like that. The only reason those functions need
to be moved at all is so that the scripts_parallel/parallel_slot stuff
can continue to do its thing, so this is actually a better way of
grouping things together than what we have now.
0003 -- no changes
Numbered 0002 in this next patch set.
I think it would be better if there were no handler by default, and
failing to set one leads to an assertion failure when we get to the
point where one would be called.
Changed to have no default handler, and to use Assert(PointerIsValid(handler)) as you suggest.
I don't think I understand the point of renaming processQueryResult
and consumeQueryResult. Isn't that just code churn for its own sake?
I didn't like the names. I had to constantly look back where they were defined to remember which of them processed/consumed all the results and which only processed/consumed one of them. Part of that problem was that their names are both singular. I have restored the names in this next patch set.
PGresultHandler seems too generic. How about ParallelSlotHandler or
ParallelSlotResultHandler?
ParallelSlotResultHandler works for me. I'm using that, and renaming s/TableCommandSlotHandler/TableCommandResultHandler/ to be consistent.
I'm somewhat inclined to propose s/ParallelSlot/ConnectionSlot/g but I
guess it's better not to get sucked into renaming things.
I admit that I lost a fair amount of time on this project because I thought "scripts_parallel.c" and "parallel_slot" referred to some kind of threading, but only later looked closely enough to see that this is an event loop, not a parallel threading system. I don't think "slot" is terribly informative, and if we rename I don't think it needs to be part of the name we choose. ConnectionEventLoop would be more intuitive to me than either of ParallelSlot/ConnectionSlot, but this seems like bikeshedding so I'm going to ignore it for now.
It's a little strange that we end up with mutators to set the slot's
handler and handler context when we elsewhere feel free to monkey with
a slot's connection directly, but it's not a perfect world and I can't
think of anything I'd like better.
I created those mutators in an earlier version of the patch where the slot had a few more fields to set, and it helped to have a single function call set all the fields. I agree it looks less nice now that there are only two fields to set.
I also made changes to clean up 0003 (formerly numbered 0004)
Attachments:
v37-0001-Moving-code-from-src-bin-scripts-to-fe_utils.patchapplication/octet-stream; name=v37-0001-Moving-code-from-src-bin-scripts-to-fe_utils.patch; x-unix-mode=0644Download
From c0a225ba3f77b9b972e4225858cecac6b723deab Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:30:53 -0800
Subject: [PATCH v37 1/4] Moving code from src/bin/scripts to fe_utils
To make code useable from contrib/pg_amcheck, moving
scripts_parallel.[ch] and handle_help_version_opts() into
fe_utils.
Moving supporting code from src/bin/scripts/common into fe_utils.
Updating applications in src/bin/scripts with the new location.
---
src/bin/scripts/Makefile | 6 +-
src/bin/scripts/clusterdb.c | 2 +
src/bin/scripts/common.c | 318 +-----------------
src/bin/scripts/common.h | 49 +--
src/bin/scripts/createdb.c | 1 +
src/bin/scripts/createuser.c | 1 +
src/bin/scripts/dropdb.c | 1 +
src/bin/scripts/dropuser.c | 1 +
src/bin/scripts/nls.mk | 2 +-
src/bin/scripts/pg_isready.c | 1 +
src/bin/scripts/reindexdb.c | 4 +-
src/bin/scripts/vacuumdb.c | 4 +-
src/fe_utils/Makefile | 4 +
src/fe_utils/connect_utils.c | 181 ++++++++++
src/fe_utils/option_utils.c | 38 +++
.../parallel_slot.c} | 63 +++-
src/fe_utils/query_utils.c | 92 +++++
src/include/fe_utils/connect_utils.h | 48 +++
src/include/fe_utils/option_utils.h | 23 ++
.../fe_utils/parallel_slot.h} | 13 +-
src/include/fe_utils/query_utils.h | 26 ++
src/tools/msvc/Mkvcbuild.pm | 4 +-
src/tools/pgindent/typedefs.list | 1 +
23 files changed, 498 insertions(+), 385 deletions(-)
create mode 100644 src/fe_utils/connect_utils.c
create mode 100644 src/fe_utils/option_utils.c
rename src/{bin/scripts/scripts_parallel.c => fe_utils/parallel_slot.c} (80%)
create mode 100644 src/fe_utils/query_utils.c
create mode 100644 src/include/fe_utils/connect_utils.h
create mode 100644 src/include/fe_utils/option_utils.h
rename src/{bin/scripts/scripts_parallel.h => include/fe_utils/parallel_slot.h} (82%)
create mode 100644 src/include/fe_utils/query_utils.h
diff --git a/src/bin/scripts/Makefile b/src/bin/scripts/Makefile
index a02e4e430c..b8d7cf2f2d 100644
--- a/src/bin/scripts/Makefile
+++ b/src/bin/scripts/Makefile
@@ -28,8 +28,8 @@ createuser: createuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport
dropdb: dropdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
dropuser: dropuser.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
clusterdb: clusterdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-vacuumdb: vacuumdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
-reindexdb: reindexdb.o common.o scripts_parallel.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+vacuumdb: vacuumdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
+reindexdb: reindexdb.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
pg_isready: pg_isready.o common.o $(WIN32RES) | submake-libpq submake-libpgport submake-libpgfeutils
install: all installdirs
@@ -50,7 +50,7 @@ uninstall:
clean distclean maintainer-clean:
rm -f $(addsuffix $(X), $(PROGRAMS)) $(addsuffix .o, $(PROGRAMS))
- rm -f common.o scripts_parallel.o $(WIN32RES)
+ rm -f common.o $(WIN32RES)
rm -rf tmp_check
check:
diff --git a/src/bin/scripts/clusterdb.c b/src/bin/scripts/clusterdb.c
index 7d25bb31d4..fc771eed77 100644
--- a/src/bin/scripts/clusterdb.c
+++ b/src/bin/scripts/clusterdb.c
@@ -13,6 +13,8 @@
#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/common.c b/src/bin/scripts/common.c
index 21ef297e6e..c86c19eae2 100644
--- a/src/bin/scripts/common.c
+++ b/src/bin/scripts/common.c
@@ -22,325 +22,9 @@
#include "common/logging.h"
#include "common/string.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/string_utils.h"
-#define ERRCODE_UNDEFINED_TABLE "42P01"
-
-/*
- * Provide strictly harmonized handling of --help and --version
- * options.
- */
-void
-handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname, help_handler hlp)
-{
- if (argc > 1)
- {
- if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
- {
- hlp(get_progname(argv[0]));
- exit(0);
- }
- if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
- {
- printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
- exit(0);
- }
- }
-}
-
-
-/*
- * Make a database connection with the given parameters.
- *
- * An interactive password prompt is automatically issued if needed and
- * allowed by cparams->prompt_password.
- *
- * If allow_password_reuse is true, we will try to re-use any password
- * given during previous calls to this routine. (Callers should not pass
- * allow_password_reuse=true unless reconnecting to the same database+user
- * as before, else we might create password exposure hazards.)
- */
-PGconn *
-connectDatabase(const ConnParams *cparams, const char *progname,
- bool echo, bool fail_ok, bool allow_password_reuse)
-{
- PGconn *conn;
- bool new_pass;
- static char *password = NULL;
-
- /* Callers must supply at least dbname; other params can be NULL */
- Assert(cparams->dbname);
-
- if (!allow_password_reuse && password)
- {
- free(password);
- password = NULL;
- }
-
- if (cparams->prompt_password == TRI_YES && password == NULL)
- password = simple_prompt("Password: ", false);
-
- /*
- * Start the connection. Loop until we have a password if requested by
- * backend.
- */
- do
- {
- const char *keywords[8];
- const char *values[8];
- int i = 0;
-
- /*
- * If dbname is a connstring, its entries can override the other
- * values obtained from cparams; but in turn, override_dbname can
- * override the dbname component of it.
- */
- keywords[i] = "host";
- values[i++] = cparams->pghost;
- keywords[i] = "port";
- values[i++] = cparams->pgport;
- keywords[i] = "user";
- values[i++] = cparams->pguser;
- keywords[i] = "password";
- values[i++] = password;
- keywords[i] = "dbname";
- values[i++] = cparams->dbname;
- if (cparams->override_dbname)
- {
- keywords[i] = "dbname";
- values[i++] = cparams->override_dbname;
- }
- keywords[i] = "fallback_application_name";
- values[i++] = progname;
- keywords[i] = NULL;
- values[i++] = NULL;
- Assert(i <= lengthof(keywords));
-
- new_pass = false;
- conn = PQconnectdbParams(keywords, values, true);
-
- if (!conn)
- {
- pg_log_error("could not connect to database %s: out of memory",
- cparams->dbname);
- exit(1);
- }
-
- /*
- * No luck? Trying asking (again) for a password.
- */
- if (PQstatus(conn) == CONNECTION_BAD &&
- PQconnectionNeedsPassword(conn) &&
- cparams->prompt_password != TRI_NO)
- {
- PQfinish(conn);
- if (password)
- free(password);
- password = simple_prompt("Password: ", false);
- new_pass = true;
- }
- } while (new_pass);
-
- /* check to see that the backend connection was successfully made */
- if (PQstatus(conn) == CONNECTION_BAD)
- {
- if (fail_ok)
- {
- PQfinish(conn);
- return NULL;
- }
- pg_log_error("%s", PQerrorMessage(conn));
- exit(1);
- }
-
- /* Start strict; callers may override this. */
- PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
-
- return conn;
-}
-
-/*
- * Try to connect to the appropriate maintenance database.
- *
- * This differs from connectDatabase only in that it has a rule for
- * inserting a default "dbname" if none was given (which is why cparams
- * is not const). Note that cparams->dbname should typically come from
- * a --maintenance-db command line parameter.
- */
-PGconn *
-connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo)
-{
- PGconn *conn;
-
- /* If a maintenance database name was specified, just connect to it. */
- if (cparams->dbname)
- return connectDatabase(cparams, progname, echo, false, false);
-
- /* Otherwise, try postgres first and then template1. */
- cparams->dbname = "postgres";
- conn = connectDatabase(cparams, progname, echo, true, false);
- if (!conn)
- {
- cparams->dbname = "template1";
- conn = connectDatabase(cparams, progname, echo, false, false);
- }
- return conn;
-}
-
-/*
- * Disconnect the given connection, canceling any statement if one is active.
- */
-void
-disconnectDatabase(PGconn *conn)
-{
- char errbuf[256];
-
- Assert(conn != NULL);
-
- if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
- {
- PGcancel *cancel;
-
- if ((cancel = PQgetCancel(conn)))
- {
- (void) PQcancel(cancel, errbuf, sizeof(errbuf));
- PQfreeCancel(cancel);
- }
- }
-
- PQfinish(conn);
-}
-
-/*
- * Run a query, return the results, exit program on failure.
- */
-PGresult *
-executeQuery(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_TUPLES_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- return res;
-}
-
-
-/*
- * As above for a SQL command (which returns nothing).
- */
-void
-executeCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
-
- if (echo)
- printf("%s\n", query);
-
- res = PQexec(conn, query);
- if (!res ||
- PQresultStatus(res) != PGRES_COMMAND_OK)
- {
- pg_log_error("query failed: %s", PQerrorMessage(conn));
- pg_log_info("query was: %s", query);
- PQfinish(conn);
- exit(1);
- }
-
- PQclear(res);
-}
-
-
-/*
- * As above for a SQL maintenance command (returns command success).
- * Command is executed with a cancel handler set, so Ctrl-C can
- * interrupt it.
- */
-bool
-executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
-{
- PGresult *res;
- bool r;
-
- if (echo)
- printf("%s\n", query);
-
- SetCancelConn(conn);
- res = PQexec(conn, query);
- ResetCancelConn();
-
- r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
-
- if (res)
- PQclear(res);
-
- return r;
-}
-
-/*
- * Consume all the results generated for the given connection until
- * nothing remains. If at least one error is encountered, return false.
- * Note that this will block if the connection is busy.
- */
-bool
-consumeQueryResult(PGconn *conn)
-{
- bool ok = true;
- PGresult *result;
-
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
- {
- if (!processQueryResult(conn, result))
- ok = false;
- }
- ResetCancelConn();
- return ok;
-}
-
-/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
- */
-bool
-processQueryResult(PGconn *conn, PGresult *result)
-{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
-
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
-
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
-
- PQclear(result);
- return true;
-}
-
-
/*
* Split TABLE[(COLUMNS)] into TABLE and [(COLUMNS)] portions. When you
* finish using them, pg_free(*table). *columns is a pointer into "spec",
diff --git a/src/bin/scripts/common.h b/src/bin/scripts/common.h
index 5630975712..ddd8f35274 100644
--- a/src/bin/scripts/common.h
+++ b/src/bin/scripts/common.h
@@ -10,58 +10,11 @@
#define COMMON_H
#include "common/username.h"
+#include "fe_utils/connect_utils.h"
#include "getopt_long.h" /* pgrminclude ignore */
#include "libpq-fe.h"
#include "pqexpbuffer.h" /* pgrminclude ignore */
-enum trivalue
-{
- TRI_DEFAULT,
- TRI_NO,
- TRI_YES
-};
-
-/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
-typedef struct _connParams
-{
- /* These fields record the actual command line parameters */
- const char *dbname; /* this may be a connstring! */
- const char *pghost;
- const char *pgport;
- const char *pguser;
- enum trivalue prompt_password;
- /* If not NULL, this overrides the dbname obtained from command line */
- /* (but *only* the DB name, not anything else in the connstring) */
- const char *override_dbname;
-} ConnParams;
-
-typedef void (*help_handler) (const char *progname);
-
-extern void handle_help_version_opts(int argc, char *argv[],
- const char *fixed_progname,
- help_handler hlp);
-
-extern PGconn *connectDatabase(const ConnParams *cparams,
- const char *progname,
- bool echo, bool fail_ok,
- bool allow_password_reuse);
-
-extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
- const char *progname, bool echo);
-
-extern void disconnectDatabase(PGconn *conn);
-
-extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
-
-extern void executeCommand(PGconn *conn, const char *query, bool echo);
-
-extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
- bool echo);
-
-extern bool consumeQueryResult(PGconn *conn);
-
-extern bool processQueryResult(PGconn *conn, PGresult *result);
-
extern void splitTableColumnsSpec(const char *spec, int encoding,
char **table, const char **columns);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index abf21d4942..041454f075 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -13,6 +13,7 @@
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index 47b0e28bc6..ef7e0e549f 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
index ba0dcdecb9..b154ed1bb6 100644
--- a/src/bin/scripts/dropdb.c
+++ b/src/bin/scripts/dropdb.c
@@ -13,6 +13,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/dropuser.c b/src/bin/scripts/dropuser.c
index ff5b455ae5..61b8557bc7 100644
--- a/src/bin/scripts/dropuser.c
+++ b/src/bin/scripts/dropuser.c
@@ -14,6 +14,7 @@
#include "common.h"
#include "common/logging.h"
#include "common/string.h"
+#include "fe_utils/option_utils.h"
#include "fe_utils/string_utils.h"
diff --git a/src/bin/scripts/nls.mk b/src/bin/scripts/nls.mk
index 5d5dd11b7b..7fc716092e 100644
--- a/src/bin/scripts/nls.mk
+++ b/src/bin/scripts/nls.mk
@@ -7,7 +7,7 @@ GETTEXT_FILES = $(FRONTEND_COMMON_GETTEXT_FILES) \
clusterdb.c vacuumdb.c reindexdb.c \
pg_isready.c \
common.c \
- scripts_parallel.c \
+ ../../fe_utils/parallel_slot.c \
../../fe_utils/cancel.c ../../fe_utils/print.c \
../../common/fe_memutils.c ../../common/username.c
GETTEXT_TRIGGERS = $(FRONTEND_COMMON_GETTEXT_TRIGGERS) simple_prompt yesno_prompt
diff --git a/src/bin/scripts/pg_isready.c b/src/bin/scripts/pg_isready.c
index ceb8a09b4c..fc6f7b0a93 100644
--- a/src/bin/scripts/pg_isready.c
+++ b/src/bin/scripts/pg_isready.c
@@ -12,6 +12,7 @@
#include "postgres_fe.h"
#include "common.h"
#include "common/logging.h"
+#include "fe_utils/option_utils.h"
#define DEFAULT_CONNECT_TIMEOUT "3"
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index dece8200fa..7781fb1151 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -16,9 +16,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
typedef enum ReindexType
{
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 8246327770..ed320817bc 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -18,9 +18,11 @@
#include "common/connect.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#include "fe_utils/simple_list.h"
#include "fe_utils/string_utils.h"
-#include "scripts_parallel.h"
/* vacuum options controlled by user flags */
diff --git a/src/fe_utils/Makefile b/src/fe_utils/Makefile
index 10d6838cf9..456c441a33 100644
--- a/src/fe_utils/Makefile
+++ b/src/fe_utils/Makefile
@@ -23,9 +23,13 @@ OBJS = \
archive.o \
cancel.o \
conditional.o \
+ connect_utils.o \
mbprint.o \
+ option_utils.o \
+ parallel_slot.o \
print.o \
psqlscan.o \
+ query_utils.o \
recovery_gen.o \
simple_list.o \
string_utils.o
diff --git a/src/fe_utils/connect_utils.c b/src/fe_utils/connect_utils.c
new file mode 100644
index 0000000000..96bb798316
--- /dev/null
+++ b/src/fe_utils/connect_utils.c
@@ -0,0 +1,181 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to connect to and disconnect from databases.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/connect_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/string.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Make a database connection with the given parameters.
+ *
+ * An interactive password prompt is automatically issued if needed and
+ * allowed by cparams->prompt_password.
+ *
+ * If allow_password_reuse is true, we will try to re-use any password
+ * given during previous calls to this routine. (Callers should not pass
+ * allow_password_reuse=true unless reconnecting to the same database+user
+ * as before, else we might create password exposure hazards.)
+ */
+PGconn *
+connectDatabase(const ConnParams *cparams, const char *progname,
+ bool echo, bool fail_ok, bool allow_password_reuse)
+{
+ PGconn *conn;
+ bool new_pass;
+ static char *password = NULL;
+
+ /* Callers must supply at least dbname; other params can be NULL */
+ Assert(cparams->dbname);
+
+ if (!allow_password_reuse && password)
+ {
+ free(password);
+ password = NULL;
+ }
+
+ if (cparams->prompt_password == TRI_YES && password == NULL)
+ password = simple_prompt("Password: ", false);
+
+ /*
+ * Start the connection. Loop until we have a password if requested by
+ * backend.
+ */
+ do
+ {
+ const char *keywords[8];
+ const char *values[8];
+ int i = 0;
+
+ /*
+ * If dbname is a connstring, its entries can override the other
+ * values obtained from cparams; but in turn, override_dbname can
+ * override the dbname component of it.
+ */
+ keywords[i] = "host";
+ values[i++] = cparams->pghost;
+ keywords[i] = "port";
+ values[i++] = cparams->pgport;
+ keywords[i] = "user";
+ values[i++] = cparams->pguser;
+ keywords[i] = "password";
+ values[i++] = password;
+ keywords[i] = "dbname";
+ values[i++] = cparams->dbname;
+ if (cparams->override_dbname)
+ {
+ keywords[i] = "dbname";
+ values[i++] = cparams->override_dbname;
+ }
+ keywords[i] = "fallback_application_name";
+ values[i++] = progname;
+ keywords[i] = NULL;
+ values[i++] = NULL;
+ Assert(i <= lengthof(keywords));
+
+ new_pass = false;
+ conn = PQconnectdbParams(keywords, values, true);
+
+ if (!conn)
+ {
+ pg_log_error("could not connect to database %s: out of memory",
+ cparams->dbname);
+ exit(1);
+ }
+
+ /*
+ * No luck? Try asking (again) for a password.
+ */
+ if (PQstatus(conn) == CONNECTION_BAD &&
+ PQconnectionNeedsPassword(conn) &&
+ cparams->prompt_password != TRI_NO)
+ {
+ PQfinish(conn);
+ if (password)
+ free(password);
+ password = simple_prompt("Password: ", false);
+ new_pass = true;
+ }
+ } while (new_pass);
+
+ /* check to see that the backend connection was successfully made */
+ if (PQstatus(conn) == CONNECTION_BAD)
+ {
+ if (fail_ok)
+ {
+ PQfinish(conn);
+ return NULL;
+ }
+ pg_log_error("%s", PQerrorMessage(conn));
+ exit(1);
+ }
+
+ /* Start strict; callers may override this. */
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, echo));
+
+ return conn;
+}
+
+/*
+ * Try to connect to the appropriate maintenance database.
+ *
+ * This differs from connectDatabase only in that it has a rule for
+ * inserting a default "dbname" if none was given (which is why cparams
+ * is not const). Note that cparams->dbname should typically come from
+ * a --maintenance-db command line parameter.
+ */
+PGconn *
+connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo)
+{
+ PGconn *conn;
+
+ /* If a maintenance database name was specified, just connect to it. */
+ if (cparams->dbname)
+ return connectDatabase(cparams, progname, echo, false, false);
+
+ /* Otherwise, try postgres first and then template1. */
+ cparams->dbname = "postgres";
+ conn = connectDatabase(cparams, progname, echo, true, false);
+ if (!conn)
+ {
+ cparams->dbname = "template1";
+ conn = connectDatabase(cparams, progname, echo, false, false);
+ }
+ return conn;
+}
+
+/*
+ * Disconnect the given connection, canceling any statement if one is active.
+ */
+void
+disconnectDatabase(PGconn *conn)
+{
+ char errbuf[256];
+
+ Assert(conn != NULL);
+
+ if (PQtransactionStatus(conn) == PQTRANS_ACTIVE)
+ {
+ PGcancel *cancel;
+
+ if ((cancel = PQgetCancel(conn)))
+ {
+ (void) PQcancel(cancel, errbuf, sizeof(errbuf));
+ PQfreeCancel(cancel);
+ }
+ }
+
+ PQfinish(conn);
+}
diff --git a/src/fe_utils/option_utils.c b/src/fe_utils/option_utils.c
new file mode 100644
index 0000000000..e19a495dba
--- /dev/null
+++ b/src/fe_utils/option_utils.c
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/option_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#include "postgres_fe.h"
+
+#include "fe_utils/option_utils.h"
+
+/*
+ * Provide strictly harmonized handling of --help and --version
+ * options.
+ */
+void
+handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname, help_handler hlp)
+{
+ if (argc > 1)
+ {
+ if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
+ {
+ hlp(get_progname(argv[0]));
+ exit(0);
+ }
+ if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
+ {
+ printf("%s (PostgreSQL) " PG_VERSION "\n", fixed_progname);
+ exit(0);
+ }
+ }
+}
diff --git a/src/bin/scripts/scripts_parallel.c b/src/fe_utils/parallel_slot.c
similarity index 80%
rename from src/bin/scripts/scripts_parallel.c
rename to src/fe_utils/parallel_slot.c
index 1f863a1bb4..3987a4702b 100644
--- a/src/bin/scripts/scripts_parallel.c
+++ b/src/fe_utils/parallel_slot.c
@@ -1,13 +1,13 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.c
- * Parallel support for bin/scripts/
+ * parallel_slot.c
+ * Parallel support for front-end parallel database connections
*
*
* Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
- * src/bin/scripts/scripts_parallel.c
+ * src/fe_utils/parallel_slot.c
*
*-------------------------------------------------------------------------
*/
@@ -22,13 +22,15 @@
#include <sys/select.h>
#endif
-#include "common.h"
#include "common/logging.h"
#include "fe_utils/cancel.h"
-#include "scripts_parallel.h"
+#include "fe_utils/parallel_slot.h"
+
+#define ERRCODE_UNDEFINED_TABLE "42P01"
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
+static bool processQueryResult(PGconn *conn, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,6 +40,57 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->isFree = true;
}
+/*
+ * Process (and delete) a query result. Returns true if there's no error,
+ * false otherwise -- but errors about trying to work on a missing relation
+ * are reported and subsequently ignored.
+ */
+static bool
+processQueryResult(PGconn *conn, PGresult *result)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(result) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(result);
+ return false;
+ }
+ }
+
+ PQclear(result);
+ return true;
+}
+
+/*
+ * Consume all the results generated for the given connection until
+ * nothing remains. If at least one error is encountered, return false.
+ * Note that this will block if the connection is busy.
+ */
+static bool
+consumeQueryResult(PGconn *conn)
+{
+ bool ok = true;
+ PGresult *result;
+
+ SetCancelConn(conn);
+ while ((result = PQgetResult(conn)) != NULL)
+ {
+ if (!processQueryResult(conn, result))
+ ok = false;
+ }
+ ResetCancelConn();
+ return ok;
+}
+
/*
* Wait until a file descriptor from the given set becomes readable.
*
diff --git a/src/fe_utils/query_utils.c b/src/fe_utils/query_utils.c
new file mode 100644
index 0000000000..d5ffe56fd6
--- /dev/null
+++ b/src/fe_utils/query_utils.c
@@ -0,0 +1,92 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query databases.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/fe_utils/query_utils.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "common/logging.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/query_utils.h"
+
+/*
+ * Run a query, return the results, exit program on failure.
+ */
+PGresult *
+executeQuery(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ return res;
+}
+
+
+/*
+ * As above for a SQL command (which returns nothing).
+ */
+void
+executeCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+
+ if (echo)
+ printf("%s\n", query);
+
+ res = PQexec(conn, query);
+ if (!res ||
+ PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_info("query was: %s", query);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ PQclear(res);
+}
+
+
+/*
+ * As above for a SQL maintenance command (returns command success).
+ * Command is executed with a cancel handler set, so Ctrl-C can
+ * interrupt it.
+ */
+bool
+executeMaintenanceCommand(PGconn *conn, const char *query, bool echo)
+{
+ PGresult *res;
+ bool r;
+
+ if (echo)
+ printf("%s\n", query);
+
+ SetCancelConn(conn);
+ res = PQexec(conn, query);
+ ResetCancelConn();
+
+ r = (res && PQresultStatus(res) == PGRES_COMMAND_OK);
+
+ if (res)
+ PQclear(res);
+
+ return r;
+}
diff --git a/src/include/fe_utils/connect_utils.h b/src/include/fe_utils/connect_utils.h
new file mode 100644
index 0000000000..5048940509
--- /dev/null
+++ b/src/include/fe_utils/connect_utils.h
@@ -0,0 +1,48 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to connect to and disconnect from databases.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/connect_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef CONNECT_UTILS_H
+#define CONNECT_UTILS_H
+
+#include "libpq-fe.h"
+
+enum trivalue
+{
+ TRI_DEFAULT,
+ TRI_NO,
+ TRI_YES
+};
+
+/* Parameters needed by connectDatabase/connectMaintenanceDatabase */
+typedef struct _connParams
+{
+ /* These fields record the actual command line parameters */
+ const char *dbname; /* this may be a connstring! */
+ const char *pghost;
+ const char *pgport;
+ const char *pguser;
+ enum trivalue prompt_password;
+ /* If not NULL, this overrides the dbname obtained from command line */
+ /* (but *only* the DB name, not anything else in the connstring) */
+ const char *override_dbname;
+} ConnParams;
+
+extern PGconn *connectDatabase(const ConnParams *cparams,
+ const char *progname,
+ bool echo, bool fail_ok,
+ bool allow_password_reuse);
+
+extern PGconn *connectMaintenanceDatabase(ConnParams *cparams,
+ const char *progname, bool echo);
+
+extern void disconnectDatabase(PGconn *conn);
+
+#endif /* CONNECT_UTILS_H */
diff --git a/src/include/fe_utils/option_utils.h b/src/include/fe_utils/option_utils.h
new file mode 100644
index 0000000000..d653cb94e3
--- /dev/null
+++ b/src/include/fe_utils/option_utils.h
@@ -0,0 +1,23 @@
+/*-------------------------------------------------------------------------
+ *
+ * Command line option processing facilities for frontend code
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/option_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef OPTION_UTILS_H
+#define OPTION_UTILS_H
+
+#include "postgres_fe.h"
+
+typedef void (*help_handler) (const char *progname);
+
+extern void handle_help_version_opts(int argc, char *argv[],
+ const char *fixed_progname,
+ help_handler hlp);
+
+#endif /* OPTION_UTILS_H */
diff --git a/src/bin/scripts/scripts_parallel.h b/src/include/fe_utils/parallel_slot.h
similarity index 82%
rename from src/bin/scripts/scripts_parallel.h
rename to src/include/fe_utils/parallel_slot.h
index f62692510a..99eeb3328d 100644
--- a/src/bin/scripts/scripts_parallel.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -1,21 +1,20 @@
/*-------------------------------------------------------------------------
*
- * scripts_parallel.h
+ * parallel_slot.h
* Parallel support for bin/scripts/
*
* Copyright (c) 2003-2021, PostgreSQL Global Development Group
*
- * src/bin/scripts/scripts_parallel.h
+ * src/include/fe_utils/parallel_slot.h
*
*-------------------------------------------------------------------------
*/
-#ifndef SCRIPTS_PARALLEL_H
-#define SCRIPTS_PARALLEL_H
+#ifndef PARALLEL_SLOT_H
+#define PARALLEL_SLOT_H
-#include "common.h"
+#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
-
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
@@ -33,4 +32,4 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
-#endif /* SCRIPTS_PARALLEL_H */
+#endif /* PARALLEL_SLOT_H */
diff --git a/src/include/fe_utils/query_utils.h b/src/include/fe_utils/query_utils.h
new file mode 100644
index 0000000000..1099260193
--- /dev/null
+++ b/src/include/fe_utils/query_utils.h
@@ -0,0 +1,26 @@
+/*-------------------------------------------------------------------------
+ *
+ * Facilities for frontend code to query databases.
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/fe_utils/query_utils.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef QUERY_UTILS_H
+#define QUERY_UTILS_H
+
+#include "postgres_fe.h"
+
+#include "libpq-fe.h"
+
+extern PGresult *executeQuery(PGconn *conn, const char *query, bool echo);
+
+extern void executeCommand(PGconn *conn, const char *query, bool echo);
+
+extern bool executeMaintenanceCommand(PGconn *conn, const char *query,
+ bool echo);
+
+#endif /* QUERY_UTILS_H */
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 90328db04e..7be6e6c9e5 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -147,8 +147,8 @@ sub mkvcbuild
our @pgcommonbkndfiles = @pgcommonallfiles;
our @pgfeutilsfiles = qw(
- archive.c cancel.c conditional.c mbprint.c print.c psqlscan.l
- psqlscan.c simple_list.c string_utils.c recovery_gen.c);
+ archive.c cancel.c conditional.c connect_utils.c mbprint.c option_utils.c parallel_slot.c print.c psqlscan.l
+ psqlscan.c query_utils.c simple_list.c string_utils.c recovery_gen.c);
$libpgport = $solution->AddProject('libpgport', 'lib', 'misc');
$libpgport->AddDefine('FRONTEND');
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1d540fe489..4d0d09a5dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
v37-0002-Parameterizing-parallel-slot-result-handling.patchapplication/octet-stream; name=v37-0002-Parameterizing-parallel-slot-result-handling.patch; x-unix-mode=0644Download
From de650441202f54710d5e84692e096200f8d053bb Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:35:56 -0800
Subject: [PATCH v37 2/4] Parameterizing parallel slot result handling
The function consumeQueryResult was being used to handle all results
returned by queries executed through the parallel slot interface,
but this hardcodes knowledge about the expectations of reindexdb and
vacuumdb such as the expected result status being PGRES_COMMAND_OK
(as opposed to, say, PGRES_TUPLES_OK).
Reworking the slot interface to optionally include a
ParallelSlotResultHandler and context variable per slot. The idea
is that a caller who executes a command or query through the slot
can set the handler to be called when the query completes with
necessary context information stored for by the handler when
processing the result.
The old logic of consumeQueryResult is moved into a new callback
function, TableCommandResultHandler(), which gets registered as the
slot handler explicitly from vacuumdb and reindexdb. This is
defined in fe_utils/parallel_slot.c rather than somewhere in
src/bin/scripts where its only callers reside, partly to keep it
close to the rest of the shared parallel slot handling code and
partly in anticipation that other utility programs will eventually
want to use it also.
The expectation of this commit is that pg_amcheck will have handlers
for table and index checks which will process the PGresults of calls
to the amcheck functions. This commit sets up the infrastructure
necessary to support those handlers being different from the one
used by vacuumdb and reindexdb.
---
src/bin/scripts/reindexdb.c | 1 +
src/bin/scripts/vacuumdb.c | 1 +
src/fe_utils/parallel_slot.c | 92 +++++++++++++++++++---------
src/include/fe_utils/parallel_slot.h | 29 +++++++++
4 files changed, 95 insertions(+), 28 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 7781fb1151..9f072ac49a 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -466,6 +466,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
run_reindex_command(free_slot->connection, process_type, objname,
echo, verbose, concurrently, true);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index ed320817bc..9dc8aca29f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -713,6 +713,7 @@ vacuum_one_database(const ConnParams *cparams,
* Execute the vacuum. All errors are handled in processQueryResult
* through ParallelSlotsGetIdle.
*/
+ ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
run_vacuum_command(free_slot->connection, sql.data,
echo, tabname);
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index 3987a4702b..b75dc26a49 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -30,7 +30,7 @@
static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
-static bool processQueryResult(PGconn *conn, PGresult *result);
+static bool processQueryResult(ParallelSlot *slot, PGresult *result);
static void
init_slot(ParallelSlot *slot, PGconn *conn)
@@ -38,53 +38,45 @@ init_slot(ParallelSlot *slot, PGconn *conn)
slot->connection = conn;
/* Initially assume connection is idle */
slot->isFree = true;
+ ParallelSlotClearHandler(slot);
}
/*
- * Process (and delete) a query result. Returns true if there's no error,
- * false otherwise -- but errors about trying to work on a missing relation
- * are reported and subsequently ignored.
+ * Invoke the slot's handler for a single query result, or fall back to the
+ * default handler if none is defined for the slot. Returns true if the
+ * handler reports that there's no error, false otherwise.
*/
static bool
-processQueryResult(PGconn *conn, PGresult *result)
+processQueryResult(ParallelSlot *slot, PGresult *result)
{
- /*
- * If it's an error, report it. Errors about a missing table are harmless
- * so we continue processing; but die for other errors.
- */
- if (PQresultStatus(result) != PGRES_COMMAND_OK)
- {
- char *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);
+ ParallelSlotResultHandler handler = slot->handler;
- pg_log_error("processing of database \"%s\" failed: %s",
- PQdb(conn), PQerrorMessage(conn));
+ Assert(PointerIsValid(handler));
- if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
- {
- PQclear(result);
- return false;
- }
- }
+ /* On failure, the handler should return NULL after freeing the result */
+ if (!handler(result, slot->connection, slot->handler_context))
+ return false;
+ /* Ok, we have to free it ourselves */
PQclear(result);
return true;
}
/*
- * Consume all the results generated for the given connection until
+ * Handle all the results generated for the given connection until
* nothing remains. If at least one error is encountered, return false.
* Note that this will block if the connection is busy.
*/
static bool
-consumeQueryResult(PGconn *conn)
+consumeQueryResult(ParallelSlot *slot)
{
bool ok = true;
PGresult *result;
- SetCancelConn(conn);
- while ((result = PQgetResult(conn)) != NULL)
+ SetCancelConn(slot->connection);
+ while ((result = PQgetResult(slot->connection)) != NULL)
{
- if (!processQueryResult(conn, result))
+ if (!processQueryResult(slot, result))
ok = false;
}
ResetCancelConn();
@@ -227,14 +219,15 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
if (result != NULL)
{
- /* Check and discard the command result */
- if (!processQueryResult(slots[i].connection, result))
+ /* Handle and discard the command result */
+ if (!processQueryResult(slots + i, result))
return NULL;
}
else
{
/* This connection has become idle */
slots[i].isFree = true;
+ ParallelSlotClearHandler(slots + i);
if (firstFree < 0)
firstFree = i;
break;
@@ -329,9 +322,52 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
for (i = 0; i < numslots; i++)
{
- if (!consumeQueryResult((slots + i)->connection))
+ if (!consumeQueryResult(slots + i))
return false;
}
return true;
}
+
+/*
+ * TableCommandResultHandler
+ *
+ * ParallelSlotResultHandler for results of commands (not queries) against tables.
+ *
+ * Requires that the result status is either PGRES_COMMAND_OK or an error about
+ * a missing table. This is useful for utilities that compile a list of tables
+ * to process and then run commands (vacuum, reindex, or whatever) against
+ * those tables, as there is a race condition between the time the list is
+ * compiled and the time the command attempts to open the table.
+ *
+ * For missing tables, logs an error but allows processing to continue.
+ *
+ * For all other errors, logs an error and terminates further processing.
+ *
+ * res: PGresult from the query executed on the slot's connection
+ * conn: connection belonging to the slot
+ * context: unused
+ */
+PGresult *
+TableCommandResultHandler(PGresult *res, PGconn *conn, void *context)
+{
+ /*
+ * If it's an error, report it. Errors about a missing table are harmless
+ * so we continue processing; but die for other errors.
+ */
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {
+ char *sqlState = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+
+ pg_log_error("processing of database \"%s\" failed: %s",
+ PQdb(conn), PQerrorMessage(conn));
+
+ if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
+ {
+ PQclear(res);
+ return NULL;
+ }
+ }
+
+ return res;
+}
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 99eeb3328d..6fe58d2a26 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -15,12 +15,39 @@
#include "fe_utils/connect_utils.h"
#include "libpq-fe.h"
+typedef PGresult *(*ParallelSlotResultHandler) (PGresult *res, PGconn *conn,
+ void *context);
+
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
bool isFree; /* Is it known to be idle? */
+
+ /*
+ * Prior to issuing a command or query on 'connection', a handler callback
+ * function may optionally be registered to be invoked to process the
+ * results, and context information may optionally be registered for use
+ * by the handler. If unset, these fields should be NULL.
+ */
+ ParallelSlotResultHandler handler;
+ void *handler_context;
} ParallelSlot;
+static inline void
+ParallelSlotSetHandler(ParallelSlot *slot, ParallelSlotResultHandler handler,
+ void *context)
+{
+ slot->handler = handler;
+ slot->handler_context = context;
+}
+
+static inline void
+ParallelSlotClearHandler(ParallelSlot *slot)
+{
+ slot->handler = NULL;
+ slot->handler_context = NULL;
+}
+
extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
@@ -31,5 +58,7 @@ extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
+extern PGresult *TableCommandResultHandler(PGresult *res, PGconn *conn,
+ void *context);
#endif /* PARALLEL_SLOT_H */
--
2.21.1 (Apple Git-122.3)
v37-0003-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v37-0003-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From d22394d457162a5591b2fe3659003737c1552229 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:36:59 -0800
Subject: [PATCH v37 3/4] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1518 ++++++++++++++++++++
contrib/pg_amcheck/pg_amcheck.h | 91 ++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 78 +
contrib/pg_amcheck/t/003_check.pl | 475 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 +++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 52 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 1004 +++++++++++++
src/tools/msvc/Install.pm | 2 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 4 +
16 files changed, 3766 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/pg_amcheck.h
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..a72dcf7304 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..dc4da95ad3
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1518 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_class.h"
+#include "common/connect.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/connect_utils.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "libpq-fe.h"
+#include "pg_amcheck.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+#include "storage/block.h"
+
+/* Keep this in order by CheckType */
+static const CheckTypeFilter ctfilter[] = {
+ {
+ .relam = HEAP_TABLE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_RELATION) ","
+ CppAsString2(RELKIND_MATVIEW) ","
+ CppAsString2(RELKIND_TOASTVALUE),
+ .typname = "heap"
+ },
+ {
+ .relam = BTREE_AM_OID,
+ .relkinds = CppAsString2(RELKIND_INDEX),
+ .typname = "btree index"
+ }
+};
+
+/*
+ * Query for determining whether contrib's amcheck is installed. If so,
+ * selects the namespace name where amcheck's functions can be found.
+ */
+static const char *amcheck_sql =
+"SELECT n.nspname, x.extversion"
+"\nFROM pg_catalog.pg_extension x"
+"\nJOIN pg_catalog.pg_namespace n"
+"\nON x.extnamespace OPERATOR(pg_catalog.=) n.oid"
+"\nWHERE x.extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+static void check_each_database(ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+
+static void check_one_database(const ConnParams *cparams,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts,
+ const char *progname);
+static void prepare_table_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *nspname);
+
+static void prepare_btree_command(PQExpBuffer sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *nspname);
+
+static void run_command(PGconn *conn, const char *sql,
+ const amcheckOptions *checkopts, Oid reloid,
+ const char *typ);
+
+static PGresult *VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static PGresult *VerifyBtreeSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+
+static void help(const char *progname);
+
+
+static void get_db_regexps_from_fqrps(SimpleStringList *regexps,
+ const SimpleStringList *patterns);
+
+static void get_db_regexps_from_patterns(SimpleStringList *regexps,
+ const SimpleStringList *patterns);
+
+static void appendDatabaseSelect(PGconn *conn, PQExpBuffer sql,
+ const SimpleStringList *regexps, bool alldb);
+
+static void appendTargetSelect(PGconn *conn, PQExpBuffer sql,
+ const amcheckObjects *objects,
+ const amcheckOptions *options, const char *progname,
+ bool inclusive);
+
+int
+main(int argc, char *argv[])
+{
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"exclude-indexes", no_argument, NULL, 2},
+ {"exclude-toast", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-dependents", no_argument, NULL, 10},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ const char *progname;
+ int optindex;
+ int c;
+
+ const char *maintenance_db = NULL;
+ const char *connect_db = NULL;
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ ConnParams cparams;
+
+ amcheckOptions checkopts = {
+ .alldb = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .dependents = true,
+ .no_indexes = false,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .exclude_toast = false,
+ .reconcile_toast = true,
+ .skip = "none",
+ .jobs = -1,
+ .startblock = -1,
+ .endblock = -1
+ };
+
+ amcheckObjects objects = {
+ .databases = {NULL, NULL},
+ .schemas = {NULL, NULL},
+ .tables = {NULL, NULL},
+ .indexes = {NULL, NULL},
+ .exclude_databases = {NULL, NULL},
+ .exclude_schemas = {NULL, NULL},
+ .exclude_tables = {NULL, NULL},
+ .exclude_indexes = {NULL, NULL}
+ };
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ checkopts.alldb = true;
+ break;
+ case 'd':
+ simple_string_list_append(&objects.databases, optarg);
+ break;
+ case 'D':
+ simple_string_list_append(&objects.exclude_databases, optarg);
+ break;
+ case 'e':
+ checkopts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ checkopts.heapallindexed = true;
+ break;
+ case 'i':
+ simple_string_list_append(&objects.indexes, optarg);
+ break;
+ case 'I':
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 'j':
+ checkopts.jobs = atoi(optarg);
+ if (checkopts.jobs <= 0)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ checkopts.parent_check = true;
+ break;
+ case 'q':
+ checkopts.quiet = true;
+ break;
+ case 'r':
+ simple_string_list_append(&objects.indexes, optarg);
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'R':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ simple_string_list_append(&objects.exclude_indexes, optarg);
+ break;
+ case 's':
+ simple_string_list_append(&objects.schemas, optarg);
+ break;
+ case 'S':
+ simple_string_list_append(&objects.exclude_schemas, optarg);
+ break;
+ case 't':
+ simple_string_list_append(&objects.tables, optarg);
+ break;
+ case 'T':
+ simple_string_list_append(&objects.exclude_tables, optarg);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ checkopts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ maintenance_db = pg_strdup(optarg);
+ break;
+ case 2:
+ checkopts.no_indexes = true;
+ break;
+ case 3:
+ checkopts.exclude_toast = true;
+ break;
+ case 4:
+ checkopts.reconcile_toast = false;
+ break;
+ case 5:
+ checkopts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ checkopts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ checkopts.skip = "all frozen";
+ else
+ {
+ fprintf(stderr, _("invalid skip option\n"));
+ exit(1);
+ }
+ break;
+ case 7:
+ checkopts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation starting block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation starting block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 8:
+ checkopts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ _("relation ending block argument contains garbage characters\n"));
+ exit(1);
+ }
+ if (checkopts.endblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ _("relation ending block argument out of bounds\n"));
+ exit(1);
+ }
+ break;
+ case 9:
+ checkopts.rootdescend = true;
+ checkopts.parent_check = true;
+ break;
+ case 10:
+ checkopts.dependents = false;
+ break;
+ default:
+ fprintf(stderr,
+ _("Try \"%s --help\" for more information.\n"),
+ progname);
+ exit(1);
+ }
+ }
+
+ if (checkopts.endblock >= 0 && checkopts.endblock < checkopts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (connect_db == NULL)
+ connect_db = argv[optind];
+ simple_string_list_append(&objects.databases, argv[optind]);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (maintenance_db)
+ cparams.dbname = maintenance_db;
+ else if (connect_db != NULL)
+ cparams.dbname = connect_db;
+ else if (objects.databases.head != NULL)
+ cparams.dbname = objects.databases.head->val;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ if (objects.databases.head == NULL)
+ simple_string_list_append(&objects.databases, default_db);
+
+ cparams.dbname = default_db;
+ }
+
+ check_each_database(&cparams, &objects, &checkopts, progname);
+
+ exit(0);
+}
+
+/*
+ * check_each_database
+ *
+ * Connects to the initial database and resolves a list of all databases that
+ * should be checked per the user supplied options. Sequentially checks each
+ * database in the list.
+ *
+ * The user supplied options may include zero databases, or only one database,
+ * in which case we could skip the step of resolving a list of databases, but
+ * it seems not worth optimizing, especially considering that there are
+ * multiple ways in which no databases or just one database might be specified,
+ * including a pattern that happens to match no entries or to match only one
+ * entry in pg_database.
+ *
+ * cparams: parameters for the initial database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_each_database(ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PGconn *conn;
+ PGresult *databases;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ SimpleStringList dbregex = {NULL, NULL};
+ SimpleStringList exclude = {NULL, NULL};
+
+ /*
+ * Get a list of all database SQL regexps to use for selecting database
+ * names. We assemble these regexps from fully-qualified relation
+ * patterns and database patterns. This process may result in the same
+ * database regex in the list multiple times, but the query against
+ * pg_database will deduplicate, so we don't care.
+ */
+ get_db_regexps_from_fqrps(&dbregex, &objects->tables);
+ get_db_regexps_from_fqrps(&dbregex, &objects->indexes);
+ get_db_regexps_from_patterns(&dbregex, &objects->databases);
+
+ /*
+ * Assemble SQL regexps for databases to be excluded. Note that excluded
+ * relations are not considered here, as excluding relation x.y.z does not
+ * imply excluding database x. Excluding x.*.* would imply excluding
+ * database x, but we do not check for that here.
+ */
+ get_db_regexps_from_patterns(&exclude, &objects->exclude_databases);
+
+ conn = connectMaintenanceDatabase(cparams, progname, checkopts->echo);
+
+ initPQExpBuffer(&sql);
+ appendDatabaseSelect(conn, &sql, &dbregex, checkopts->alldb);
+ appendPQExpBufferStr(&sql, "\nEXCEPT");
+ appendDatabaseSelect(conn, &sql, &exclude, false);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ databases = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(databases) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+ PQfinish(conn);
+
+ ntups = PQntuples(databases);
+ if (ntups == 0 && !checkopts->quiet)
+ printf(_("%s: no databases to check\n"), progname);
+
+ for (i = 0; i < ntups; i++)
+ {
+ cparams->override_dbname = PQgetvalue(databases, i, 0);
+ check_one_database(cparams, objects, checkopts, progname);
+ }
+
+ PQclear(databases);
+}
+
+/*
+ * string_in_list
+ *
+ * Returns whether a given string is in the list of strings.
+ */
+static bool
+string_in_list(const SimpleStringList *list, const char *str)
+{
+ const SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next)
+ if (strcmp(cell->val, str) == 0)
+ return true;
+ return false;
+}
+
+/*
+ * check_one_database
+ *
+ * Connects to the given database and checks all relations that match the
+ * supplied object lists. Patterns in the object lists are matched against
+ * the relations that exist in that database.
+ *
+ * cparams: parameters for the database connection
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ */
+static void
+check_one_database(const ConnParams *cparams, const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname)
+{
+ PQExpBufferData sql;
+ PGconn *conn;
+ PGresult *result;
+ ParallelSlot *slots;
+ int ntups;
+ int i;
+ int parallel_workers;
+ bool inclusive;
+ bool failed = false;
+ char *amcheck_schema = NULL;
+
+ conn = connectDatabase(cparams, progname, checkopts->echo, false, true);
+
+ if (!checkopts->quiet)
+ {
+ printf(_("%s: checking database \"%s\"\n"),
+ progname, PQdb(conn));
+ fflush(stdout);
+ }
+
+ /*
+ * Verify that amcheck is installed in this database. User error could
+ * result in amcheck being missing from a database where it should be
+ * installed, but we also could be iterating over multiple databases,
+ * not all of which have amcheck installed (for example, 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+ pg_log_error("database \"%s\": %s",
+ PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("query was: %s", amcheck_sql);
+ PQclear(result);
+ PQfinish(conn);
+ return;
+ }
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ if (!checkopts->quiet &&
+ (checkopts->verbose ||
+ string_in_list(&objects->databases, PQdb(conn))))
+ {
+ printf(_("%s: skipping database \"%s\": amcheck is not installed\n"),
+ progname, PQdb(conn));
+ }
+ PQfinish(conn);
+ return;
+ }
+ amcheck_schema = PQgetvalue(result, 0, 0);
+ if (checkopts->verbose)
+ printf(_("%s: in database \"%s\": using amcheck version \"%s\" in schema \"%s\"\n"),
+ progname, PQdb(conn), PQgetvalue(result, 0, 1), amcheck_schema);
+ amcheck_schema = PQescapeIdentifier(conn, amcheck_schema, strlen(amcheck_schema));
+ PQclear(result);
+
+ /*
+ * If we were given neither tables nor indexes to check, then we select all
+ * targets not excluded. Otherwise, we select only the targets that we
+ * were given.
+ */
+ inclusive = objects->tables.head == NULL &&
+ objects->indexes.head == NULL;
+
+ initPQExpBuffer(&sql);
+ appendTargetSelect(conn, &sql, objects, checkopts, progname, inclusive);
+ executeCommand(conn, "RESET search_path;", checkopts->echo);
+ result = executeQuery(conn, sql.data, checkopts->echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ PQfinish(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+ PQclear(executeQuery(conn, ALWAYS_SECURE_SEARCH_PATH_SQL, checkopts->echo));
+
+ /*
+ * If no rows are returned, there are no matching relations, so we are
+ * done.
+ */
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ PQclear(result);
+ PQfinish(conn);
+ PQfreemem(amcheck_schema);
+ return;
+ }
+
+ /*
+ * Ensure parallel_workers is sane. If there are more connections than
+ * relations to be checked, we don't need to use them all.
+ */
+ parallel_workers = checkopts->jobs;
+ if (parallel_workers > ntups)
+ parallel_workers = ntups;
+ if (parallel_workers <= 0)
+ parallel_workers = 1;
+
+ /*
+ * Set up the database connections, reusing the connection we already
+ * have for the first slot. If not in parallel mode, the first slot in
+ * the array holds the only connection.
+ */
+ slots = ParallelSlotsSetup(cparams, progname, checkopts->echo, conn,
+ parallel_workers);
+
+ initPQExpBuffer(&sql);
+
+ /*
+ * Loop over all objects to be checked, and execute amcheck checking
+ * commands for each. We do not wait for the checks to complete, nor do
+ * we handle the results of those checks in the loop. We register
+ * handlers for doing all that.
+ */
+ for (i = 0; i < ntups; i++)
+ {
+ ParallelSlot *free_slot;
+
+ CheckType checktype = atoi(PQgetvalue(result, i, 0));
+ Oid reloid = atooid(PQgetvalue(result, i, 1));
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /* Execute the amcheck command for the given relation type. */
+ switch (checktype)
+ {
+ /* heapam types */
+ case CT_TABLE:
+ prepare_table_command(&sql, checkopts, reloid, amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* btreeam types */
+ case CT_BTREE:
+ prepare_btree_command(&sql, checkopts, reloid, amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot->connection, sql.data, checkopts, reloid,
+ ctfilter[checktype].typname);
+ break;
+
+ /* intentionally no default here */
+ }
+ }
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (!ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+finish:
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+
+ termPQExpBuffer(&sql);
+
+ if (amcheck_schema != NULL)
+ PQfreemem(amcheck_schema);
+
+ if (failed)
+ exit(1);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * checkopts: user supplied program options
+ * reloid: OID of the table to be checked
+ * amcheck_schema: schema in which amcheck contrib module is installed
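+ *
+ * For example, with amcheck installed in schema "public", a hypothetical
+ * relation OID of 16384, and default options, the generated command is
+ * roughly:
+ *
+ *     SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg
+ *     FROM public.verify_heapam(relation := 16384,
+ *                               on_error_stop := false,
+ *                               check_toast := true,
+ *                               skip := 'none') v,
+ *     pg_catalog.pg_class c
+ *     JOIN pg_catalog.pg_namespace n
+ *     ON c.relnamespace OPERATOR(pg_catalog.=) n.oid
+ *     WHERE c.oid OPERATOR(pg_catalog.=) 16384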
+ */
+static void
+prepare_table_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg"
+ "\nFROM %s.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ amcheck_schema,
+ reloid,
+ checkopts->on_error_stop ? "true" : "false",
+ checkopts->reconcile_toast ? "true" : "false",
+ checkopts->skip);
+ if (checkopts->startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", checkopts->startblock);
+ if (checkopts->endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", checkopts->endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the index checking command will be written
+ * checkopts: user supplied program options
+ * reloid: OID of the btree index to be checked
+ * amcheck_schema: schema in which amcheck contrib module is installed
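+ *
+ * For example, with amcheck installed in schema "public", a hypothetical
+ * index OID of 16385, and no --parent-check, the generated command is
+ * roughly:
+ *
+ *     SELECT public.bt_index_check(index := '16385'::regclass,
+ *                                  heapallindexed := false)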
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ if (checkopts->parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ amcheck_schema,
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"),
+ (checkopts->rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ amcheck_schema,
+ reloid,
+ (checkopts->heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * conn: connection to the server associated with the slot to use
+ * sql: query to send
+ * checkopts: user supplied program options
+ * reloid: oid of the object being checked, for error reporting
+ * typ: type of object being checked, for error reporting
+ */
+static void
+run_command(PGconn *conn, const char *sql, const amcheckOptions *checkopts,
+ Oid reloid, const char *typ)
+{
+ bool status;
+
+ if (checkopts->echo)
+ printf("%s\n", sql);
+
+ status = PQsendQuery(conn, sql) == 1;
+
+ if (!status)
+ {
+ pg_log_error("check of %s with id %u in database \"%s\" failed: %s",
+ typ, reloid, PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("command was: %s", sql);
+ }
+}
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
+ */
+static PGresult *
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ int ntups = PQntuples(res);
+
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s.%s, block %s, offset %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s.%s, block %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s.%s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s.%s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+ else
+ {
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ printf(_("query was: %s\n"), (const char *) context);
+ }
+
+ return res;
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user. The result
+ * set from the btree checking command is expected to be empty; when the
+ * command instead fails, the useful information about the corruption is
+ * expected in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static PGresult *
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ return res;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf(_("%s checks objects in a PostgreSQL database for corruption.\n\n"), progname);
+ printf(_("Usage:\n"));
+ printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
+ printf(_("\nTarget Options:\n"));
+ printf(_(" -a, --all check all databases\n"));
+ printf(_(" -d, --dbname=DBNAME check specific database(s)\n"));
+ printf(_(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n"));
+ printf(_(" -i, --index=INDEX check specific index(es)\n"));
+ printf(_(" -I, --exclude-index=INDEX do NOT check specific index(es)\n"));
+ printf(_(" -r, --relation=RELNAME check specific relation(s)\n"));
+ printf(_(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n"));
+ printf(_(" -s, --schema=SCHEMA check specific schema(s)\n"));
+ printf(_(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n"));
+ printf(_(" -t, --table=TABLE check specific table(s)\n"));
+ printf(_(" -T, --exclude-table=TABLE do NOT check specific table(s)\n"));
+ printf(_(" --exclude-indexes do NOT perform any index checking\n"));
+ printf(_(" --exclude-toast do NOT check any toast tables or indexes\n"));
+ printf(_(" --no-dependents do NOT automatically check dependent objects\n"));
+ printf(_("\nIndex Checking Options:\n"));
+ printf(_(" -H, --heapallindexed check all heap tuples are found within indexes\n"));
+ printf(_(" -P, --parent-check check parent/child relationships during index checking\n"));
+ printf(_(" --rootdescend search from root page to refind tuples at the leaf level\n"));
+ printf(_("\nTable Checking Options:\n"));
+ printf(_(" --exclude-toast-pointers do NOT check relation toast pointers against toast\n"));
+ printf(_(" --on-error-stop stop checking a relation at end of first corrupt page\n"));
+ printf(_(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n"));
+ printf(_("      --startblock=BLOCK        begin checking table(s) at the given starting block number\n"));
+ printf(_("      --endblock=BLOCK          check table(s) only up to the given ending block number\n"));
+ printf(_("\nConnection options:\n"));
+ printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
+ printf(_(" -p, --port=PORT database server port\n"));
+ printf(_(" -U, --username=USERNAME user name to connect as\n"));
+ printf(_(" -w, --no-password never prompt for password\n"));
+ printf(_(" -W, --password force password prompt\n"));
+ printf(_(" --maintenance-db=DBNAME alternate maintenance database\n"));
+ printf(_("\nOther Options:\n"));
+ printf(_(" -e, --echo show the commands being sent to the server\n"));
+ printf(_(" -j, --jobs=NUM use this many concurrent connections to the server\n"));
+ printf(_(" -q, --quiet don't write any messages\n"));
+ printf(_(" -v, --verbose write a lot of output\n"));
+ printf(_(" -V, --version output version information, then exit\n"));
+ printf(_(" -?, --help show this help, then exit\n"));
+
+ printf(_("\nRead the description of the amcheck contrib module for details.\n"));
+ printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
+ printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * get_db_regexps_from_fqrps
+ *
+ * For each pattern in the patterns list, if it is in fully-qualified
+ * database.schema.name format (a fully-qualified relation pattern, or fqrp), parse
+ * the database portion of the pattern, convert it to SQL regex format, and
+ * append it to the databases list. Patterns that are not fully-qualified are
+ * skipped over. No deduplication of regexps is performed.
+ *
+ * regexps: list to which parsed and converted database regexps are appended
+ * patterns: list of all patterns to parse
+ */
+static void
+get_db_regexps_from_fqrps(SimpleStringList *regexps,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ /* parse the pattern as dbname.schema.relname, if possible */
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ /* add the database name (or pattern), if any, to the list */
+ if (dbnamebuf.data[0])
+ simple_string_list_append(regexps, dbnamebuf.data);
+
+ /* we do not use the schema or relname portions */
+
+ /* we may have dirtied the buffers */
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+ termPQExpBuffer(&dbnamebuf);
+ termPQExpBuffer(&schemabuf);
+ termPQExpBuffer(&namebuf);
+}
+
+/*
+ * get_db_regexps_from_patterns
+ *
+ * Convert each unqualified pattern in the list to SQL regex format and append
+ * it to the regexps list. No deduplication of regexps is performed.
+ *
+ * regexps: list to which converted regexps are appended
+ * patterns: list of patterns to be converted
+ */
+static void
+get_db_regexps_from_patterns(SimpleStringList *regexps,
+ const SimpleStringList *patterns)
+{
+ const SimpleStringListCell *cell;
+ PQExpBufferData buf;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+
+ initPQExpBuffer(&buf);
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, NULL, NULL, &buf, cell->val, false);
+ if (buf.data[0])
+ simple_string_list_append(regexps, buf.data);
+ resetPQExpBuffer(&buf);
+ }
+ termPQExpBuffer(&buf);
+}
+
+/*
+ * appendDatabaseSelect
+ *
+ * Appends a statement which selects the names of all databases matching the
+ * given SQL regular expressions.
+ *
+ * conn: connection to the initial database
+ * sql: buffer into which the constructed sql statement is appended
+ * regexps: list of database name regular expressions to match
+ * alldb: when true, select all databases which allow connections
+ */
+static void
+appendDatabaseSelect(PGconn *conn, PQExpBuffer sql, const SimpleStringList *regexps,
+ bool alldb)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+
+ if (alldb)
+ {
+ appendPQExpBufferStr(sql,
+ "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_catalog.pg_database"
+ "\nWHERE datallowconn");
+ return;
+ }
+ else if (regexps->head == NULL)
+ {
+ appendPQExpBufferStr(sql,
+ "\nSELECT ''::TEXT AS datname"
+ "\nWHERE false");
+ return;
+ }
+
+ appendPQExpBufferStr(sql,
+ "\nSELECT datname::TEXT AS datname"
+ "\nFROM pg_catalog.pg_database"
+ "\nWHERE datallowconn"
+ "\nAND datname::TEXT OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = regexps->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, cell->val, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * appendSchemaSelect
+ *
+ * Appends a statement which selects all schemas matching the given patterns.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * patterns: list of schema name patterns to match
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all non-system schemas
+ */
+static void
+appendSchemaSelect(PGconn *conn, PQExpBuffer sql,
+ const SimpleStringList *patterns, bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBufferStr(sql,
+ "\nSELECT 0::pg_catalog.oid AS nspoid WHERE false");
+ else
+ appendPQExpBufferStr(sql,
+ "\nSELECT oid AS nspoid"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_catalog')"
+ "\nAND oid OPERATOR(pg_catalog.!=) pg_catalog.regnamespace('pg_toast')");
+ return;
+ }
+
+ appendPQExpBufferStr(sql,
+ "\nSELECT oid AS nspoid"
+ "\nFROM pg_catalog.pg_namespace"
+ "\nWHERE nspname OPERATOR(pg_catalog.~) ANY(ARRAY[\n");
+ for (cell = patterns->head, comma = ""; cell; cell = cell->next, comma = ",\n")
+ {
+ PQExpBufferData regexbuf;
+
+ initPQExpBuffer(&regexbuf);
+ patternToSQLRegex(encoding, NULL, NULL, &regexbuf, cell->val, false);
+ appendPQExpBufferStr(sql, comma);
+ appendStringLiteralConn(sql, regexbuf.data, conn);
+ appendPQExpBufferStr(sql, "::TEXT COLLATE pg_catalog.default");
+ termPQExpBuffer(&regexbuf);
+ }
+ appendPQExpBufferStr(sql, "\n]::TEXT[])");
+}
+
+/*
+ * appendSchemaCTE
+ *
+ * Appends a Common Table Expression (CTE) which selects all schemas to be
+ * checked, with the CTE and oid field named as requested. The CTE will select
+ * all schemas matching the include list except any schemas matching the
+ * exclude list.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * ctename: name of the schema CTE to be created
+ * include: list of schema name patterns for inclusion
+ * exclude: list of schema name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether to use all schemas in
+ * the database in lieu of the include list.
+ */
+static void
+appendSchemaCTE(PGconn *conn, PQExpBuffer sql, const char *ctename, const
+ SimpleStringList *include, const SimpleStringList *exclude,
+ bool inclusive)
+{
+ appendPQExpBuffer(sql, "\n%s (nspoid) AS (", ctename);
+ appendSchemaSelect(conn, sql, include, inclusive);
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ appendSchemaSelect(conn, sql, exclude, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * appendCTFilterQuals
+ *
+ * Appends quals to a buffer that restrict the rows selected from pg_class to
+ * only those which match the given checktype. No initial "WHERE" or "AND" is
+ * appended, nor do we surround our appended clauses in parens. The caller is
+ * assumed to take care of such matters.
+ *
+ * sql: buffer into which the constructed sql quals are appended
+ * relname: name (or alias) of pg_class in the surrounding query
+ * checktype: struct containing filter info
+ */
+static void
+appendCTFilterQuals(PQExpBuffer sql, const char *relname, CheckType checktype)
+{
+ appendPQExpBuffer(sql,
+ "%s.relam OPERATOR(pg_catalog.=) %u"
+ "\nAND %s.relkind OPERATOR(pg_catalog.=) ANY(ARRAY[%s])",
+ relname, ctfilter[checktype].relam,
+ relname, ctfilter[checktype].relkinds);
+}
+
+/*
+ * appendRelationSelect
+ *
+ * Appends a statement which selects the oid of all relations matching the
+ * given parameters. Expects a mixture of qualified and unqualified relation
+ * name patterns.
+ *
+ * For unqualified relation patterns, selects relations that match the relation
+ * name portion of the pattern which are in namespaces that are in the given
+ * namespace CTE.
+ *
+ * For qualified relation patterns, ignores the given namespace CTE and selects
+ * relations that match the relation name portion of the pattern which are in
+ * namespaces that match the schema portion of the pattern.
+ *
+ * For fully qualified relation patterns (database.schema.name), the pattern
+ * will be ignored unless the database portion of the pattern matches the name
+ * of the current database, as retrieved from conn.
+ *
+ * Only relations of the specified checktype will be selected.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * patterns: list of (possibly qualified) relation name patterns to match
+ * checktype: the type of relation to select
+ * inclusive: when patterns is an empty list, whether the select statement
+ * should match all relations of the given type
+ */
+static void
+appendRelationSelect(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const SimpleStringList *patterns, CheckType checktype,
+ bool inclusive)
+{
+ SimpleStringListCell *cell;
+ const char *comma = "";
+ const char *qor = "";
+ PQExpBufferData qualified;
+ PQExpBufferData unqualified;
+ PQExpBufferData dbnamebuf;
+ PQExpBufferData schemabuf;
+ PQExpBufferData namebuf;
+ int encoding = PQclientEncoding(conn);
+
+ if (patterns->head == NULL)
+ {
+ if (!inclusive)
+ appendPQExpBufferStr(sql,
+ "\nSELECT 0::pg_catalog.oid WHERE false");
+ else
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s n"
+ "\nON n.nspoid OPERATOR(pg_catalog.=) c.relnamespace"
+ "\nWHERE ",
+ schemacte);
+ appendCTFilterQuals(sql, "c", checktype);
+ }
+ return;
+ }
+
+ /*
+ * We have to distinguish between schema-qualified and unqualified
+ * relation patterns. The unqualified patterns need to be restricted by
+ * the list of schemas returned by the schema CTE, but not so for the
+ * qualified patterns.
+ *
+ * We treat fully-qualified relation patterns (database.schema.relation)
+ * like schema-qualified patterns except that we also require the database
+ * portion to match the current database name.
+ */
+ initPQExpBuffer(&qualified);
+ initPQExpBuffer(&unqualified);
+ initPQExpBuffer(&dbnamebuf);
+ initPQExpBuffer(&schemabuf);
+ initPQExpBuffer(&namebuf);
+
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ patternToSQLRegex(encoding, &dbnamebuf, &schemabuf, &namebuf,
+ cell->val, false);
+
+ if (schemabuf.data[0])
+ {
+ /* Qualified relation pattern */
+ appendPQExpBuffer(&qualified, "%s\n(", qor);
+
+ if (dbnamebuf.data[0])
+ {
+ /*
+ * Fully-qualified relation pattern. Require the database
+ * name of our connection to match the database portion of the
+ * relation pattern.
+ */
+ appendPQExpBufferStr(&qualified, "\n");
+ appendStringLiteralConn(&qualified, PQdb(conn), conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, dbnamebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND");
+ }
+
+ /*
+ * Require the namespace name to match the schema portion of the
+ * relation pattern and the relation name to match the relname
+ * portion of the relation pattern.
+ */
+ appendPQExpBufferStr(&qualified,
+ "\nn.nspname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, schemabuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default AND"
+ "\nc.relname OPERATOR(pg_catalog.~) ");
+ appendStringLiteralConn(&qualified, namebuf.data, conn);
+ appendPQExpBufferStr(&qualified,
+ "::TEXT COLLATE pg_catalog.default)");
+ qor = "\nOR";
+ }
+ else
+ {
+ /* Unqualified relation pattern */
+ appendPQExpBufferStr(&unqualified, comma);
+ appendStringLiteralConn(&unqualified, namebuf.data, conn);
+ appendPQExpBufferStr(&unqualified,
+ "::TEXT COLLATE pg_catalog.default");
+ comma = "\n, ";
+ }
+
+ resetPQExpBuffer(&dbnamebuf);
+ resetPQExpBuffer(&schemabuf);
+ resetPQExpBuffer(&namebuf);
+ }
+
+ if (qualified.data[0])
+ {
+ appendPQExpBufferStr(sql,
+ "\nSELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE (");
+ appendPQExpBufferStr(sql, qualified.data);
+ appendPQExpBufferStr(sql, ")\nAND ");
+ appendCTFilterQuals(sql, "c", checktype);
+ if (unqualified.data[0])
+ appendPQExpBufferStr(sql, "\nUNION ALL");
+ }
+ if (unqualified.data[0])
+ {
+ appendPQExpBuffer(sql,
+ "\nSELECT c.oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN %s ls"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) ls.nspoid"
+ "\nWHERE c.relname OPERATOR(pg_catalog.~) ANY(ARRAY[",
+ schemacte);
+ appendPQExpBufferStr(sql, unqualified.data);
+ appendPQExpBufferStr(sql, "\n]::TEXT[])\nAND ");
+ appendCTFilterQuals(sql, "c", checktype);
+ }
+}
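The qualified-versus-unqualified split described in the comment above can be pictured as a simple partition of the pattern list; a naive sketch that just looks for dots (the real code uses patternToSQLRegex(), which also respects quoted identifiers containing dots):

```python
def split_patterns(patterns):
    """Partition relation patterns into (qualified, unqualified) lists.

    Unqualified names are later restricted by the schema CTE; qualified
    names carry their own schema (and possibly database) regex.  This
    sketch ignores quoting, unlike the patch's parser.
    """
    qualified, unqualified = [], []
    for pat in patterns:
        (qualified if "." in pat else unqualified).append(pat)
    return qualified, unqualified
```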
+
+/*
+ * appendTableCTE
+ *
+ * Appends to the buffer 'sql' a Common Table Expression (CTE) which selects
+ * all table relations matching the given filters.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql statement is appended
+ * schemacte: name of the CTE which selects all schemas to be checked
+ * ctename: name of the table CTE to be created
+ * include: list of table name patterns for inclusion
+ * exclude: list of table name patterns for exclusion
+ * inclusive: when 'include' is an empty list, whether the select statement
+ * should match all relations
+ * toast: whether to also select the associated toast tables
+ */
+static void
+appendTableCTE(PGconn *conn, PQExpBuffer sql, const char *schemacte,
+ const char *ctename, const SimpleStringList *include,
+ const SimpleStringList *exclude, bool inclusive, bool toast)
+{
+ appendPQExpBuffer(sql, "\n%s (oid) AS (", ctename);
+
+ if (toast)
+ {
+ /*
+ * Compute the primary tables, then union on all associated toast
+ * tables. We depend on left to right evaluation of the UNION before
+ * the EXCEPT which gets added below. UNION and EXCEPT have equal
+ * precedence, so be careful if you rearrange this query.
+ */
+ appendPQExpBuffer(sql, "\nWITH primary_table AS (");
+ appendRelationSelect(conn, sql, schemacte, include, CT_TABLE, inclusive);
+ appendPQExpBufferStr(sql,
+ "\n)"
+ "\nSELECT oid"
+ "\nFROM primary_table"
+ "\nUNION"
+ "\nSELECT c.reltoastrelid AS oid"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN primary_table pt"
+ "\nON pt.oid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE c.reltoastrelid OPERATOR(pg_catalog.!=) 0");
+ }
+ else
+ appendRelationSelect(conn, sql, schemacte, include, CT_TABLE, inclusive);
+
+ appendPQExpBufferStr(sql, "\nEXCEPT");
+ appendRelationSelect(conn, sql, schemacte, exclude, CT_TABLE, false);
+ appendPQExpBufferStr(sql, "\n)");
+}
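The precedence caveat in the comment above is worth spelling out: UNION and EXCEPT bind equally and evaluate left to right, so the EXCEPT strips excluded oids from the whole union, not just from the toast half. In set terms (oid values here are made up for illustration):

```python
# Left-to-right evaluation: (primary UNION toast) EXCEPT excluded.
primary = {16384, 16390}   # oids matched by the include patterns
toast = {16402}            # their associated toast tables
excluded = {16390}         # oids matched by the exclude patterns

left_to_right = (primary | toast) - excluded   # what the query computes
misgrouped = primary | (toast - excluded)      # what a rearrangement could compute

# The misgrouped form would fail to exclude oid 16390.
```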
+
+/*
+ * appendIndexCTE
+ *
+ * Appends a Common Table Expression (CTE) which selects all indexes to be
+ * checked
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql CTE is appended
+ * tablescte: optional; if automatically including indexes for checked tables,
+ * the name of the CTE which contains all tables to be checked
+ * patterns: list of index name patterns to match
+ * inclusive: when 'patterns' is an empty list, whether the select statement
+ * should match all btree indexes
+ */
+static void
+appendIndexCTE(PGconn *conn, PQExpBuffer sql, const char *tablescte,
+ const SimpleStringList *patterns, bool inclusive)
+{
+ appendPQExpBufferStr(sql, "\nindexes (oid) AS (");
+ appendPQExpBufferStr(sql, "\nSELECT oid FROM (");
+ appendRelationSelect(conn, sql, "namespaces", patterns, CT_BTREE, inclusive);
+ if (tablescte)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION"
+ "\nSELECT i.indexrelid AS oid"
+ "\nFROM pg_catalog.pg_index i"
+ "\nJOIN %s t ON t.oid OPERATOR(pg_catalog.=) i.indrelid",
+ tablescte);
+ }
+ appendPQExpBufferStr(sql,
+ "\n) AS included_indexes"
+ "\nEXCEPT"
+ "\nSELECT oid FROM excluded_indexes");
+ appendPQExpBufferStr(sql, "\n)");
+}
+
+/*
+ * appendTargetSelect
+ *
+ * Construct a query that will return a list of all tables and indexes in
+ * the database matching the user specified options, sorted by size. We
+ * want the largest tables and indexes first, so that the parallel
+ * processing of the larger database objects gets started sooner.
+ *
+ * conn: connection to the current database
+ * sql: buffer into which the constructed sql select statement is appended
+ * objects: lists of include and exclude patterns for filtering objects
+ * checkopts: user supplied program options
+ * progname: name of this program, such as "pg_amcheck"
+ * inclusive: when list of objects to include is empty, whether the select
+ * statement should match all objects not otherwise excluded
+ */
+static void
+appendTargetSelect(PGconn *conn, PQExpBuffer sql,
+ const amcheckObjects *objects,
+ const amcheckOptions *checkopts, const char *progname,
+ bool inclusive)
+{
+ appendPQExpBufferStr(sql, "WITH");
+ appendSchemaCTE(conn, sql, "namespaces", &objects->schemas,
+ &objects->exclude_schemas, inclusive);
+ appendPQExpBufferStr(sql, ",");
+ appendTableCTE(conn, sql, "namespaces", "tables", &objects->tables,
+ &objects->exclude_tables, inclusive,
+ !checkopts->exclude_toast);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBufferStr(sql, ",\nexcluded_indexes (oid) AS (");
+ appendRelationSelect(conn, sql, "namespaces",
+ &objects->exclude_indexes, CT_BTREE, false);
+ appendPQExpBufferStr(sql, "\n),");
+ if (checkopts->dependents)
+ appendIndexCTE(conn, sql, "tables", &objects->indexes, inclusive);
+ else
+ appendIndexCTE(conn, sql, NULL, &objects->indexes, inclusive);
+ }
+ appendPQExpBuffer(sql,
+ "\nSELECT checktype, oid FROM ("
+ "\nSELECT %u AS checktype, tables.oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN tables"
+ "\nON tables.oid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_TABLE);
+ appendCTFilterQuals(sql, "c", CT_TABLE);
+ if (!checkopts->no_indexes)
+ {
+ appendPQExpBuffer(sql,
+ "\nUNION ALL"
+ "\nSELECT %u AS checktype, indexes.oid, c.relpages"
+ "\nFROM pg_catalog.pg_class c"
+ "\nJOIN indexes"
+ "\nON indexes.oid OPERATOR(pg_catalog.=) c.oid"
+ "\nWHERE ",
+ CT_BTREE);
+ appendCTFilterQuals(sql, "c", CT_BTREE);
+ }
+ appendPQExpBufferStr(sql,
+ "\n) AS ss"
+ "\nORDER BY relpages DESC, checktype, oid");
+}
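The ORDER BY relpages DESC here is essentially the classic largest-first (LPT) heuristic for balancing work across parallel connections. A small sketch of the effect, not code from the patch:

```python
import heapq

def makespan(sizes, workers):
    """Finish time when each task goes to the currently least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for size in sizes:
        heapq.heappush(loads, heapq.heappop(loads) + size)
    return max(loads)

relpages = [4, 3, 3, 2, 2, 2]  # page counts of the relations to check

largest_first = makespan(sorted(relpages, reverse=True), 2)
smallest_first = makespan(sorted(relpages), 2)
# Starting big relations early finishes sooner: 8 pages vs 9 here.
```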
diff --git a/contrib/pg_amcheck/pg_amcheck.h b/contrib/pg_amcheck/pg_amcheck.h
new file mode 100644
index 0000000000..7e9595a3a3
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.h
@@ -0,0 +1,91 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.h
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2020-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef PG_AMCHECK_H
+#define PG_AMCHECK_H
+
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "libpq-fe.h"
+#include "pqexpbuffer.h" /* pgrminclude ignore */
+
+/* amcheck options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool dependents;
+ bool no_indexes;
+ bool exclude_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ const char *skip;
+ int jobs; /* >= 0 indicates user specified the parallel
+ * degree, otherwise -1 */
+ long startblock;
+ long endblock;
+} amcheckOptions;
+
+/* names of database objects to include or exclude controlled by user flags */
+typedef struct amcheckObjects
+{
+ SimpleStringList databases;
+ SimpleStringList schemas;
+ SimpleStringList tables;
+ SimpleStringList indexes;
+ SimpleStringList exclude_databases;
+ SimpleStringList exclude_schemas;
+ SimpleStringList exclude_tables;
+ SimpleStringList exclude_indexes;
+} amcheckObjects;
+
+/*
+ * We cannot launch the same amcheck function for all checked objects. For
+ * btree indexes, we must use either bt_index_check() or
+ * bt_index_parent_check(). For heap relations, we must use verify_heapam().
+ * We silently ignore all other object types.
+ *
+ * The following CheckType enum and corresponding ctfilter array track
+ * which kinds of relations get which treatment.
+ */
+typedef enum
+{
+ CT_TABLE = 0,
+ CT_BTREE
+} CheckType;
+
+/*
+ * This struct is used for filtering relations in pg_catalog.pg_class to just
+ * those of a given CheckType. The relam field should equal pg_class.relam,
+ * and the relation's pg_class.relkind should appear in the comma-separated
+ * relkinds list.
+ *
+ * The 'typname' field is not strictly for filtering, but for printing messages
+ * about relations that matched the filter.
+ */
+typedef struct
+{
+ Oid relam;
+ const char *relkinds;
+ const char *typname;
+} CheckTypeFilter;
+
+/* Constants taken from pg_catalog/pg_am.dat */
+#define HEAP_TABLE_AM_OID 2
+#define BTREE_AM_OID 403
+
+#endif /* PG_AMCHECK_H */
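The CheckType/ctfilter pairing the header comments describe amounts to a small dispatch table. A sketch of the intended behavior (the relkind sets below are assumptions for illustration; the actual ctfilter initializer lives elsewhere in the patch):

```python
HEAP_TABLE_AM_OID = 2    # from pg_catalog/pg_am.dat
BTREE_AM_OID = 403

# checktype -> (relam, accepted relkinds, amcheck function); relkind sets
# here are assumed, not copied from the patch's ctfilter array.
CT_FILTER = {
    "CT_TABLE": (HEAP_TABLE_AM_OID, {"r", "m", "t"}, "verify_heapam"),
    "CT_BTREE": (BTREE_AM_OID, {"i"}, "bt_index_check"),
}

def check_function(relam, relkind, parent_check=False):
    """Return the amcheck function for a relation, or None to skip it."""
    for am, kinds, func in CT_FILTER.values():
        if relam == am and relkind in kinds:
            if func == "bt_index_check" and parent_check:
                return "bt_index_parent_check"
            return func
    return None  # hash, gin, gist, etc. are silently ignored
```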
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..b52039c79b
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,78 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 16;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+#########################################
+# Test connecting with a non-existent user
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+#########################################
+# Test checking a database without amcheck installed, by name. We should see a
+# message about missing amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by name without amcheck installed');
+
+#########################################
+# Test checking a database without amcheck installed, by only indirectly using
+# a dbname pattern. In verbose mode, we should see a message about missing
+# amcheck
+
+$node->command_like(
+ [ 'pg_amcheck', '-p', $port, '-v', '-d', '*', 'postgres' ],
+ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ 'checking a database by dbname implication without amcheck installed');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ 'checking a non-existent schema');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ 'checking a non-existent table');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ 'checking a non-existent index');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ 'no matching schemas');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ 'no matching tables');
+
+$node->command_ok(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index' ],
+ 'no matching indexes');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..957094fcdd
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,475 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath);
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500);
+ close($fh);
+ $node->start;
+}
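The magic offset 32 in the helper above lands just past the 24-byte page header, so the page still passes header validation while its line pointer array is clobbered. The same operation in a standalone sketch, with a zeroed scratch file standing in for a relation segment:

```python
import os
import tempfile

PAGE_HEADER_SIZE = 24  # sizeof(PageHeaderData)

def corrupt_first_page(path, offset=32, nbytes=500):
    """Overwrite bytes shortly past the page header with 0x77 garbage."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(b"\x77" * nbytes)

# Demo against an 8 kB zeroed scratch file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 8192)
os.close(fd)
corrupt_first_page(path)
with open(path, "rb") as f:
    page = f.read()
os.unlink(path)
```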
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends. Put the
+ # extension in an unexpected location to test that pg_amcheck finds it
+ # correctly. Create tables with names that look like pg_catalog names to
+ # check that pg_amcheck does not get confused by them. Create functions in
+ # schema public that look like amcheck functions to check that pg_amcheck
+ # does not use them.
+ $node->safe_psql($dbname, q(
+ CREATE SCHEMA amcheck_schema;
+ CREATE EXTENSION amcheck WITH SCHEMA amcheck_schema;
+ CREATE TABLE amcheck_schema.pg_database (junk text);
+ CREATE TABLE amcheck_schema.pg_namespace (junk text);
+ CREATE TABLE amcheck_schema.pg_class (junk text);
+ CREATE TABLE amcheck_schema.pg_operator (junk text);
+ CREATE TABLE amcheck_schema.pg_proc (junk text);
+ CREATE TABLE amcheck_schema.pg_tablespace (junk text);
+
+ CREATE FUNCTION public.bt_index_check(index regclass,
+ heapallindexed boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.bt_index_parent_check(index regclass,
+ heapallindexed boolean default false,
+ rootdescend boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_parent_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ RETURNS SETOF record AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong verify_heapam!';
+ END;
+ $$ LANGUAGE plpgsql;
+ ));
+
+ # Create schemas, tables and indexes in five separate
+ # schemas. The schemas are all identical to start, but
+ # we will corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt toast table, partitions, and materialized views in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. An error code returned would mean the
+# pg_amcheck command itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+ [ @cmd, 'db1' ],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_ok(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_ok(
+ [ @cmd, '--all' ],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-i', 'i*.idx', '-i', 'idx.i*' ],
+ 'pg_amcheck all indexes with qualified names matching /i*.idx/ or /idx.i*/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-t', 's*.t1', '-t', 'foo*.bar*' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/ or /foo*.bar*/');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/,
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# But checking across all databases in schema s1 should show corruption
+# messages for tables in db2
+$node->command_like(
+ [ @cmd, '--all', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across all databases but excluding indexes');
+
+# Checking across a list of databases should also work
+$node->command_like(
+ [ @cmd, '-d', 'db2', '-d', 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/could not open file/,
+ 'pg_amcheck of schema s1 across db1 and db2 but excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of zero.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's4' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_like(
+ [ @cmd, '--exclude-toast', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's5' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..7e71d612fc
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# the rows we insert, so as to give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8-byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
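As an aside, the pack code and the 58-byte total above can be cross-checked outside Perl. A minimal sketch using Python's struct module (illustrative only, not part of the test; '<' disables native padding so the field widths match Perl's pack codes one for one):

```python
import struct

# Equivalent of the Perl pack code 'LLLSSSSSCCqCcccccccSSSSSSSSS':
# I ~ 'L' (uint32), H ~ 'S' (uint16), B ~ 'C' (uint8), b ~ 'c' (int8), q ~ 'q'.
HEAPTUPLE_STRUCT_FMT = '<IIIHHHHHBBqBbbbbbbbHHHHHHHHH'

def maxalign(n):
    # MAXALIGN rounds up to the next multiple of 8 on 64-bit platforms.
    return (n + 7) & ~7

# 23 bytes of fixed header fields plus 1 byte of t_bits for 3 columns:
# user data begins at MAXALIGN(23 + 1) = 24, and the packed tuple is 58 bytes.
assert maxalign(23 + 1) == 24
assert struct.calcsize(HEAPTUPLE_STRUCT_FMT) == 58
```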
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
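Several of the corruption cases below set mutually contradictory combinations of these bits. The kind of consistency check that verify_heapam applies to them can be sketched as follows (hypothetical Python mirroring the constants above, not the actual backend code):

```python
# Infomask bits copied from the constants above (access/htup_details.h).
HEAP_XMAX_LOCK_ONLY = 0x0080   # t_infomask: xmax is only a locker
HEAP_KEYS_UPDATED   = 0x2000   # t_infomask2: key columns were updated

def infomask_contradictions(t_infomask, t_infomask2):
    """Report hint-bit combinations that cannot legitimately coexist."""
    problems = []
    # A lock-only xmax cannot also claim to have updated key columns.
    if (t_infomask & HEAP_XMAX_LOCK_ONLY) and (t_infomask2 & HEAP_KEYS_UPDATED):
        problems.append('tuple is marked as only locked, '
                        'but also claims key columns were updated')
    return problems

# The offnum == 14 corruption case below trips exactly this check.
assert infomask_contradictions(HEAP_XMAX_LOCK_ONLY, HEAP_KEYS_UPDATED)
assert not infomask_contradictions(0, 0)
```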
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation postgres\.public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation postgres\.public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/^$/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..379225cbf8
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,52 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order, thereby corrupting the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_like(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ qr/item order invariant violated for index "fickleidx"/,
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..2b2c73ca8b
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,1004 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or more
+ databases, with options to select which schemas, tables and indexes to check,
+ which kinds of checking to perform, and whether to perform the checks in
+ parallel and, if so, how many parallel connections to establish and use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Usage</title>
+
+ <refsect2>
+ <title>Parallelism Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=20 --all</literal></term>
+ <listitem>
+ <para>
+ Check all databases one after another, but for each database checked,
+ use up to 20 simultaneous connections to check relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=8 mydb yourdb</literal></term>
+ <listitem>
+ <para>
+ Check databases <literal>mydb</literal> and <literal>yourdb</literal>
+ one after another, using up to 8 simultaneous connections to check
+ relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Checking Option Specification</title>
+
+ <para>
+ If no checking options are specified, by default all table relation checks
+ and default-level btree index checks are performed. A variety of options
+ exist to change the set of checks performed on whichever relations are
+ being checked. They are briefly mentioned here in the following examples,
+ but see their full descriptions below.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --parent-check --heapallindexed</literal></term>
+ <listitem>
+ <para>
+ For each btree index checked, performs more extensive checks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --exclude-toast-pointers</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not check toast pointers against
+ the toast relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --on-error-stop</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not continue checking pages after
+ the first page where corruption is encountered.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --skip="all-frozen"</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, skips over blocks marked as all
+ frozen. Note that <literal>all-visible</literal> may also be specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --startblock=3000 --endblock=4000</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, check only blocks in the given block
+ range.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Relation Specification</title>
+
+ <para>
+ If no relations are explicitly listed, by default all relations will be
+ checked, but there are options to specify which relations to check.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable -r yourtable</literal></term>
+ <listitem>
+ <para>
+ If one or more relations are explicitly given, they are interpreted as
+ an exhaustive list of all relations to be checked, with one caveat:
+ for all such relations, associated toast relations and indexes are by
+ default included in the list of relations to check.
+ </para>
+ <para>
+ Assuming <literal>mytable</literal> is an ordinary table, and that it
+ is indexed by <literal>mytable_idx</literal> and has an associated
+ toast table <literal>pg_toast_12345</literal>, checking will be
+ performed on <literal>mytable</literal>,
+ <literal>mytable_idx</literal>, and <literal>pg_toast_12345</literal>.
+ </para>
+ <para>
+ Likewise for <literal>yourtable</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable --no-dependents</literal></term>
+ <listitem>
+ <para>
+ This restricts the list of relations checked to just
+ <literal>mytable</literal>, without pulling in the corresponding
+ indexes or toast, but see also
+ <option>--exclude-toast-pointers</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t mytable -i myindex</literal></term>
+ <listitem>
+ <para>
+ The <option>-r</option> (<option>--relation</option>) option will match any
+ relation, but <option>-t</option> (<option>--table</option>) and
+ <option>-i</option> (<option>--index</option>) may be used to avoid
+ matching objects of the other type.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -R="mytemp*"</literal></term>
+ <listitem>
+ <para>
+ Relations may be included (<option>-r</option>) or excluded
+ (<option>-R</option>) using shell-style patterns.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Table and index inclusion and exclusion patterns may be used
+ equivalently with <option>-t</option>, <option>-T</option>,
+ <option>-i</option> and <option>-I</option>. The above example checks
+ all tables and indexes starting with <literal>my</literal> except for
+ indexes starting with <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -R="india" -T="laos" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Unlike specifying one or more <option>--relation</option> options, which
+ disables the default behavior of checking all relations, specifying one or
+ more of <option>-R</option>, <option>-T</option> or <option>-I</option> does not.
+ The above command will check all relations except any relation named
+ <literal>india</literal>, any table named
+ <literal>laos</literal>, and any index named <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Schema Specification</title>
+
+ <para>
+ If no schemas are explicitly listed, by default all schemas except
+ <literal>pg_catalog</literal> and <literal>pg_toast</literal> will be
+ checked.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -s s1 -s s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ If one or more schemas are listed with <option>-s</option>, unqualified
+ relation names will be checked only in the given schemas. The above
+ command will check tables <literal>s1.mytable</literal> and
+ <literal>s2.mytable</literal> but not tables named
+ <literal>mytable</literal> in other schemas.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ As with relations, schemas may be excluded. The above command will
+ check any table named <literal>mytable</literal> not in schemas
+ <literal>s1</literal> and <literal>s2</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable -t s1.stuff</literal></term>
+ <listitem>
+ <para>
+ Relations may be included or excluded with a schema-qualified name
+ without interference from the <option>-s</option> or
+ <option>-S</option> options. Even though schema <literal>s1</literal>
+ has been excluded, the table <literal>s1.stuff</literal> will be
+ checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Specification</title>
+
+ <para>
+    If no databases are explicitly listed, the database to check is obtained
+    from environment variables in the usual way.  Otherwise, when one or more
+    databases are explicitly given, they are interpreted as an exhaustive list
+    of all databases to be checked.  This list may contain patterns, but
+    because any such patterns must be reconciled against a list of all
+    databases to find the matching database names, at least one database must
+    be specified as a literal database name rather than a pattern, and it must
+    appear in a position where <application>pg_amcheck</application> expects
+    to find it.
+ </para>
+ <para>
+ For example:
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --all --maintenance-db=foo</literal></term>
+ <listitem>
+ <para>
+ If the <option>--maintenance-db</option> option is given, it will be
+ used to look up the matching databases, though it will not itself be
+ added to the list of databases for checking.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck foo bar baz</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more plain database name arguments not preceded by
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ one will be used for this purpose, and it will also be included in the
+ list of databases to check.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -d foo -d bar baz</literal></term>
+ <listitem>
+ <para>
+       If a mixture of plain database names and databases preceded by
+       <option>-d</option> or <option>--dbname</option> is given, the first
+ plain database name will be used for this purpose. In the above
+ example, <literal>baz</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --dbname=foo --dbname="bar*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more databases are given with the
+ <option>-d</option> or <option>--dbname</option> option, the first one
+ will be used and must be a literal database name. In this example,
+ <literal>foo</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --relation="accounts_*.*.*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, the environment will be consulted for the database to be
+ used. In the example above, the default database will be queried to
+ find all databases with names that begin with
+ <literal>accounts_</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ As discussed above for schema-qualified relations, a database-qualified
+ relation name or pattern may also be given.
+<programlisting>
+pg_amcheck mydb \
+ --schema="t*" \
+ --exclude-schema="tmp*" \
+ --relation=baz \
+ --relation=bar.baz \
+ --relation=foo.bar.baz \
+ --relation="f*".a.b \
+ --exclude-relation=foo.a.b
+</programlisting>
+ will check relations in database <literal>mydb</literal> using the schema
+ resolution rules discussed above, but additionally will check all relations
+ named <literal>a.b</literal> in all databases with names starting with
+ <literal>f</literal> except database <literal>foo</literal>.
+ </para>
+
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+ </para>
+
+ <refsect2>
+ <title>Help and Version Information Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+      This option does not suppress output produced by the
+      <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+      Increase the verbosity level.  This option may be given more than
+      once.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Connection and Concurrent Connection Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ When used in conjunction with the <option>-a</option>
+ <option>--all</option> option, the total number of objects to check,
+ and correspondingly the number of concurrent connections to use, is
+ recalculated per database. If the number of objects to check differs
+ from one database to the next and is less than the concurrency level
+ specified, the number of concurrent connections open to the server
+ will fluctuate to meet the needs of each database processed.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
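As an illustration of combining the connection and concurrency options above (the host name, user, and job count here are hypothetical placeholders, not taken from this patch):

```shell
# Check every object in every database, opening up to four concurrent
# connections to the server.  Host and user names are hypothetical.
pg_amcheck --all \
    --host=db.example.com --port=5432 --username=checker \
    --jobs=4
```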
+
+ <refsect2>
+ <title>Options Controlling Index Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly
+ selects <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing
+ a new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
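For example, a sketch of a more thorough (and slower) index verification run, assuming a database named `mydb`:

```shell
# Verify all indexes in mydb, cross-checking heap contents against each
# index and re-finding leaf tuples from the root.
# Note: --rootdescend implicitly selects --parent-check.
pg_amcheck mydb --index="*" --heapallindexed --rootdescend
```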
+
+ <refsect2>
+ <title>Options Controlling Table Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+      corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer
+ is slow and may be improved in a future release. Some users may wish
+ to disable this check to save time.
+ </para>
+ <para>
+ Note that, despite their similar names, this option is unrelated to the
+ <option>--exclude-toast</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+      Note that index checking always stops after the first corrupt page.
+      This option therefore applies only to tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+      If <literal>all-frozen</literal> is given, table corruption checks
+      will skip over pages in all tables that are marked as all frozen.
+     </para>
+     <para>
+      If <literal>all-visible</literal> is given, table corruption checks
+      will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
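The table-checking options above can be combined; for instance, a hypothetical re-check of a suspect block range in a single table (database, table, and block numbers are placeholders) might look like:

```shell
# Check only blocks 1000 through 2000 of table accounts, skipping pages
# marked all-frozen, and moving on from a table after reporting the
# corruptions found on its first corrupt page.
pg_amcheck mydb --table=accounts --no-dependents \
    --startblock=1000 --endblock=2000 \
    --skip=all-frozen --on-error-stop
```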
+
+ <refsect2>
+ <title>Corruption Checking Target Options</title>
+
+ <para>
+ Objects to be checked may span schemas in more than one database. Options
+ for restricting the list of databases, schemas, tables and indexes are
+ described below. In each place where a name may be specified, a
+ <link linkend="app-psql-patterns"><replaceable class="parameter">pattern</replaceable></link>
+ may also be used.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects in
+ the matching database(s) will be checked.
+ </para>
+ <para>
+        If no <option>--maintenance-db</option> argument is given and no plain
+        database name is given as a command line argument, the first argument
+        specified with <option>-d</option> <option>--dbname</option> will be
+        used for the initial connection.  If that argument is not a literal
+        database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-db</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-db</option>, the database will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-db=america</literal></member>
+ <member><literal>--exclude-db="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ If the relation is not schema qualified, database and schema
+ inclusion/exclusion lists will determine in which databases or schemas
+ matching relations will be checked.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_idx</literal></member>
+ <member><literal>--relation="llc.accounts_idx"</literal></member>
+ <member><literal>--relation="asia|africa.corp|llc.accounts_idx"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ The first example, <literal>--relation=accounts_idx</literal>, checks
+ relations named <literal>accounts_idx</literal> in all selected schemas
+ and databases.
+ </para>
+ <para>
+ The second example, <literal>--relation="llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schema
+ <literal>llc</literal> in all selected databases.
+ </para>
+ <para>
+ The third example,
+ <literal>--relation="asia|africa.corp|llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in
+ schemas <literal>corp</literal> and <literal>llc</literal> in databases
+ <literal>asia</literal> and <literal>africa</literal>.
+ </para>
+ <para>
+ Note that if a database is implicated in a relation pattern, such as
+ <literal>asia</literal> and <literal>africa</literal> in the third
+ example above, the database need not be otherwise given in the command
+ arguments for the relation to be checked. As an extreme example of
+ this:
+ <simplelist>
+ <member><literal>pg_amcheck --relation="*.*.*" mydb</literal></member>
+ </simplelist>
+ will check all relations in all databases. The <literal>mydb</literal>
+ argument only serves to tell <application>pg_amcheck</application> the
+ name of the database to use for querying the list of all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+        Perform checks on the specified table(s).  This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+        Exclude checks on the specified table(s).  This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-dependents</option></term>
+ <listitem>
+ <para>
+ When calculating the list of objects to be checked, do not automatically
+ expand the list to include associated indexes and toast tables of
+ elements otherwise in the list.
+ </para>
+ <para>
+ By default, for each main table relation checked, any associated toast
+ table and all associated indexes are also checked, unless explicitly
+ excluded.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..49ad558b74 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,7 +18,7 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 7be6e6c9e5..53fbfa012e 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 4d0d09a5dd..26920cc512 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -336,6 +336,8 @@ CheckPointStmt
CheckpointStatsData
CheckpointerRequest
CheckpointerShmemStruct
+CheckType
+CheckTypeFilter
Chromosome
CkptSortItem
CkptTsStatus
@@ -2847,6 +2849,8 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckObjects
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
v37-0004-Extending-PostgresNode-to-test-corruption.patchapplication/octet-stream; name=v37-0004-Extending-PostgresNode-to-test-corruption.patch; x-unix-mode=0644Download
From 77b423ab2349af4435f414ea3ddc7388b2ce7984 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:37:58 -0800
Subject: [PATCH v37 4/4] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 135 +++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 261 ++++++++++++++++++
5 files changed, 496 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..d997db5b63
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,135 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 27;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel;
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options. When
+# checking the toast pointers, we may get errors produced by verify_heapam, but
+# we may also get errors from failure to read toast blocks that are beyond the
+# end of the toast table, of the form /ERROR: could not read block/. To avoid
+# having a brittle test, we accept any error message.
+#
+$node->start;
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ qr/^$/,
+ 'pg_amcheck reverted toast table');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ qr/^$/,
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..c92d1702b4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This should be a partial
+# and corrupt reverting of the update.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d470af93c5 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,267 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relname>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relation.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($self, $snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($self, $snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files entirely with the
+files saved in B<$snapname>; unlike the non-undef case, this means a file
+may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
+ File::Copy::copy($srcpart, "$dstdir/$part") or die "copy failed: $!";
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
+ my $buffer;
+ if ($bytes < 0)
+ {
+ $bytes *= -1; # Easier to use positive value
+ my $srcsize = (stat($srcfh))[7];
+ my $offset = $srcsize - $bytes;
+ sysseek($srcfh, $offset, 0);
+ sysseek($dstfh, $offset, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+ else
+ {
+ sysseek($srcfh, 0, 0);
+ sysseek($dstfh, 0, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I also made changes to clean up 0003 (formerly numbered 0004)
"deduplice" is a typo.
I'm not sure that I agree with check_each_database()'s commentary
about why it doesn't make sense to optimize the resolve-the-databases
step. Like, suppose I type 'pg_amcheck sasquatch'. I think the way you
have it coded it's going to tell me that there are no databases to
check, which might make me think I used the wrong syntax or something.
I want it to tell me that sasquatch does not exist. If I happen to be
a cryptid believer, I may reject that explanation as inaccurate, but
at least there's no question about what pg_amcheck thinks the problem
is.
Why does check_each_database() go out of its way to run the main query
without the always-secure search path? If there's a good reason, I
think it deserves a comment saying what the reason is. If there's not
a good reason, then I think it should use the always-secure search
path for 100% of everything. Same question applies to
check_one_database().
ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data)
could stand to be split over two lines, like you do for the nearby
run_command() call, so that it doesn't go past 80 columns.
I suggest having two variables instead of one for amcheck_schema.
Using the same variable to store the unescaped value and then later
the escaped value is, IMHO, confusing. Whatever you call the escaped
version, I'd rename the function parameters elsewhere to match.
"status = PQsendQuery(conn, sql) == 1" seems a bit uptight to me. Why
not just make status an int and then just "status = PQsendQuery(conn,
sql)" and then test for status != 0? I don't really care if you don't
change this, it's not actually important. But personally I'd rather
code it as if any non-zero value meant success.
I think the pg_log_error() in run_command() could be worded a bit
better. I don't think it's a good idea to try to include the type of
object in there like this, because of the translatability guidelines
around assembling messages from fragments. And I don't think it's good
to say that the check failed because the reality is that we weren't
able to ask for the check to be run in the first place. I would rather
log this as something like "unable to send query: %s". I would also
assume we need to bail out entirely if that happens. I'm not totally
sure what sorts of things can make PQsendQuery() fail but I bet it
boils down to having lost the server connection. Should that occur,
trying to send queries for all of the remaining objects is going to
result in repeating the same error many times, which isn't going to be
what anybody wants. It's unclear to me whether we should give up on
the whole operation but I think we have to at least give up on that
connection... unless I'm confused about what the failure mode is
likely to be here.
It looks to me like the user won't be able to tell by the exit code
what happened. What I did with pg_verifybackup, and what I suggest we
do here, is exit(1) if anything went wrong, either in terms of failing
to execute queries or in terms of those queries returning problem
reports. With pg_verifybackup, I thought about trying to make it like
0 => backup OK, 1 => backup not OK, 2 => trouble, but I found it too
hard to distinguish what should be exit(1) and what should be exit(2)
and the coding wasn't trivial either, so I went with the simpler
scheme.
The opening line of appendDatabaseSelect() could be adjusted to put
the regexps parameter on the next line, avoiding awkward wrapping.
If they are being run with a safe search path, the queries in
appendDatabaseSelect(), appendSchemaSelect(), etc. could be run
without all the paranoia. If not, maybe they should be. The casts to
text don't include the paranoia: with an unsafe search path, we need
pg_catalog.text here. Or no cast at all, which seems like it ought to
be fine too. Not quite sure why you are doing all that casting to
text; the datatype is presumably 'name' and ought to collate like
collate "C" which is probably fine.
It would probably be a better idea for appendSchemaSelect to declare a
PQExpBuffer and call initPQExpBuffer just once, and then
resetPQExpBuffer after each use, and finally termPQExpBuffer just
once. The way you have it is not expensive enough to really matter,
but avoiding repeated allocate/free cycles is probably best.
I wonder if a pattern like .foo.bar ends up meaning the same thing as
a pattern like foo.bar, with the empty database name being treated the
same as if nothing were specified.
From the way appendTableCTE() is coded, it seems to me that if I ask
for tables named j* excluding tables named jam* I still might get
toast tables for my jam, which seems wrong.
There does not seem to be any clear benefit to defining CT_TABLE = 0
in this case, so I would let the compiler deal with it. We should not
be depending on that to have any particular numeric value.
Why does pg_amcheck.c have a header file pg_amcheck.h if there's only
one source file? If you had multiple source files then the header
would be a reasonable place to put stuff they all need, but you don't.
Copying the definitions of HEAP_TABLE_AM_OID and BTREE_AM_OID into
pg_amcheck.h or anywhere else seems bad. I think you should just be doing
#include "catalog/pg_am_d.h".
I think I'm out of steam for today but I'll try to look at this more
soon. In general I think this patch and the whole series are pretty
close to being ready to commit, even though there are still things I
think need fixing here and there.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Numbered 0001 in this next patch set.
Hi,
I committed 0001 as you had it and 0002 with some more cleanups. Things I did:
- Adjusted some comments.
- Changed processQueryResult so that it didn't do foo(bar) with foo
being a pointer. Generally we prefer (*foo)(bar) when it can be
confused with a direct function call, but wunk->foo(bar) is also
considered acceptable.
- Changed the return type of ParallelSlotResultHandler to be bool,
because having it return PGresult * seemed to offer no advantages.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Feb 4, 2021, at 1:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Feb 4, 2021 at 11:10 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
I also made changes to clean up 0003 (formerly numbered 0004)
"deduplice" is a typo.
Fixed.
I'm not sure that I agree with check_each_database()'s commentary
about why it doesn't make sense to optimize the resolve-the-databases
step. Like, suppose I type 'pg_amcheck sasquatch'. I think the way you
have it coded it's going to tell me that there are no databases to
check, which might make me think I used the wrong syntax or something.
I want it to tell me that sasquatch does not exist. If I happen to be
a cryptid believer, I may reject that explanation as inaccurate, but
at least there's no question about what pg_amcheck thinks the problem
is.
The way v38 is coded, 'pg_amcheck sasquatch' will return a non-zero exit code with an error message: database "sasquatch" does not exist.
The problem only comes up if you run it like one of the following:
pg_amcheck --maintenance-db postgres sasquatch
pg_amcheck postgres sasquatch
pg_amcheck "sasquatch.myschema.mytable"
In each of those, pg_amcheck first connects to the initial database ("postgres" or whatever) and tries to resolve the databases to check by matching patterns like '^(postgres)$' and '^(sasquatch)$'; it finds no sasquatch matches, but also doesn't complain.
In v39, this is changed to complain when patterns do not match. This can be turned off with --no-strict-names.
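The resolve-then-complain behavior can be sketched as follows (a simplified Python model, not the actual C implementation; `no_strict_names` mirrors the new --no-strict-names flag, and the patterns are assumed to already be anchored-regex bodies):

```python
import re

def resolve_databases(patterns, known_dbs, no_strict_names=False):
    """Return the databases matched by any pattern; complain about
    patterns that matched nothing, unless strict names are disabled."""
    matched, unmatched = set(), []
    for pat in patterns:
        rgx = re.compile("^(%s)$" % pat)
        hits = [db for db in known_dbs if rgx.match(db)]
        if hits:
            matched.update(hits)
        else:
            unmatched.append(pat)
    if unmatched and not no_strict_names:
        # v38 silently checked nothing here; v39 fails loudly instead
        raise SystemExit('error: no database matched pattern "%s"'
                         % unmatched[0])
    return sorted(matched)
```

With this policy, `resolve_databases(["postgres", "sasquatch"], ["postgres", "template1"])` reports the unmatched sasquatch pattern instead of quietly checking only postgres.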
Why does check_each_database() go out of its way to run the main query
without the always-secure search path? If there's a good reason, I
think it deserves a comment saying what the reason is. If there's not
a good reason, then I think it should use the always-secure search
path for 100% of everything. Same question applies to
check_one_database().
That bit of code survived some refactoring, but it doesn't make sense to keep it, assuming it ever made sense at all. Removed in v39. The calls to connectDatabase will always secure the search_path, so pg_amcheck need not touch that directly.
ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler, sql.data)
could stand to be split over two lines, like you do for the nearby
run_command() call, so that it doesn't go past 80 columns.
Fair enough. The code has been treated to a pass through pgindent as well.
I suggest having two variables instead of one for amcheck_schema.
Using the same variable to store the unescaped value and then later
the escaped value is, IMHO, confusing. Whatever you call the escaped
version, I'd rename the function parameters elsewhere to match.
The escaped version is now part of a struct, so there shouldn't be any confusion about this.
"status = PQsendQuery(conn, sql) == 1" seems a bit uptight to me. Why
not just make status an int and then just "status = PQsendQuery(conn,
sql)" and then test for status != 0? I don't really care if you don't
change this, it's not actually important. But personally I'd rather
code it as if any non-zero value meant success.
I couldn't remember why I coded it like that, since it doesn't look like my style, then noticed I copied that from reindexdb.c, upon which this code is patterned. I agree it looks strange, and I've changed it in v39. Unlike the call site in reindexdb, there isn't any reason for pg_amcheck to store the returned value in a variable, so in v39 it doesn't.
I think the pg_log_error() in run_command() could be worded a bit
better. I don't think it's a good idea to try to include the type of
object in there like this, because of the translatability guidelines
around assembling messages from fragments. And I don't think it's good
to say that the check failed because the reality is that we weren't
able to ask for the check to be run in the first place. I would rather
log this as something like "unable to send query: %s". I would also
assume we need to bail out entirely if that happens. I'm not totally
sure what sorts of things can make PQsendQuery() fail but I bet it
boils down to having lost the server connection. Should that occur,
trying to send queries for all of the remaining objects is going to
result in repeating the same error many times, which isn't going to be
what anybody wants. It's unclear to me whether we should give up on
the whole operation but I think we have to at least give up on that
connection... unless I'm confused about what the failure mode is
likely to be here.
Changed in v39 to report the error as you suggest.
It will reconnect and retry a command one time on error. That should cover the case that the connection to the database was merely lost. If the second attempt also fails, no further retry of the same command is attempted, though commands for remaining relation targets will still be attempted, both for the database that had the error and for other remaining databases in the list.
Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` could result in two failures per relation in db2 before finally moving on to db3. That seems pretty awful considering how many relations that could be, but failing to soldier on in the face of errors seems a strange design for a corruption checking tool.
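The reconnect-and-retry-once policy can be modeled like this (a hypothetical sketch; `send_query` and `reconnect` stand in for the PQsendQuery and connection-reestablishment machinery):

```python
def run_with_retry(conn, sql, reconnect, send_query):
    """Send sql on conn; on failure, reconnect and retry exactly once.

    Returns True on success, False if both attempts failed. The caller
    moves on to the next relation target either way, so a broken
    database costs at most two failed attempts per command.
    """
    if send_query(conn, sql):
        return True
    conn = reconnect()
    return bool(conn and send_query(conn, sql))
```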
It looks to me like the user won't be able to tell by the exit code
what happened. What I did with pg_verifybackup, and what I suggest we
do here, is exit(1) if anything went wrong, either in terms of failing
to execute queries or in terms of those queries returning problem
reports. With pg_verifybackup, I thought about trying to make it like
0 => backup OK, 1 => backup not OK, 2 => trouble, but I found it too
hard to distinguish what should be exit(1) and what should be exit(2)
and the coding wasn't trivial either, so I went with the simpler
scheme.
In v39, exit(1) is used for all errors which are intended to stop the program. It is important to recognize that finding corruption is not an error in this sense. A query to verify_heapam() can fail if the relation's checksums are bad, and that happens beyond verify_heapam()'s control when the page is not allowed into the buffers. There can be errors if the file backing a relation is missing. There may be other corruption error cases that I have not yet thought about. The connections' errors get reported to the user, but pg_amcheck does not exit as a consequence of them. As discussed above, failing to send the query to the server is not viewed as a reason to exit, either. It would be hard to quantify all the failure modes, but presumably the catalogs for a database could be messed up enough to cause such failures, and I'm not sure that pg_amcheck should just abort.
The opening line of appendDatabaseSelect() could be adjusted to put
the regexps parameter on the next line, avoiding awkward wrapping.
If they are being run with a safe search path, the queries in
appendDatabaseSelect(), appendSchemaSelect(), etc. could be run
without all the paranoia. If not, maybe they should be. The casts to
text don't include the paranoia: with an unsafe search path, we need
pg_catalog.text here. Or no cast at all, which seems like it ought to
be fine too. Not quite sure why you are doing all that casting to
text; the datatype is presumably 'name' and ought to collate like
collate "C" which is probably fine.
In v39, everything is being run with a safe search path, and the paranoia and casts are largely gone.
It would probably be a better idea for appendSchemaSelect to declare a
PQExpBuffer and call initPQExpBuffer just once, and then
resetPQExpBuffer after each use, and finally termPQExpBuffer just
once. The way you have it is not expensive enough to really matter,
but avoiding repeated allocate/free cycles is probably best.
I'm not sure what this comment refers to, but this function doesn't exist in v39.
I wonder if a pattern like .foo.bar ends up meaning the same thing as
a pattern like foo.bar, with the empty database name being treated the
same as if nothing were specified.
That's really a question of how patternToSQLRegex parses that string. In general, "a.b.c" => ("^(a)$", "^(b)$", "^(c)$"), so I would expect your example to have a database pattern "^()$" which should only match databases with zero length names, presumably none. I've added a regression test for this, and indeed that's what it does.
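The splitting rule described above can be illustrated with a simplified Python model (the real patternToSQLRegex C code also handles quoting and shell-style wildcards, which this ignores):

```python
import re

def pattern_to_regexes(pattern):
    # "a.b.c" -> ["^(a)$", "^(b)$", "^(c)$"]; a leading dot produces
    # "^()$" for the database part, which matches only a zero-length
    # name -- so ".foo.bar" is not equivalent to "foo.bar".
    return ["^(%s)$" % part for part in pattern.split(".")]
```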
From the way appendTableCTE() is coded, it seems to me that if I ask
for tables named j* excluding tables named jam* I still might get
toast tables for my jam, which seems wrong.
In v39, the query is entirely reworked, so I can't respond directly to this, though I agree that excluding a table should mean the toast table does not automatically get included. There is an interaction, though, if you select both "j*' and "pg_toast.*" and then exclude "jam".
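The intended interaction can be sketched as set logic (a hypothetical Python model using shell-style patterns; the real selection is done in SQL, and the names here are illustrative only):

```python
import fnmatch

def select_relations(tables, toast_of, include, exclude):
    """Pick tables matching an include pattern and not an exclude
    pattern. A toast table normally rides along with its owning table,
    so excluding the owner drops the toast table too -- unless the
    toast table was selected in its own right (e.g. via pg_toast.*)."""
    def matches(name, pats):
        return any(fnmatch.fnmatch(name, p) for p in pats)

    picked = set()
    for t in tables:
        if matches(t, exclude):
            continue
        owner = toast_of.get(t)
        if owner is None:
            if matches(t, include):
                picked.add(t)
        elif matches(t, include):
            picked.add(t)                       # named directly
        elif matches(owner, include) and not matches(owner, exclude):
            picked.add(t)                       # rides along with owner
    return picked
```

Under this model, excluding "jam" also drops jam's toast table, but adding an explicit "pg_toast.*" include brings it back, which is the interaction described above.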
There does not seem to be any clear benefit to defining CT_TABLE = 0
in this case, so I would let the compiler deal with it. We should not
be depending on that to have any particular numeric value.
The enum is removed in v39.
Why does pg_amcheck.c have a header file pg_amcheck.h if there's only
one source file? If you had multiple source files then the header
would be a reasonable place to put stuff they all need, but you don't.
Everything is in pg_amcheck.c now.
Copying the definitions of HEAP_TABLE_AM_OID and BTREE_AM_OID into
pg_amcheck.h or anywhere else seems bad. I think you should just be doing
#include "catalog/pg_am_d.h".
Good point. Done.
I think I'm out of steam for today but I'll try to look at this more
soon. In general I think this patch and the whole series are pretty
close to being ready to commit, even though there are still things I
think need fixing here and there.
Reworking the code took a while. Version 39 patches attached.
Attachments:
v39-0001-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v39-0001-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 45badbf39001cc60c855864e6531807818eda6f5 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 16 Feb 2021 13:58:40 -0800
Subject: [PATCH v39 1/2] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1822 ++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 228 +++
contrib/pg_amcheck/t/003_check.pl | 481 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 ++++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 54 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 1029 +++++++++++
src/tools/msvc/Install.pm | 2 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 4 +
15 files changed, 4162 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..a72dcf7304 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..ad75bafa39
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1822 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include "catalog/pg_am_d.h"
+#include "catalog/pg_namespace_d.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "storage/block.h"
+
+/* pg_amcheck command line options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool allrel;
+ bool excludetbl;
+ bool excludeidx;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool no_dependents;
+ bool no_indexes;
+ bool no_tables;
+ bool no_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ bool strict_names;
+ bool show_progress;
+ const char *skip;
+ int jobs;
+ long startblock;
+ long endblock;
+ SimplePtrList include; /* list of PatternInfo structs */
+ SimplePtrList exclude; /* list of PatternInfo structs */
+} amcheckOptions;
+
+static amcheckOptions opts = {
+ .alldb = false,
+ .allrel = true,
+ .excludetbl = false,
+ .excludeidx = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .no_dependents = false,
+ .no_indexes = false,
+ .no_tables = false,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .no_toast = false,
+ .reconcile_toast = true,
+ .strict_names = true,
+ .show_progress = false,
+ .skip = "none",
+ .jobs = 1,
+ .startblock = -1,
+ .endblock = -1,
+ .include = {NULL, NULL},
+ .exclude = {NULL, NULL},
+};
+
+static const char *progname = NULL;
+
+typedef struct PatternInfo
+{
+ int pattern_id; /* Unique ID of this pattern */
+ const char *pattern; /* Unaltered pattern from the command line */
+ char *dbrgx; /* Database regexp parsed from pattern, or
+ * NULL */
+ char *nsprgx; /* Schema regexp parsed from pattern, or NULL */
+ char *relrgx; /* Relation regexp parsed from pattern, or
+ * NULL */
+ bool tblonly; /* true if relrgx should only match tables */
+ bool idxonly; /* true if relrgx should only match indexes */
+ bool matched; /* true if the pattern matched in any database */
+} PatternInfo;
+
+/* Unique pattern id counter */
+static int next_id = 1;
+
+typedef struct DatabaseInfo
+{
+ char *datname;
+ char *amcheck_schema; /* escaped, quoted literal */
+} DatabaseInfo;
+
+typedef struct RelationInfo
+{
+ const DatabaseInfo *datinfo; /* shared by other relinfos */
+ Oid reloid;
+ bool is_table; /* true if heap, false if btree */
+} RelationInfo;
+
+/*
+ * Query for determining if contrib's amcheck is installed. If so, selects the
+ * namespace name where amcheck's functions can be found.
+ */
+static const char *amcheck_sql =
+"SELECT n.nspname, x.extversion"
+"\nFROM pg_catalog.pg_extension x"
+"\nJOIN pg_catalog.pg_namespace n"
+"\nON x.extnamespace OPERATOR(pg_catalog.=) n.oid"
+"\nWHERE x.extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+static void prepare_table_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void prepare_btree_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void run_command(ParallelSlot *slot, const char *sql,
+ ConnParams *cparams);
+static bool VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+static bool VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context);
+static void help(const char *progname);
+static void appendDatabasePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendSchemaPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendRelationPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendTablePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendIndexPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void compileDatabaseList(PGconn *conn, SimplePtrList *databases);
+static void compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *datinfo);
+
+int
+main(int argc, char *argv[])
+{
+ PGconn *conn;
+ SimplePtrListCell *cell;
+ SimplePtrList databases = {NULL, NULL};
+ SimplePtrList relations = {NULL, NULL};
+ bool failed;
+ const char *prev_datname;
+ int parallel_workers;
+ ParallelSlot *slots;
+ PQExpBufferData sql;
+ long long int reltotal;
+ long long int relprogress;
+
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"exclude-indexes", no_argument, NULL, 2},
+ {"exclude-tables", no_argument, NULL, 3},
+ {"exclude-toast", no_argument, NULL, 4},
+ {"exclude-toast-pointers", no_argument, NULL, 5},
+ {"on-error-stop", no_argument, NULL, 6},
+ {"skip", required_argument, NULL, 7},
+ {"startblock", required_argument, NULL, 8},
+ {"endblock", required_argument, NULL, 9},
+ {"rootdescend", no_argument, NULL, 10},
+ {"no-dependents", no_argument, NULL, 11},
+ {"no-strict-names", no_argument, NULL, 12},
+ {"progress", no_argument, NULL, 13},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ /*
+ * If a maintenance database is specified, that will be used for the
+ * initial connection. Failing that, the first plain argument (without a
+	 * flag) will be used.  If neither of those is given, the first database
+	 * specified with -d will be used.
+ */
+ const char *primary_db = NULL;
+ const char *secondary_db = NULL;
+ const char *tertiary_db = NULL;
+
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+ ConnParams cparams;
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ opts.alldb = true;
+ break;
+ case 'd':
+ if (tertiary_db == NULL)
+ tertiary_db = optarg;
+ appendDatabasePattern(&opts.include, optarg, encoding);
+ break;
+ case 'D':
+ appendDatabasePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'e':
+ opts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ opts.heapallindexed = true;
+ break;
+ case 'i':
+ opts.allrel = false;
+ appendIndexPattern(&opts.include, optarg, encoding);
+ break;
+ case 'I':
+ opts.excludeidx = true;
+ appendIndexPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'j':
+ opts.jobs = atoi(optarg);
+ if (opts.jobs < 1)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ opts.parent_check = true;
+ break;
+ case 'q':
+ opts.quiet = true;
+ break;
+ case 'r':
+ opts.allrel = false;
+ appendRelationPattern(&opts.include, optarg, encoding);
+ break;
+ case 'R':
+ opts.excludeidx = true;
+ opts.excludetbl = true;
+ appendRelationPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 's':
+ opts.allrel = false;
+ appendSchemaPattern(&opts.include, optarg, encoding);
+ break;
+ case 'S':
+ appendSchemaPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 't':
+ opts.allrel = false;
+ appendTablePattern(&opts.include, optarg, encoding);
+ break;
+ case 'T':
+ opts.excludetbl = true;
+ appendTablePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ opts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ primary_db = pg_strdup(optarg);
+ break;
+ case 2:
+ opts.no_indexes = true;
+ break;
+ case 3:
+ opts.no_tables = true;
+ break;
+ case 4:
+ opts.no_toast = true;
+ break;
+ case 5:
+ opts.reconcile_toast = false;
+ break;
+ case 6:
+ opts.on_error_stop = true;
+ break;
+ case 7:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ opts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ opts.skip = "all frozen";
+ else
+ {
+					pg_log_error("invalid skip option: \"%s\"", optarg);
+ exit(1);
+ }
+ break;
+ case 8:
+ opts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+					pg_log_error("relation starting block argument contains garbage characters");
+ exit(1);
+ }
+ if (opts.startblock > (long) MaxBlockNumber)
+ {
+					pg_log_error("relation starting block argument out of bounds");
+ exit(1);
+ }
+ break;
+ case 9:
+ opts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+					pg_log_error("relation ending block argument contains garbage characters");
+ exit(1);
+ }
+				if (opts.endblock > (long) MaxBlockNumber)
+				{
+					pg_log_error("relation ending block argument out of bounds");
+ exit(1);
+ }
+ break;
+ case 10:
+ opts.rootdescend = true;
+ opts.parent_check = true;
+ break;
+ case 11:
+ opts.no_dependents = true;
+ break;
+ case 12:
+ opts.strict_names = false;
+ break;
+ case 13:
+ opts.show_progress = true;
+ break;
+ default:
+ fprintf(stderr,
+ "Try \"%s --help\" for more information.\n",
+ progname);
+ exit(1);
+ }
+ }
+
+ if (opts.endblock >= 0 && opts.endblock < opts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (secondary_db == NULL)
+ secondary_db = argv[optind];
+ appendDatabasePattern(&opts.include, argv[optind], encoding);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (primary_db)
+ cparams.dbname = primary_db;
+ else if (secondary_db != NULL)
+ cparams.dbname = secondary_db;
+ else if (tertiary_db != NULL)
+ cparams.dbname = tertiary_db;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ /*
+ * Users expect the database name inferred from the environment to get
+ * checked, not just get used for the initial connection.
+ */
+ appendDatabasePattern(&opts.include, default_db, encoding);
+
+ cparams.dbname = default_db;
+ }
+
+ conn = connectMaintenanceDatabase(&cparams, progname, opts.echo);
+ compileDatabaseList(conn, &databases);
+ disconnectDatabase(conn);
+
+ if (databases.head == NULL)
+ {
+ fprintf(stderr, "%s: no databases to check\n", progname);
+ exit(0);
+ }
+
+ /*
+ * Compile a list of all relations spanning all databases to be checked.
+ */
+ for (cell = databases.head; cell; cell = cell->next)
+ {
+ PGresult *result;
+ int ntups;
+ const char *amcheck_schema;
+ DatabaseInfo *dat = (DatabaseInfo *) cell->ptr;
+
+ cparams.override_dbname = dat->datname;
+
+		conn = connectDatabase(&cparams, progname, opts.echo, false, true);
+
+ /*
+ * Verify that amcheck is installed for this next database. User
+ * error could result in a database not having amcheck that should
+ * have it, but we also could be iterating over multiple databases
+ * where not all of them have amcheck installed (for example,
+ * 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, opts.echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+			pg_log_error("database \"%s\": %s",
+						 PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("query was: %s", amcheck_sql);
+ PQclear(result);
+ disconnectDatabase(conn);
+ continue;
+ }
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ fprintf(stderr,
+ "%s: skipping database \"%s\": amcheck is not installed\n",
+ progname, PQdb(conn));
+ disconnectDatabase(conn);
+ continue;
+ }
+ amcheck_schema = PQgetvalue(result, 0, 0);
+ if (opts.verbose)
+ fprintf(stderr,
+ "%s: in database \"%s\": using amcheck version \"%s\" in schema \"%s\"\n",
+ progname, PQdb(conn), PQgetvalue(result, 0, 1),
+ amcheck_schema);
+ dat->amcheck_schema = PQescapeIdentifier(conn, amcheck_schema,
+ strlen(amcheck_schema));
+ PQclear(result);
+
+ compileRelationListOneDb(conn, &relations, dat);
+ disconnectDatabase(conn);
+ }
+
+ /*
+ * Check that all inclusion patterns matched at least one schema or
+ * relation that we can check.
+ */
+ for (failed = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *pat = (PatternInfo *) cell->ptr;
+
+ if (!pat->matched && (pat->nsprgx != NULL || pat->relrgx != NULL))
+ {
+ failed = opts.strict_names;
+
+ if (!opts.quiet)
+ {
+ if (pat->tblonly)
+ fprintf(stderr, "%s: no tables to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->idxonly)
+ fprintf(stderr, "%s: no btree indexes to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->relrgx == NULL)
+ fprintf(stderr, "%s: no relations to check in schemas for \"%s\"\n",
+ progname, pat->pattern);
+ else
+ fprintf(stderr, "%s: no relations to check for \"%s\"\n",
+ progname, pat->pattern);
+ }
+ }
+ }
+
+ if (failed)
+ exit(1);
+
+ /*
+ * We cannot use more workers than the user specified with --jobs, but if
+ * that value exceeds the number of relations, using that many needlessly
+ * opens extra unused database connections.
+ *
+ * Set parallel_workers to the lesser of opts.jobs and the number of
+ * relations.
+ */
+ for (cell = relations.head, parallel_workers = 0;
+ cell != NULL && parallel_workers < opts.jobs;
+ cell = cell->next, parallel_workers++)
+ ;
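As an aside, the capped-count idiom used just above (walking the list only far enough to know whether its length reaches `--jobs`) can be sketched in isolation. This is an illustrative stand-alone sketch, not part of the patch; `Cell` and `capped_count` are hypothetical stand-ins for `SimplePtrListCell` and the inline loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for SimplePtrListCell. */
typedef struct Cell
{
	struct Cell *next;
} Cell;

/*
 * Count list cells, stopping as soon as the count reaches "cap".  This
 * yields min(cap, list length) without walking a long list to its end,
 * which is what the parallel_workers loop computes.
 */
static int
capped_count(const Cell *head, int cap)
{
	int			n = 0;
	const Cell *c;

	for (c = head; c != NULL && n < cap; c = c->next)
		n++;
	return n;
}
```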
+
+ if (opts.show_progress)
+ {
+ /* Count the total number of relations */
+ reltotal = 0;
+ for (cell = relations.head; cell; cell = cell->next)
+ reltotal++;
+ }
+
+ /*
+	 * The ParallelSlots-based event loop follows.
+ *
+ * We use server-side parallelism to check up to parallel_workers
+ * relations in parallel. As a slot becomes free, we reuse the slot's
+ * connection if it is open and is connected to the next relation's
+ * database. Otherwise, we close the slot's connection (if open) and
+ * connect again. The relations list was computed in database order, so
+ * this strategy should not result in unreasonably many connects and
+ * disconnects.
+ *
+ * Per database, the relations are sorted in relpages order, largest
+ * first, but the cluster-global largest objects may be far down the list
+ * if databases with smaller objects were processed first. We have traded
+ * off the desire to keep reconnections low against the desire to start
+ * processing the cluster-global largest objects first irrespective of
+ * database order.
+ */
+ prev_datname = NULL;
+ failed = false;
+ slots = NULL;
+ initPQExpBuffer(&sql);
+ for (relprogress = 0, cell = relations.head; cell; cell = cell->next)
+ {
+ ParallelSlot *free_slot;
+ RelationInfo *rel;
+
+ Assert(cell);
+ Assert(cell->ptr);
+
+ rel = (RelationInfo *) cell->ptr;
+
+ Assert(rel->datinfo);
+ Assert(rel->datinfo->datname);
+ Assert(rel->datinfo->amcheck_schema);
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * The relations list is in database sorted order. If this next
+ * relation is in a different database than the last one seen, we are
+ * about to start checking this database. Note that other slots may
+ * still be working on relations from prior databases.
+ */
+ if (!opts.quiet && (!prev_datname || strcmp(rel->datinfo->datname,
+ prev_datname) != 0))
+ fprintf(stderr, "%s: checking database \"%s\"\n",
+ progname, rel->datinfo->datname);
+ prev_datname = rel->datinfo->datname;
+
+ if (opts.show_progress)
+ {
+ fprintf(stderr,
+ ngettext("%s: %lld/%lld relation checked\n",
+ "%s: %lld/%lld relations checked\n",
+ relprogress),
+ progname, relprogress, reltotal);
+ relprogress++;
+ }
+
+ /*
+		 * Set up the slots if this is our first time through.
+ *
+ * There is an inefficiency here that would require redesigning
+ * parallel slots to fix: If the number of relations to be checked in
+ * the first database is less than parallel_workers, we will open more
+ * connections to the first database than necessary, and will close
+ * some of them unused as we move on to relations in other databases.
+ * If this happened for *every* database, it would need fixing, but
+ * since it only happens for the first, we tolerate this for now.
+ */
+ if (slots == NULL)
+ {
+ cparams.override_dbname = rel->datinfo->datname;
+ conn = connectDatabase(&cparams, progname, opts.echo, false, true);
+ slots = ParallelSlotsSetup(&cparams, progname, opts.echo, conn,
+ parallel_workers);
+ }
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * If the slot is connected to the wrong database for checking this
+ * next relation, reconnect to the right one.
+ */
+ if (strcmp(PQdb(free_slot->connection), rel->datinfo->datname) != 0)
+ {
+ disconnectDatabase(free_slot->connection);
+ free_slot->connection = connectDatabase(&cparams, progname,
+ opts.echo, false, true);
+ }
+
+ /*
+ * Execute the appropriate amcheck command for this relation using our
+ * slot's database connection. We do not wait for the command to
+ * complete, nor do we perform any error checking, as that is done by
+ * the parallel slots and our handler callback functions.
+ */
+ if (rel->is_table)
+ {
+ prepare_table_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler,
+ sql.data);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ else
+ {
+ prepare_btree_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ }
+ termPQExpBuffer(&sql);
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (slots && !ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+finish:
+ if (slots)
+ {
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+ }
+
+	if (failed)
+		exit(1);
+
+	exit(0);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * reloid: oid of the table to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_table_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, "
+ "v.attnum, v.msg"
+ "\nFROM %s.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ amcheck_schema,
+ reloid,
+ opts.on_error_stop ? "true" : "false",
+ opts.reconcile_toast ? "true" : "false",
+ opts.skip);
+ if (opts.startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", opts.startblock);
+ if (opts.endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", opts.endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the index checking command will be written
+ * reloid: oid of the index to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ if (opts.parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"),
+ (opts.rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * If reconnecting to the database is necessary, the cparams argument may be
+ * modified.
+ *
+ * slot: slot with connection to the server we should use for the command
+ * sql: query to send
+ * cparams: connection parameters in case the slot needs to be reconnected
+ */
+static void
+run_command(ParallelSlot *slot, const char *sql, ConnParams *cparams)
+{
+ if (opts.echo)
+ printf("%s\n", sql);
+
+ if (PQsendQuery(slot->connection, sql) == 0)
+ {
+ pg_log_warning("error sending command to database \"%s\": %s",
+ PQdb(slot->connection),
+ PQerrorMessage(slot->connection));
+
+ /* reconnect, in case we merely lost the connection */
+ cparams->override_dbname = PQdb(slot->connection);
+ disconnectDatabase(slot->connection);
+ slot->connection = connectDatabase(cparams, progname,
+ opts.echo, false, true);
+
+ /* retry the command, but give up if the retry fails */
+ if (PQsendQuery(slot->connection, sql) == 0)
+ {
+ pg_log_error("retry: error sending command to database \"%s\": %s",
+ PQdb(slot->connection),
+ PQerrorMessage(slot->connection));
+ pg_log_error("command was: %s", sql);
+
+ /*
+ * The retry failed, and continuing to retry risks an infinite
+ * loop. But subsequent remaining commands, particularly if they
+ * are for a different database, may succeed, so we do not exit
+ * here.
+ */
+ }
+ }
+}
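The send, reconnect, retry-exactly-once shape of run_command is a generic pattern worth seeing in isolation. The sketch below models it with a hypothetical `FlakySender` in place of libpq; it is illustrative only, and none of these names appear in the patch.

```c
#include <stdbool.h>

/* Hypothetical transport that fails its first "fail_times" sends. */
typedef struct FlakySender
{
	int			fail_times;		/* remaining sends that will fail */
	int			reconnects;		/* how many times we "reconnected" */
} FlakySender;

static bool
send_once(FlakySender *s)
{
	if (s->fail_times > 0)
	{
		s->fail_times--;
		return false;
	}
	return true;
}

/*
 * Mirror of run_command's control flow: try to send; on failure,
 * reconnect and retry exactly once.  Bounding the retries means a
 * persistently broken connection cannot put the caller into an
 * infinite loop.
 */
static bool
send_with_one_retry(FlakySender *s)
{
	if (send_once(s))
		return true;
	s->reconnects++;			/* stands in for disconnect + reconnect */
	return send_once(s);		/* give up if the retry also fails */
}
```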
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
+ */
+static bool
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ int ntups = PQntuples(res);
+
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s.%s, block %s, offset %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s.%s, block %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s.%s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s.%s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+	else
+ {
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ printf("%s: query was: %s\n", PQdb(conn), (const char *) context);
+ }
+
+ return true;
+}
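The cascade of PQgetisnull() checks above prints the most specific corruption location whose fields are non-null. The idea can be sketched with plain ints, where -1 stands in for SQL NULL; `location_depth` is a hypothetical helper for illustration, not a patch symbol.

```c
/*
 * Return how many location fields (block, offset, attribute) are known,
 * mirroring the branch order of VerifyHeapamSlotHandler: the first
 * non-null field, scanning from most to least specific, decides the
 * output format.  Here -1 stands in for SQL NULL.
 */
static int
location_depth(int blkno, int offnum, int attnum)
{
	if (attnum >= 0)
		return 3;				/* block, offset, and attribute known */
	if (offnum >= 0)
		return 2;				/* block and offset known */
	if (blkno >= 0)
		return 1;				/* only the block is known */
	return 0;					/* relation-level (or higher) report */
}
```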
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user. The results
+ * created by prepare_btree_command and outputs them for the user.  The result
+ * set from the btree checking command is expected to be empty; when the
+ * command instead returns an error, the useful information about the
+ * corruption is expected in the connection's error message.
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static bool
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+
+ return true;
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf("%s checks objects in a PostgreSQL database for corruption.\n\n", progname);
+ printf("Usage:\n");
+ printf(" %s [OPTION]... [DBNAME]\n", progname);
+ printf("\nTarget Options:\n");
+ printf(" -a, --all check all databases\n");
+ printf(" -d, --dbname=DBNAME check specific database(s)\n");
+ printf(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n");
+ printf(" -i, --index=INDEX check specific index(es)\n");
+ printf(" -I, --exclude-index=INDEX do NOT check specific index(es)\n");
+ printf(" -r, --relation=RELNAME check specific relation(s)\n");
+ printf(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n");
+ printf(" -s, --schema=SCHEMA check specific schema(s)\n");
+ printf(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n");
+ printf(" -t, --table=TABLE check specific table(s)\n");
+ printf(" -T, --exclude-table=TABLE do NOT check specific table(s)\n");
+ printf(" --exclude-indexes do NOT perform any index checking\n");
+ printf(" --exclude-tables do NOT check any tables\n");
+	printf(" --exclude-toast do NOT check any toast tables or toast indexes\n");
+ printf(" --no-dependents do NOT automatically check dependent objects\n");
+ printf(" --no-strict-names do NOT require patterns to match objects\n");
+ printf("\nIndex Checking Options:\n");
+	printf(" -H, --heapallindexed check that all heap tuples are found within indexes\n");
+ printf(" -P, --parent-check check index parent/child relationships\n");
+ printf(" --rootdescend search from root page to refind tuples\n");
+ printf("\nTable Checking Options:\n");
+ printf(" --exclude-toast-pointers do NOT follow relation toast pointers\n");
+ printf(" --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf(" --startblock=BLOCK begin checking table(s) at the given block number\n");
+ printf(" --endblock=BLOCK check table(s) only up to the given block number\n");
+ printf("\nConnection options:\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME user name to connect as\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt\n");
+ printf(" --maintenance-db=DBNAME alternate maintenance database\n");
+ printf("\nOther Options:\n");
+ printf(" -e, --echo show the commands being sent to the server\n");
+ printf(" -j, --jobs=NUM use this many concurrent connections to the server\n");
+ printf(" -q, --quiet don't write any messages\n");
+ printf(" -v, --verbose write a lot of output\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" --progress show progress information\n");
+ printf(" -?, --help show this help, then exit\n");
+
+ printf("\nRead the description of the amcheck contrib module for details.\n");
+ printf("\nReport bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * appendDatabasePattern
+ *
+ * Adds to a list the given pattern interpreted as a database name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the database name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendDatabasePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->dbrgx = pstrdup(buf.data);
+
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendSchemaPattern
+ *
+ * Adds to a list the given pattern interpreted as a schema name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the schema name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendSchemaPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->nsprgx = pstrdup(buf.data);
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPatternHelper
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ * tblonly: whether the pattern should only be matched against heap tables
+ * idxonly: whether the pattern should only be matched against btree indexes
+ */
+static void
+appendRelationPatternHelper(SimplePtrList *list, const char *pattern,
+ int encoding, bool tblonly, bool idxonly)
+{
+ PQExpBufferData dbbuf;
+ PQExpBufferData nspbuf;
+ PQExpBufferData relbuf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&dbbuf);
+ initPQExpBuffer(&nspbuf);
+ initPQExpBuffer(&relbuf);
+
+ patternToSQLRegex(encoding, &dbbuf, &nspbuf, &relbuf, pattern, false);
+ info->pattern = pattern;
+ if (dbbuf.data[0])
+ info->dbrgx = pstrdup(dbbuf.data);
+ if (nspbuf.data[0])
+ info->nsprgx = pstrdup(nspbuf.data);
+ if (relbuf.data[0])
+ info->relrgx = pstrdup(relbuf.data);
+
+ termPQExpBuffer(&dbbuf);
+ termPQExpBuffer(&nspbuf);
+ termPQExpBuffer(&relbuf);
+
+ info->tblonly = tblonly;
+ info->idxonly = idxonly;
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched against both tables and indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendRelationPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, false);
+}
+
+/*
+ * appendTablePattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against tables.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendTablePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, true, false);
+}
+
+/*
+ * appendIndexPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendIndexPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, true);
+}
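All of the append*Pattern wrappers above funnel into patternToSQLRegex, which turns a psql-style pattern into an anchored POSIX regex. The sketch below is a hypothetical, simplified version of that translation, handling only the basic `*` and `?` wildcards (the real function also deals with double-quoting, case folding, and splitting on `.` separators); it illustrates why the tests later in this patch expect "post" not to match "postgres":

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, simplified pattern-to-regex translation: '*' becomes '.*',
 * '?' becomes '.', and the result is anchored with ^(...)$ so that a bare
 * name matches only itself.
 */
static void
pattern_to_regex(const char *pattern, char *out, size_t outlen)
{
	size_t		n;

	n = snprintf(out, outlen, "^(");
	for (const char *p = pattern; *p && n + 3 < outlen; p++)
	{
		if (*p == '*')
			n += snprintf(out + n, outlen - n, ".*");
		else if (*p == '?')
			n += snprintf(out + n, outlen - n, ".");
		else
		{
			out[n++] = *p;
			out[n] = '\0';
		}
	}
	snprintf(out + n, outlen - n, ")$");
}
```

Because the result is anchored, the pattern `post` yields `^(post)$`, which matches a database named "post" but not "postgres"; wildcards must be given explicitly.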
+
+/*
+ * appendDbPatternCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the database portions filtered from the list of patterns expressed as three
+ * columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user specified pattern from the command line
+ * rgx: the database regular expression parsed from the pattern
+ *
+ * Patterns without a database portion are skipped. Patterns with more than
+ * just a database portion are optionally skipped, depending on argument
+ * 'inclusive'.
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ * inclusive: whether to include patterns with schema and/or relation parts
+ */
+static void
+appendDbPatternCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn, bool inclusive)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->dbrgx != NULL &&
+ (inclusive || (info->nsprgx == NULL && info->relrgx == NULL)))
+ {
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, ", ");
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nSELECT NULL, NULL, NULL WHERE false");
+}
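The comma/have_values bookkeeping above is a small idiom worth isolating: the separator starts out empty and flips to "," after the first row, and an empty pattern list falls back to a zero-row SELECT so the surrounding CTE stays syntactically valid. A standalone sketch of the same idiom, using snprintf into a plain buffer instead of a PQExpBuffer and bare integers instead of quoted literals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "VALUES (1), (2)" for the given ids, or a zero-row SELECT when
 * there is nothing to emit.  Illustration only: the patch writes into a
 * PQExpBuffer and quotes string literals with appendStringLiteralConn.
 */
static void
emit_values(char *out, size_t outlen, const int *ids, int nids)
{
	const char *comma = "";
	bool		have_values = false;
	size_t		n = 0;

	out[0] = '\0';
	for (int i = 0; i < nids; i++)
	{
		if (!have_values)
			n += snprintf(out + n, outlen - n, "VALUES");
		have_values = true;
		n += snprintf(out + n, outlen - n, "%s (%d)", comma, ids[i]);
		comma = ",";
	}
	if (!have_values)
		snprintf(out, outlen, "SELECT NULL WHERE false");
}
```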
+
+/*
+ * compileDatabaseList
+ *
+ * Compiles a list of databases to check based on the user supplied options,
+ * sorted to preserve the order they were specified on the command line. In
+ * the event that multiple databases match a single command line pattern, they
+ * are secondarily sorted by name.
+ *
+ * conn: connection to the initial database
+ * databases: the list onto which databases should be appended
+ */
+static void
+compileDatabaseList(PGconn *conn, SimplePtrList *databases)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ bool fatal;
+
+ initPQExpBuffer(&sql);
+
+ /* Append the include patterns CTE. */
+ appendPQExpBufferStr(&sql, "WITH include_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.include, conn, true);
+
+ /* Append the exclude patterns CTE. */
+ appendPQExpBufferStr(&sql, "\n),\nexclude_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.exclude, conn, false);
+ appendPQExpBufferStr(&sql, "\n),");
+
+ /*
+ * Append the database CTE, which includes whether each database is
+ * connectable and also joins against exclude_raw to determine whether
+ * each database is excluded.
+ */
+ appendPQExpBufferStr(&sql,
+ "\ndatabase (datname) AS ("
+ "\nSELECT d.datname"
+ "\nFROM pg_catalog.pg_database d"
+ "\nLEFT OUTER JOIN exclude_raw e"
+ "\nON d.datname ~ e.rgx"
+ "\nWHERE d.datallowconn"
+ "\nAND e.id IS NULL"
+ "\n),"
+
+ /*
+ * Append the include_pat CTE, which joins the include_raw CTE against the
+ * databases CTE to determine if all the inclusion patterns had matches,
+ * and whether each matched pattern had the misfortune of only matching
+ * excluded or unconnectable databases.
+ */
+ "\ninclude_pat (id, pat, checkable) AS ("
+ "\nSELECT i.id, i.pat,"
+ "\nCOUNT(*) FILTER ("
+ "\nWHERE d IS NOT NULL"
+ "\n) AS checkable"
+ "\nFROM include_raw i"
+ "\nLEFT OUTER JOIN database d"
+ "\nON d.datname ~ i.rgx"
+ "\nGROUP BY i.id, i.pat"
+ "\n),"
+
+ /*
+ * Append the filtered_databases CTE, which selects from the database CTE
+ * optionally joined against the include_raw CTE to only select databases
+ * that match an inclusion pattern. This appears to duplicate what the
+ * include_pat CTE already did above, but here we want only databases, and
+ * there we wanted patterns.
+ */
+ "\nfiltered_databases (datname) AS ("
+ "\nSELECT DISTINCT d.datname"
+ "\nFROM database d");
+ if (!opts.alldb)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN include_raw i"
+ "\nON d.datname ~ i.rgx");
+ appendPQExpBufferStr(&sql,
+ "\n)"
+
+ /*
+ * Select the checkable databases and the unmatched inclusion patterns.
+ */
+ "\nSELECT pat, datname"
+ "\nFROM ("
+ "\nSELECT id, pat, NULL::TEXT AS datname"
+ "\nFROM include_pat"
+ "\nWHERE checkable = 0"
+ "\nUNION ALL"
+ "\nSELECT NULL, NULL, datname"
+ "\nFROM filtered_databases"
+ "\n) AS combined_records"
+ "\nORDER BY id NULLS LAST, datname");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ ntups = PQntuples(res);
+ for (fatal = false, i = 0; i < ntups; i++)
+ {
+ const char *pat = NULL;
+ const char *datname = NULL;
+
+ if (!PQgetisnull(res, i, 0))
+ pat = PQgetvalue(res, i, 0);
+ if (!PQgetisnull(res, i, 1))
+ datname = PQgetvalue(res, i, 1);
+
+ if (pat != NULL)
+ {
+ /*
+ * Current record pertains to an inclusion pattern that matched no
+ * checkable databases.
+ */
+ fatal = opts.strict_names;
+ fprintf(stderr, "%s: no checkable database: \"%s\"\n",
+ progname, pat);
+ }
+ else
+ {
+ /* Current record pertains to a database */
+ Assert(datname != NULL);
+
+ DatabaseInfo *dat = (DatabaseInfo *) palloc0(sizeof(DatabaseInfo));
+
+ /* This database is included. Add to list */
+ if (opts.verbose)
+ fprintf(stderr, "%s: including database: \"%s\"\n", progname,
+ datname);
+
+ dat->datname = pstrdup(datname);
+ simple_ptr_list_append(databases, dat);
+ }
+ }
+ PQclear(res);
+
+ if (fatal)
+ {
+ disconnectDatabase(conn);
+ exit(1);
+ }
+}
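The final ORDER BY id NULLS LAST, datname interleaves the two kinds of rows the query can return: unmatched inclusion patterns (which carry an id) come first, in command-line order, followed by checkable databases (whose id is NULL) sorted by name. The same ordering can be written as a qsort comparator over a hypothetical result-row struct, with -1 standing in for SQL NULL:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result row: id is a pattern id, or -1 for SQL NULL. */
typedef struct
{
	int			id;
	const char *datname;	/* database name, or NULL for pattern rows */
} Row;

/* ORDER BY id NULLS LAST, datname */
static int
row_cmp(const void *a, const void *b)
{
	const Row  *ra = (const Row *) a;
	const Row  *rb = (const Row *) b;

	if ((ra->id < 0) != (rb->id < 0))
		return (ra->id < 0) ? 1 : -1;	/* NULL ids sort last */
	if (ra->id >= 0 && ra->id != rb->id)
		return (ra->id < rb->id) ? -1 : 1;
	if (ra->datname == NULL || rb->datname == NULL)
		return 0;
	return strcmp(ra->datname, rb->datname);
}
```

Sorting with qsort(rows, nrows, sizeof(Row), row_cmp) then yields the pattern rows first, mirroring what the SQL produces.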
+
+/*
+ * appendRelPatternRawCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the patterns from the given list as seven columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user specified pattern from the command line
+ * dbrgx: the database regexp parsed from the pattern, or NULL if the
+ * pattern had no database part
+ * nsprgx: the namespace regexp parsed from the pattern, or NULL if the
+ * pattern had no namespace part
+ * relrgx: the relname regexp parsed from the pattern, or NULL if the
+ * pattern had no relname part
+ * tbl: true if the pattern applies only to tables (not indexes)
+ * idx: true if the pattern applies only to indexes (not tables)
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ */
+static void
+appendRelPatternRawCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d::INTEGER, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->dbrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->nsprgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->nsprgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->relrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->relrgx, conn);
+ if (info->tblonly)
+ appendPQExpBufferStr(buf, "::TEXT, true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, "::TEXT, false::BOOLEAN");
+ if (info->idxonly)
+ appendPQExpBufferStr(buf, ", true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, ", false::BOOLEAN");
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf,
+ "\nSELECT NULL::INTEGER, NULL::TEXT, NULL::TEXT,"
+ "\nNULL::TEXT, NULL::TEXT, NULL::BOOLEAN,"
+ "\nNULL::BOOLEAN"
+ "\nWHERE false");
+}
+
+/*
+ * appendRelPatternFilteredCTE
+ *
+ * Appends to the buffer a Common Table Expression (CTE) which selects
+ * all patterns from the named raw CTE, filtered by database. All patterns
+ * which have no database portion or whose database portion matches our
+ * connection's database name are selected, with other patterns excluded.
+ * Patterns having only a database portion are also excluded, since they
+ * cannot select any relations.
+ *
+ * The basic idea here is that if we're connected to database "foo" and we have
+ * patterns "foo.bar.baz", "alpha.beta" and "one.two.three", we only want to
+ * use the first two while processing relations in this database, as the third
+ * one is not relevant.
+ *
+ * buf: the buffer to be appended
+ * raw: the name of the CTE to select from
+ * filtered: the name of the CTE to create
+ * conn: the database connection
+ */
+static void
+appendRelPatternFilteredCTE(PQExpBuffer buf, const char *raw,
+ const char *filtered, PGconn *conn)
+{
+ appendPQExpBuffer(buf,
+ "\n%s (id, pat, nsprgx, relrgx, tbl, idx) AS ("
+ "\nSELECT id, pat, nsprgx, relrgx, tbl, idx"
+ "\nFROM %s r"
+ "\nWHERE (r.dbrgx IS NULL"
+ "\nOR ",
+ filtered, raw);
+ appendStringLiteralConn(buf, PQdb(conn), conn);
+ appendPQExpBufferStr(buf, " ~ r.dbrgx)");
+ appendPQExpBufferStr(buf,
+ "\nAND (r.nsprgx IS NOT NULL"
+ "\nOR r.relrgx IS NOT NULL)"
+ "\n),");
+}
+
+/*
+ * compileRelationListOneDb
+ *
+ * Compiles a list of relations to check within the currently connected
+ * database based on the user supplied options, sorted by descending size,
+ * and appends them to the given relations list.
+ *
+ * The cells of the constructed list contain all information about the relation
+ * necessary to connect to the database and check the object, including which
+ * database to connect to, where contrib/amcheck is installed, and the Oid and
+ * type of object (table vs. index). Rather than duplicating the database
+ * details per relation, the relation structs use references to the same
+ * database object, provided by the caller.
+ *
+ * conn: connection to the database to be processed, which should match 'dat'
+ * relations: list onto which the relations information should be appended
+ * dat: the database info struct for use by each relation
+ */
+static void
+compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *dat)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ const char *datname;
+
+ initPQExpBuffer(&sql);
+ appendPQExpBufferStr(&sql, "WITH");
+
+ /* Append CTEs for the relation inclusion patterns, if any */
+ if (!opts.allrel)
+ {
+ appendPQExpBufferStr(&sql,
+ "\ninclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.include, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "include_raw", "include_pat", conn);
+ }
+
+ /* Append CTEs for the relation exclusion patterns, if any */
+ if (opts.excludetbl || opts.excludeidx)
+ {
+ appendPQExpBufferStr(&sql,
+ "\nexclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.exclude, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "exclude_raw", "exclude_pat", conn);
+ }
+
+ /* Append the relation CTE. */
+ appendPQExpBufferStr(&sql,
+ "\nrelation (id, pat, oid, reltoastrelid, relpages, tbl, idx) AS ("
+ "\nSELECT DISTINCT ON (c.oid");
+ if (!opts.allrel)
+ appendPQExpBufferStr(&sql, ", ip.id) ip.id, ip.pat,");
+ else
+ appendPQExpBufferStr(&sql, ") NULL::INTEGER AS id, NULL::TEXT AS pat,");
+ appendPQExpBuffer(&sql,
+ "\nc.oid, c.reltoastrelid, c.relpages,"
+ "\nc.relam = %u AS tbl,"
+ "\nc.relam = %u AS idx"
+ "\nFROM pg_catalog.pg_class c"
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (!opts.allrel)
+ appendPQExpBuffer(&sql,
+ "\nINNER JOIN include_pat ip"
+ "\nON (n.nspname ~ ip.nsprgx OR ip.nsprgx IS NULL)"
+ "\nAND (c.relname ~ ip.relrgx OR ip.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT ip.tbl)"
+ "\nAND (c.relam = %u OR NOT ip.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBuffer(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT e.tbl)"
+ "\nAND (c.relam = %u OR NOT e.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBufferStr(&sql, "\nWHERE e.pat IS NULL");
+ else
+ appendPQExpBufferStr(&sql, "\nWHERE true");
+
+ if (opts.no_toast)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relnamespace != %u",
+ PG_TOAST_NAMESPACE);
+ if (opts.no_tables)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'",
+ BTREE_AM_OID);
+ else if (opts.no_indexes)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind IN ('r', 'm', 't')",
+ HEAP_TABLE_AM_OID);
+ else
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam IN (%u, %u)"
+ "\nAND c.relkind IN ('r', 'm', 't', 'i')"
+ "\nAND ((c.relam = %u AND c.relkind IN ('r', 'm', 't')) OR"
+ "\n(c.relam = %u AND c.relkind = 'i'))",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID,
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ appendPQExpBufferStr(&sql,
+ "\nORDER BY c.oid"
+ "\n)");
+
+ if (!opts.no_dependents && !opts.no_toast)
+ {
+ /*
+ * Include a CTE for toast tables associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * toast table names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\ntoast (oid, relpages) AS ("
+ "\nSELECT t.oid, t.relpages"
+ "\nFROM pg_catalog.pg_class t"
+ "\nINNER JOIN relation r"
+ "\nON r.reltoastrelid = t.oid");
+ if (opts.excludetbl)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (t.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.tbl"
+ "\nWHERE e.id IS NULL");
+ appendPQExpBufferStr(&sql,
+ "\n)");
+ }
+ if (!opts.no_dependents && !opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * btree index names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\nindex (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM relation r"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON r.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid"
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'",
+ BTREE_AM_OID);
+ if (opts.no_toast)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relnamespace != %u",
+ PG_TOAST_NAMESPACE);
+ appendPQExpBufferStr(&sql, "\n)");
+ }
+
+ if (!opts.no_dependents && !opts.no_toast && !opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with toast tables of
+ * primary tables selected above, filtering by exclusion patterns (if
+ * any) that match the toast index names.
+ */
+ appendPQExpBuffer(&sql,
+ ",\ntoast_index (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM toast t"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON t.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'"
+ "\n)",
+ BTREE_AM_OID);
+ }
+
+ /*
+ * Roll-up distinct rows from CTEs.
+ *
+ * Relations that match more than one pattern may occur more than once in
+ * the list, and indexes and toast for primary relations may also have
+ * matched in their own right, so we rely on UNION to deduplicate the
+ * list.
+ */
+ appendPQExpBufferStr(&sql,
+ "\nSELECT id, tbl, idx, oid"
+ "\nFROM (");
+ appendPQExpBufferStr(&sql,
+ /* Inclusion patterns that failed to match */
+ "\nSELECT id, tbl, idx,"
+ "\nNULL::OID AS oid,"
+ "\nNULL::INTEGER AS relpages"
+ "\nFROM relation"
+ "\nWHERE id IS NOT NULL"
+ "\nUNION"
+ /* Primary relations */
+ "\nSELECT NULL::INTEGER AS id,"
+ "\ntbl, idx,"
+ "\noid, relpages"
+ "\nFROM relation");
+ if (!opts.no_dependents && !opts.no_toast)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Toast tables for primary relations */
+ "\nSELECT NULL::INTEGER AS id, TRUE AS tbl,"
+ "\nFALSE AS idx, oid, relpages"
+ "\nFROM toast");
+ if (!opts.no_dependents && !opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for primary relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM index");
+ if (!opts.no_dependents && !opts.no_toast && !opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for toast relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM toast_index");
+ appendPQExpBufferStr(&sql,
+ "\n) AS combined_records"
+ "\nORDER BY relpages DESC NULLS FIRST, oid");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ /*
+ * Allocate a single copy of the database name to be shared by all nodes
+ * in the object list, constructed below.
+ */
+ datname = pstrdup(PQdb(conn));
+
+ ntups = PQntuples(res);
+ for (i = 0; i < ntups; i++)
+ {
+ int pattern_id = 0;
+ bool tbl = false;
+ bool idx = false;
+ Oid oid = InvalidOid;
+
+ if (!PQgetisnull(res, i, 0))
+ pattern_id = atoi(PQgetvalue(res, i, 0));
+ if (!PQgetisnull(res, i, 1))
+ tbl = (PQgetvalue(res, i, 1)[0] == 't');
+ if (!PQgetisnull(res, i, 2))
+ idx = (PQgetvalue(res, i, 2)[0] == 't');
+ if (!PQgetisnull(res, i, 3))
+ oid = atooid(PQgetvalue(res, i, 3));
+
+ if (pattern_id > 0)
+ {
+ /*
+ * Current record pertains to an inclusion pattern. Find the
+ * pattern in the list and update its counts. If we expected a
+ * large number of command-line inclusion pattern arguments, the
+ * data structure here might need to be more efficient, but we
+ * expect the list to be short.
+ */
+ SimplePtrListCell *cell;
+ bool found;
+
+ for (found = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->pattern_id == pattern_id)
+ {
+ info->matched = true;
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ {
+ pg_log_error("internal error: received unexpected pattern_id %d",
+ pattern_id);
+ exit(1);
+ }
+ }
+ else
+ {
+ /* Current record pertains to a relation */
+
+ RelationInfo *rel = (RelationInfo *) palloc0(sizeof(RelationInfo));
+
+ Assert(OidIsValid(oid));
+ Assert(!(tbl && idx));
+
+ rel->datinfo = dat;
+ rel->reloid = oid;
+ rel->is_table = tbl;
+
+ simple_ptr_list_append(relations, rel);
+ }
+ }
+ PQclear(res);
+}
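All of the pattern, database, and relation bookkeeping in this file rides on SimplePtrList from fe_utils/simple_list.h, appended with simple_ptr_list_append and walked via head/next. For orientation, here is a minimal re-creation of that structure (a sketch, not the real header):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Minimal pointer list in the style of fe_utils/simple_list.h.  A tail
 * pointer makes append O(1); callers traverse via head and next.  Cells
 * are never freed here, which suits a short-lived command-line tool.
 */
typedef struct PtrListCell
{
	struct PtrListCell *next;
	void	   *ptr;
} PtrListCell;

typedef struct
{
	PtrListCell *head;
	PtrListCell *tail;
} PtrList;

static void
ptr_list_append(PtrList *list, void *ptr)
{
	PtrListCell *cell = malloc(sizeof(PtrListCell));

	cell->next = NULL;
	cell->ptr = ptr;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}
```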
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..c8f5862372
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,228 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 66;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+# Failing to connect to the initial database is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+# Failing to resolve a secondary database name is also an error, though since
+# the string is treated as a pattern, the error message looks different.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'qqq' ],
+ qr/pg_amcheck: no checkable database: "qqq"/,
+ 'checking a non-existent database');
+
+# Failing to connect to the initial database is still an error when using
+# --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database with --no-strict-names');
+
+# But failing to resolve secondary database names is not an error when using
+# --no-strict-names. We should still see the message, but as a non-fatal
+# warning
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-d', 'no_such_database', 'postgres', 'qqq' ],
+ 0,
+ [ ],
+ [ qr/pg_amcheck: checking database "postgres"/,
+ qr/no checkable database: "qqq"/ ],
+ 'checking a non-existent secondary database with --no-strict-names');
+
+# Check that a substring of an existent database name does not get interpreted
+# as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'post' ],
+ qr/database "post" does not exist/,
+ 'checking a non-existent primary database (substring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'post' ],
+ qr/pg_amcheck: no checkable database: "post"/,
+ 'checking a non-existent secondary database (substring of existent database)');
+
+# Likewise, check that a superstring of an existent database name does not get
+# interpreted as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgresql' ],
+ qr/database "postgresql" does not exist/,
+ 'checking a non-existent primary database (superstring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'postgresql' ],
+ qr/pg_amcheck: no checkable database: "postgresql"/,
+ 'checking a non-existent secondary database (superstring of existent database)');
+
+#########################################
+# Test connecting with a non-existent user
+
+# Failing to connect to the initial database due to bad username is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+# Failing to connect to the initial database due to bad username is still an
+# error when using --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+#########################################
+# Test checking databases without amcheck installed
+
+# Attempting to check a database by name where amcheck is not installed should
+# raise a warning.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by name without amcheck installed');
+
+# Likewise, but by database pattern rather than by name.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '-d', '*', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by dbname implication without amcheck installed');
+
+# And again, but by checking all databases.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '--all', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by --all implication without amcheck installed');
+
+#########################################
+# Test unreasonable patterns
+
+# Check three-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '..' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.\."/ ],
+ 'checking table pattern ".."');
+
+# Again, but with non-trivial schema and relation parts
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.foo.bar' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.foo\.bar"/ ],
+ 'checking table pattern ".foo.bar"');
+
+# Check two-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no tables to check for "\."/ ],
+ 'checking table pattern "."');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ qr/pg_amcheck: no relations to check in schemas for "no_such_schema"/,
+ 'checking a non-existent schema');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no_such_schema' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no relations to check in schemas for "no_such_schema"/ ],
+ 'checking a non-existent schema with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ qr/pg_amcheck: no tables to check for "no_such_table"/,
+ 'checking a non-existent table');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no_such_table' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no tables to check for "no_such_table"/ ],
+ 'checking a non-existent table with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ qr/pg_amcheck: no btree indexes to check for "no_such_index"/,
+ 'checking a non-existent index');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no_such_index' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no btree indexes to check for "no_such_index"/ ],
+ 'checking a non-existent index with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ qr/pg_amcheck: no relations to check in schemas for "no\*such\*schema\*"/,
+ 'no matching schemas');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no*such*schema*' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no relations to check in schemas for "no\*such\*schema\*"/ ],
+ 'no matching schemas with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ qr/pg_amcheck: no tables to check for "no\*such\*table\*"/,
+ 'no matching tables');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no*such*table*' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no tables to check for "no\*such\*table\*"/ ],
+ 'no matching tables with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index*' ],
+ qr/pg_amcheck: no btree indexes to check for "no\*such\*index\*"/,
+ 'no matching indexes');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no*such*index*' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: in database "postgres": using amcheck version "\d+\.\d+" in schema ".+"/,
+ qr/pg_amcheck: no btree indexes to check for "no\*such\*index\*"/ ],
+ 'no matching indexes with --no-strict-names -v');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..d100460ac9
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,481 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+   or die "open failed: $!";
+ binmode $fh;
+ sysseek($fh, 32, 0)
+   or die "sysseek failed: $!";
+ syswrite($fh, "\x77" x 500)
+   or die "syswrite failed: $!";
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends. Put the
+ # extension in an unexpected location to test that pg_amcheck finds it
+ # correctly. Create tables with names that look like pg_catalog names to
+ # check that pg_amcheck does not get confused by them. Create functions in
+ # schema public that look like amcheck functions to check that pg_amcheck
+ # does not use them.
+ $node->safe_psql($dbname, q(
+ CREATE SCHEMA amcheck_schema;
+ CREATE EXTENSION amcheck WITH SCHEMA amcheck_schema;
+ CREATE TABLE amcheck_schema.pg_database (junk text);
+ CREATE TABLE amcheck_schema.pg_namespace (junk text);
+ CREATE TABLE amcheck_schema.pg_class (junk text);
+ CREATE TABLE amcheck_schema.pg_operator (junk text);
+ CREATE TABLE amcheck_schema.pg_proc (junk text);
+ CREATE TABLE amcheck_schema.pg_tablespace (junk text);
+
+ CREATE FUNCTION public.bt_index_check(index regclass,
+ heapallindexed boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.bt_index_parent_check(index regclass,
+ heapallindexed boolean default false,
+ rootdescend boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_parent_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ RETURNS SETOF record AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong verify_heapam!';
+ END;
+ $$ LANGUAGE plpgsql;
+ ));
+
+	# Create tables, views, materialized views, partitions, sequences,
+	# and indexes in five separate schemas. The schemas start out
+	# identical, but we will corrupt them differently later.
+	#
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt the toast table backing table t2 in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return a success exit status, even
+# though tables and indexes are corrupt. A nonzero exit status would mean
+# that pg_amcheck itself failed, for example because a connection to the
+# database could not be established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_ok(
+	[ @cmd, 'db1' ],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_ok(
+	[ @cmd, 'db1', 'db2', 'db3' ],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_ok(
+ [ @cmd, '--all' ],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-i', 's*.t*', '-i', 's*.*btree*' ],
+ 'pg_amcheck all indexes with qualified names matching /s*.t*/ or /s*.*btree*/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-r', 's*.t1' ],
+ 'pg_amcheck all relations with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, '--no-dependents', 'db1', '-t', 's*.t1' ],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_ok(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+
+$node->command_checks_all(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/pg_amcheck: skipping database "postgres": amcheck is not installed/ ],
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ qr/.+/, # Any non-empty error message is acceptable
+	'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/,
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# Checking db2.s1 should show table corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db2', '-s', 's1', '--exclude-indexes' ],
+ 0,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck of db2.s1 excluding indexes');
+
+# Checking s1 across databases db3 and db2 should, with indexes excluded,
+# report only the table corruption in db2
+$node->command_checks_all(
+	[ @cmd, 'db3', 'db2', '-s', 's1', '--exclude-indexes' ],
+	0,
+	[ qr/could not open file/ ],
+	[ qr/^$/ ],
+	'pg_amcheck of db2.s1 and db3.s1, excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of zero.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 0,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ],
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's4' ],
+ qr/could not open file/,
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_like(
+ [ @cmd, '--exclude-toast', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ qr/^$/, # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's5' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '--exclude-indexes' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ qr/^$/, # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_like(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ qr/index "t1_btree" lacks a main relation fork/,
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..a6d3bcfb78
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 22;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-ascii character string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
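As a cross-check on the pack code and length constant above, the same layout can be expressed with Python's `struct` (a sketch; the format string and letter mapping are assumptions, mirroring Perl's `L`/`S`/`C`/`c`/`q` with no padding, not part of the patch):

```python
import struct

# Python equivalent of the Perl pack code LLLSSSSSCCqCcccccccSSSSSSSSS:
# '<' disables alignment padding (as Perl's pack does); I=L, H=S, B=C, b=c, q=q.
HEAPTUPLE_STRUCT_FMT = '<IIIHHHHHBBqBbbbbbbbHHHHHHHHH'

# 12 (3xI) + 10 (5xH) + 2 (2xB) + 8 (q) + 1 (B) + 7 (7xb) + 18 (9xH) = 58
assert struct.calcsize(HEAPTUPLE_STRUCT_FMT) == 58
```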
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+	sysseek($fh, $offset, 0)
+		or die "sysseek failed: $!";
+	defined(sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH))
+		or die "sysread failed: $!";
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+	sysseek($fh, $offset, 0)
+		or die "sysseek failed: $!";
+	defined(syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH))
+		or die "syswrite failed: $!";
+ return;
+}
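The pair of helpers above amounts to a read-modify-write cycle over a fixed binary layout. A minimal sketch of the same pattern in Python (a hypothetical three-field toy layout, not the real heap tuple format):

```python
import struct

TOY_FMT = '<IIH'  # hypothetical layout: two 32-bit fields and one 16-bit field

def corrupt_field(path, offset, new_first):
    """Read a fixed-size record at offset, overwrite its first field, write it back."""
    size = struct.calcsize(TOY_FMT)
    with open(path, 'r+b') as fh:
        fh.seek(offset)
        fields = list(struct.unpack(TOY_FMT, fh.read(size)))
        fields[0] = new_first   # modify just one field of the unpacked record
        fh.seek(offset)
        fh.write(struct.pack(TOY_FMT, *fields))
```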
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+	'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
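The xid thresholds sanity-checked above matter because of how transaction IDs compare. A minimal sketch of the two comparison rules involved (assumptions: these mirror the backend's `TransactionIdPrecedes` modulo-2^32 rule and the 64-bit full-xid widening that verify_heapam relies on; neither function is taken from this patch):

```python
# 32-bit xids wrap around, so two normal xids compare circularly: a precedes b
# when the 32-bit difference (a - b) is negative interpreted as signed.
def xid_precedes(a: int, b: int) -> bool:
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

# Widening to a 64-bit "full" xid, with the epoch in the high 32 bits, removes
# the circularity: with epoch 0, a huge corrupt xmin such as 4026531839 simply
# exceeds the next valid full xid.
def full_xid(epoch: int, xid: int) -> int:
    return (epoch << 32) | xid
```

Circularly, `xid_precedes(4026531839, 100)` holds, yet `full_xid(0, 4026531839)` is far beyond `full_xid(0, 100)`, which is why the epoch-0 corruption reports in this test say "equals or exceeds next valid transaction ID".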
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+	return qr/relation postgres\.public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath)
+	or die "could not open $relpath: $!";
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+	elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+			qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+		# Corruptly set xmin to a value that, per the circularity of xid
+		# comparison, could be taken as far in the past. For a new cluster
+		# with epoch = 0, the corrupt xmin will instead be interpreted as
+		# in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+			qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax < relminmxid;
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+			qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+			qr/${header}data begins at offset 152 beyond the tuple length 58/,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+			qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+			qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+			qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--exclude-indexes', '-p', $port, 'postgres'],
+ 0,
+ [ @expected ],
+ [ qr/pg_amcheck: checking database "postgres"/ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..0c39f2f638
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,54 @@
+# This regression test checks the behavior of btree index validation in the
+# presence of sort-order-breaking changes to an operator class.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 6;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ 0,
+ [ qr/item order invariant violated for index "fickleidx"/ ],
+ [ qr/pg_amcheck: checking database "postgres"/ ],
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
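The test above corrupts fickleidx by swapping the opclass comparison function from ascending to descending order after the index has been built. As an illustrative aside (plain Python, not part of the patch), a binary search built over data ordered by one comparator but probed with another shows why such an index stops finding entries that are actually present:

```python
def bsearch(arr, target, cmp):
    """Binary search that trusts cmp, much as a btree descent trusts its opclass."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = cmp(arr[mid], target)
        if c == 0:
            return mid
        elif c < 0:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

asc = lambda a, b: (a > b) - (a < b)    # comparator the "index" was built with
desc = lambda a, b: (b > a) - (b < a)   # comparator swapped in afterwards

data = list(range(1, 1001))  # stored in ascending order, like fickleidx

# With the original comparator every key is found...
assert all(bsearch(data, k, asc) != -1 for k in (1, 500, 1000))
# ...but after the sort-order swap, lookups miss keys that are present.
assert bsearch(data, 1, desc) == -1
assert bsearch(data, 1000, desc) == -1
```

This is the invariant violation amcheck reports: the tuples are all still there, but they are no longer in the order the (new) comparator expects.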
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..e7bd066566
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,1029 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more
+ <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or
+ more databases, with options to select which schemas, tables and indexes
+ to check, which kinds of checking to perform, and how many parallel
+ connections, if any, to use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Usage</title>
+
+ <refsect2>
+ <title>Parallelism Options</title>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=20 --all</literal></term>
+ <listitem>
+ <para>
+ Check relations in all connectable databases, using up to 20
+ simultaneous connections to check databases and relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --jobs=8 mydb yourdb</literal></term>
+ <listitem>
+ <para>
+ Check relations in databases <literal>mydb</literal> and
+ <literal>yourdb</literal>, using up to 8 simultaneous connections to
+ check relations in parallel.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Checking Option Specification</title>
+
+ <para>
+ If no checking options are specified, all table relation checks and
+ default-level btree index checks are performed. A variety of options
+ exist to change the set of checks performed on whichever relations are
+ being checked. They are briefly illustrated in the following examples;
+ see their full descriptions below.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck --parent-check --heapallindexed</literal></term>
+ <listitem>
+ <para>
+ For each btree index checked, performs more extensive checks.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --exclude-toast-pointers</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not check toast pointers against
+ the toast relation.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --on-error-stop</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, do not continue checking pages after
+ the first page where corruption is encountered.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --skip="all-frozen"</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, skips over blocks marked as all
+ frozen. Note that "all-visible" may also be specified.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --startblock=3000 --endblock=4000</literal></term>
+ <listitem>
+ <para>
+ For each table relation checked, check only blocks in the given block
+ range.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Relation Specification</title>
+
+ <para>
+ If no relations are explicitly listed, by default all relations will be
+ checked, but there are options to specify which relations to check.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable -r yourtable</literal></term>
+ <listitem>
+ <para>
+ If one or more relations are explicitly given, they are interpreted as
+ an exhaustive list of all relations to be checked, with one caveat:
+ for all such relations, associated toast relations and indexes are by
+ default included in the list of relations to check.
+ </para>
+ <para>
+ Assuming <literal>mytable</literal> is an ordinary table, and that it
+ is indexed by <literal>mytable_idx</literal> and has an associated
+ toast table <literal>pg_toast_12345</literal>, checking will be
+ performed on <literal>mytable</literal>,
+ <literal>mytable_idx</literal>, and <literal>pg_toast_12345</literal>.
+ </para>
+ <para>
+ Likewise for <literal>yourtable</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r mytable --no-dependents</literal></term>
+ <listitem>
+ <para>
+ This restricts the list of relations checked to just
+ <literal>mytable</literal>, without pulling in the corresponding
+ indexes or toast, but see also
+ <option>--exclude-toast-pointers</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t mytable -i myindex</literal></term>
+ <listitem>
+ <para>
+ The <option>-r</option> (<option>--relation</option>) option will match
+ any relation, but <option>-t</option> (<option>--table</option>) and
+ <option>-i</option> (<option>--index</option>) may be used to restrict
+ matching to tables or indexes, respectively.
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -R="mytemp*"</literal></term>
+ <listitem>
+ <para>
+ Relations may be included (<option>-r</option>) or excluded
+ (<option>-R</option>) using shell-style patterns.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -r="my*" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Table and index inclusion and exclusion patterns may be used
+ equivalently with <option>-t</option>, <option>-T</option>,
+ <option>-i</option> and <option>-I</option>. The above example checks
+ all tables and indexes starting with <literal>my</literal> except for
+ indexes named <literal>myanmar</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -R="india" -T="laos" -I="myanmar"</literal></term>
+ <listitem>
+ <para>
+ Unlike specifying one or more <option>--relation</option> options,
+ which disables the default behavior of checking all relations,
+ specifying one or more of <option>-R</option>, <option>-T</option> or
+ <option>-I</option> does not. The above command will check all
+ relations except any relation named <literal>india</literal>, any
+ table named <literal>laos</literal>, and any index named
+ <literal>myanmar</literal>.
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Schema Specification</title>
+
+ <para>
+ If no schemas are explicitly listed, by default all schemas will be
+ checked.
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck -s s1 -s s2</literal></term>
+ <listitem>
+ <para>
+ If one or more schemas are listed with <option>-s</option>, all
+ relations in the given schemas are selected.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -S s1 -S s2 -r mytable</literal></term>
+ <listitem>
+ <para>
+ As with relations, schemas may be excluded. The above command will
+ check any table named <literal>mytable</literal> that is in neither
+ schema <literal>s1</literal> nor schema <literal>s2</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -t s3.japan -T s3.korea</literal></term>
+ <listitem>
+ <para>
+ Relations may be included or excluded with a schema-qualified name.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Specification</title>
+
+ <para>
+ If no databases are explicitly listed, the database to check is obtained
+ from environment variables in the usual way. Otherwise, when one or more
+ databases are explicitly given, they are interpreted as an exhaustive
+ list of all databases to be checked. This list may contain patterns, but
+ because patterns must be resolved against a list of all databases, at
+ least one database must be specified as a literal name rather than a
+ pattern, in one of the positions described in the examples below, so
+ that it can be used for the initial connection.
+ </para>
+ <para>
+ Databases will be checked in the order they are listed. When performing
+ checking in parallel, the checks of relations in multiple databases may
+ overlap.
+ </para>
+ <para>
+ For example:
+ </para>
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><literal>pg_amcheck db1 db2 db3 --maintenance-db=foo</literal></term>
+ <listitem>
+ <para>
+ If the <option>--maintenance-db</option> option is given, that database
+ will be used to look up the matching databases, but being named as the
+ <option>--maintenance-db</option> does not by itself add it to the
+ list of databases to check.
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck foo bar baz</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more plain database name arguments not preceded by
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ one will be used for this purpose, and it will also be included in the
+ list of databases to check.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck -d foo -d bar baz</literal></term>
+ <listitem>
+ <para>
+ If a mixture of plain database names and databases preceded with
+ <option>-d</option> or <option>--dbname</option> are given, the first
+ plain database name will be used for this purpose. In the above
+ example, <literal>baz</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --dbname=foo --dbname="bar*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, if one or more databases are given with the
+ <option>-d</option> or <option>--dbname</option> option, the first one
+ will be used and must be a literal database name. In this example,
+ <literal>foo</literal> will be used.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><literal>pg_amcheck --relation="accounts_*.*.*"</literal></term>
+ <listitem>
+ <para>
+ Otherwise, the environment will be consulted for the database to be
+ used. In the example above, the default database will be queried to
+ find all databases with names that begin with
+ <literal>accounts_</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+
+ <para>
+ As discussed above for schema-qualified relations, a database-qualified
+ relation name or pattern may also be given.
+<programlisting>
+pg_amcheck mydb \
+ --schema="t*" \
+ --exclude-schema="tmp*" \
+ --relation=baz \
+ --relation=bar.baz \
+ --relation=foo.bar.baz \
+ --relation="f*.a.b" \
+ --exclude-relation=foo.a.b
+</programlisting>
+ will check relations in database <literal>mydb</literal> using the schema
+ resolution rules discussed above, but additionally will check all relations
+ named <literal>a.b</literal> in all databases with names starting with
+ <literal>f</literal> except database <literal>foo</literal>.
+ </para>
+
+ </refsect2>
+ </refsect1>
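The usage sections above lean heavily on psql-style patterns for databases, schemas and relations. As a rough sketch of how such a pattern could be resolved against object names (hypothetical Python, deliberately ignoring the double-quoting and case-folding rules of real psql patterns), the pattern can be translated to an anchored regular expression:

```python
import re

def pattern_to_regex(pattern):
    """Translate a simplified psql-style pattern to an anchored regex.

    '*' matches any sequence of characters, '?' matches any single
    character, and everything else matches literally.  (Real psql
    patterns also handle double quotes and case folding, which this
    sketch ignores.)
    """
    out = []
    for ch in pattern:
        if ch == '*':
            out.append('.*')
        elif ch == '?':
            out.append('.')
        else:
            out.append(re.escape(ch))
    return '^' + ''.join(out) + '$'

# "my*" matches mytable but not yourtable, mirroring -r="my*" above.
assert re.match(pattern_to_regex('my*'), 'mytable')
assert not re.match(pattern_to_regex('my*'), 'yourtable')
```

The anchoring matters: without `^` and `$`, a pattern like `india` would also select `indiana`, which is not how the examples above behave.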
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line
+ arguments:
+ </para>
+
+ <refsect2>
+ <title>Help and Version Information Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not quiet any output specifically due to the use of
+ the <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increases the log level verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--progress</option></term>
+ <listitem>
+ <para>
+ Show progress information about how many relations have been checked.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Database Connection and Concurrent Connection Options</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Index Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly selects
+ <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing a
+ new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Options Controlling Table Checking Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer is
+ slow and may be improved in a future release. Some users may wish to
+ disable this check to save time.
+ </para>
+ <para>
+ Note that, despite their similar names, this option is unrelated to the
+ <option>--exclude-toast</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page.
+ This option only has meaning relative to table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>"all-frozen"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>"all-visible"</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, but note
+ that unless <option>--exclude-toast-pointers</option> is given, toast
+ pointers found in the main table will be followed into the toast table
+ without regard for the location in the toast table.
+ </para>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, with the
+ same caveat about <option>--exclude-toast-pointers</option> as above.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </refsect2>
+
+ <refsect2>
+ <title>Corruption Checking Target Options</title>
+
+ <para>
+ Objects to be checked may span schemas in more than one database. Options
+ for restricting the list of databases, schemas, tables and indexes are
+ described below. In each place where a name may be specified, a
+ <link linkend="app-psql-patterns"><replaceable class="parameter">pattern</replaceable></link>
+ may also be used.
+ </para>
+
+ <para>
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects
+ in the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>maintenance-db</option> argument is given nor is any
+ database name given as a command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-db</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-db</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-db</option>, the database will be
+ excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-db=america</literal></member>
+ <member><literal>--exclude-db="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will
+ be excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ If the relation is not schema qualified, it will apply to relations in
+ all schemas.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_idx</literal></member>
+ <member><literal>--relation="llc.accounts_idx"</literal></member>
+ <member><literal>--relation="asia|africa.corp|llc.accounts_idx"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ The first example, <literal>--relation=accounts_idx</literal>, checks
+ relations named <literal>accounts_idx</literal> in all schemas across
+ all databases being checked.
+ </para>
+ <para>
+ The second example, <literal>--relation="llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schema
+ <literal>llc</literal> across all databases being checked.
+ </para>
+ <para>
+ The third example,
+ <literal>--relation="asia|africa.corp|llc.accounts_idx"</literal>,
+ checks relations named <literal>accounts_idx</literal> in schemas
+ <literal>corp</literal> and <literal>llc</literal> in databases
+ <literal>asia</literal> and <literal>africa</literal>, which is
+ equivalent to listing four separate relations: <literal>-r
+ "asia.corp.accounts_idx" -r "asia.llc.accounts_idx" -r
+ "africa.corp.accounts_idx" -r "africa.llc.accounts_idx"</literal>
+ </para>
+ <para>
+ Note that if a database is implicated in a relation pattern, such as
+ <literal>asia</literal> and <literal>africa</literal> in the third
+ example above, the database need not be otherwise given in the command
+ arguments for the relation to be checked. As an extreme example of
+ this:
+ <simplelist>
+ <member><literal>pg_amcheck --relation="*.*.*" mydb</literal></member>
+ </simplelist>
+ will check all relations in all databases. The <literal>mydb</literal>
+ argument only serves to tell <application>pg_amcheck</application> the
+ name of the database to use for querying the list of all databases.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-dependents</option></term>
+ <listitem>
+ <para>
+ When calculating the list of objects to be checked, do not
+ automatically expand the list to include associated indexes and toast
+ tables of elements otherwise in the list.
+ </para>
+ <para>
+ By default, for each main table relation checked, any associated toast
+ table and all associated indexes are also checked, unless explicitly
+ excluded.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-strict-names</option></term>
+ <listitem>
+ <para>
+ When calculating the list of databases to check, and the objects within
+ those databases to be checked, do not raise an error for database,
+ schema, relation, table, or index inclusion patterns that match no
+ corresponding objects.
+ </para>
+ <para>
+ Exclusion patterns are not required to match any objects, but by
+ default an unmatched inclusion pattern raises an error. This includes
+ patterns that fail to match only because an exclusion pattern
+ prevented them from matching an existing object, and patterns that
+ fail to match an unconnectable database (datallowconn is false).
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </refsect2>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..49ad558b74 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,7 +18,7 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 49614106dc..f680544e07 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bab4f3adb3..531b9e2a00 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
@@ -498,6 +499,7 @@ DSA
DWORD
DataDumperPtr
DataPageDeleteStack
+DatabaseInfo
DateADT
Datum
DatumTupleFields
@@ -2082,6 +2084,7 @@ RelToCluster
RelabelType
Relation
RelationData
+RelationInfo
RelationPtr
RelationSyncEntry
RelcacheCallbackFunction
@@ -2846,6 +2849,7 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
v39-0002-Extending-PostgresNode-to-test-corruption.patch (application/octet-stream; x-unix-mode=0644)
From ae03f0a5c9a66bfeb6d4e1be07a41608aca0cf48 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:37:58 -0800
Subject: [PATCH v39 2/2] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 135 +++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 261 ++++++++++++++++++
5 files changed, 496 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..d997db5b63
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,135 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 27;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/,
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options. When
+# checking the toast pointers, we may get errors produced by verify_heapam, but
+# we may also get errors from failure to read toast blocks that are beyond the
+# end of the toast table, of the form /ERROR: could not read block/. To avoid
+# having a brittle test, we accept any error message.
+#
+$node->start;
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ qr/^$/,
+ 'pg_amcheck reverted toast table');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ qr/^$/,
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/.+/, # Any non-empty error message is acceptable
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..c92d1702b4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This should be a partial
+# and corrupt reverting of the update.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d470af93c5 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,267 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($self, $snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($self, $snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files entirely with
+the files saved in B<$snapname>. Unlike the partial overwrite, this means a
+file may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
+ File::Copy::copy($srcpart, "$dstdir/$part");
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
+ my $buffer;
+ if ($bytes < 0)
+ {
+ $bytes *= -1; # Easier to use positive value
+ my $srcsize = (stat($srcfh))[7];
+ my $offset = $srcsize - $bytes;
+ sysseek($srcfh, $offset, 0);
+ sysseek($dstfh, $offset, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+ else
+ {
+ sysseek($srcfh, 0, 0);
+ sysseek($dstfh, 0, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
It will reconnect and retry a command one time on error. That should cover the case that the connection to the database was merely lost. If the second attempt also fails, no further retry of the same command is attempted, though commands for remaining relation targets will still be attempted, both for the database that had the error and for other remaining databases in the list.
Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` could result in two failures per relation in db2 before finally moving on to db3. That seems pretty awful considering how many relations that could be, but failing to soldier on in the face of errors seems a strange design for a corruption checking tool.
That doesn't seem right at all. I think a PQsendQuery() failure is so
remote that it's probably justification for giving up on the entire
operation. If it's caused by a problem with some object, it probably
means that accessing that object caused the whole database to go down,
and retrying the object will take the database down again. Retrying
the object is betting that the user interrupted connectivity between
pg_amcheck and the database but the interruption is only momentary and
the user actually wants to complete the operation. That seems unlikely
to me. I think it's far more probable that the database crashed or got
shut down and continuing is futile.
My proposal is: if we get an ERROR trying to *run* a query, give up on
that object but still try the other ones after reconnecting. If we get
a FATAL or PANIC trying to *run* a query, give up on the entire
operation. If even sending a query fails, also give up.
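The proposal amounts to a small decision table. As a minimal sketch of that policy (the function name and return values here are hypothetical, purely for illustration, and not part of any patch):

```python
def on_query_failure(send_failed, severity):
    """Sketch of the proposed pg_amcheck error-handling policy.

    send_failed: True if the query could not even be sent to the server
    severity:    the server error severity, e.g. 'ERROR', 'FATAL', 'PANIC'
    """
    if send_failed:
        return "abort"        # give up on the entire operation
    if severity in ("FATAL", "PANIC"):
        return "abort"        # the connection (or cluster) is gone
    # Plain ERROR: give up on this object, but continue with the rest
    return "skip_object"
```

The key design choice is that only conditions implying a dead connection or a downed server terminate the run; an ordinary ERROR costs at most the current object.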
In v39, exit(1) is used for all errors which are intended to stop the program. It is important to recognize that finding corruption is not an error in this sense. A query to verify_heapam() can fail if the relation's checksums are bad, and that happens beyond verify_heapam()'s control when the page is not allowed into the buffers. There can be errors if the file backing a relation is missing. There may be other corruption error cases that I have not yet thought about. The connections' errors get reported to the user, but pg_amcheck does not exit as a consequence of them. As discussed above, failing to send the query to the server is not viewed as a reason to exit, either. It would be hard to quantify all the failure modes, but presumably the catalogs for a database could be messed up enough to cause such failures, and I'm not sure that pg_amcheck should just abort.
I agree that exit(1) should happen after any error intended to stop
the program. But I think it should also happen at the end of the run
if we hit any problems for which we did not stop, so that exit(0)
means your database is healthy.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
Reworking the code took a while. Version 39 patches attached.
Regarding the documentation, I think the Usage section at the top is
far too extensive and duplicates the option description section to far
too great an extent. You have 21 usage examples for a command with 34
options. Even if we think it's a good idea to give a brief summary of
usage, it's got to be brief; we certainly don't need examples of
obscure special-purpose options like --maintenance-db here. Looking
through the commands in "PostgreSQL Client Applications" and
"Additional Supplied Programs," most of them just have a synopsis
section and nothing like this Usage section. Those that do have a
Usage section typically use it for a narrative description of what to
do with the tool (e.g. see pg_test_timing), not a long list of
examples. I'm inclined to think you should nuke all the examples and
incorporate the descriptive text, to the extent that it's needed,
either into the descriptions of the individual options or, if the
behavior spans many options, into the Description section.
A few of these examples could move down into an Examples section at
the bottom, perhaps, but I think 21 is still too many. I'd try to
limit it to 5-7. Just hit the highlights.
I also think that perhaps it's not best to break up the list of
options into so many different categories the way you have. Notice
that for example pg_dump and psql don't do this, instead putting
everything into one ordered list, despite also having a lot of
options. This is arguably worse if you want to understand which
options are related to each other, but it's better if you are just
looking for something based on alphabetical order.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Feb 17, 2021, at 12:56 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 17, 2021 at 1:46 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
It will reconnect and retry a command one time on error. That should cover the case that the connection to the database was merely lost. If the second attempt also fails, no further retry of the same command is attempted, though commands for remaining relation targets will still be attempted, both for the database that had the error and for other remaining databases in the list.
Assuming something is wrong with "db2", the command `pg_amcheck db1 db2 db3` could result in two failures per relation in db2 before finally moving on to db3. That seems pretty awful considering how many relations that could be, but failing to soldier on in the face of errors seems a strange design for a corruption checking tool.
That doesn't seem right at all. I think a PQsendQuery() failure is so
remote that it's probably justification for giving up on the entire
operation. If it's caused by a problem with some object, it probably
means that accessing that object caused the whole database to go down,
and retrying the object will take the database down again. Retrying
the object is betting that the user interrupted connectivity between
pg_amcheck and the database but the interruption is only momentary and
the user actually wants to complete the operation. That seems unlikely
to me. I think it's far more probable that the database crashed or got
shut down and continuing is futile.
My proposal is: if we get an ERROR trying to *run* a query, give up on
that object but still try the other ones after reconnecting. If we get
a FATAL or PANIC trying to *run* a query, give up on the entire
operation. If even sending a query fails, also give up.
This is changed in v40 as you propose to exit on FATAL and PANIC level errors and on error to send a query. On lesser errors (which includes all corruption reports about btrees and some heap corruption related errors), the slot's connection is still useable, I think. Are there cases where the error is lower than FATAL and yet the connection needs to be reestablished? It does not seem so from the testing I have done, but perhaps I'm not thinking of the right sort of non-fatal error?
(I'll wait to post v40 until after hearing your thoughts on this.)
In v39, exit(1) is used for all errors which are intended to stop the program. It is important to recognize that finding corruption is not an error in this sense. A query to verify_heapam() can fail if the relation's checksums are bad, and that happens beyond verify_heapam()'s control when the page is not allowed into the buffers. There can be errors if the file backing a relation is missing. There may be other corruption error cases that I have not yet thought about. The connections' errors get reported to the user, but pg_amcheck does not exit as a consequence of them. As discussed above, failing to send the query to the server is not viewed as a reason to exit, either. It would be hard to quantify all the failure modes, but presumably the catalogs for a database could be messed up enough to cause such failures, and I'm not sure that pg_amcheck should just abort.
I agree that exit(1) should happen after any error intended to stop
the program. But I think it should also happen at the end of the run
if we hit any problems for which we did not stop, so that exit(0)
means your database is healthy.
In v40, exit(1) means the program encountered fatal errors leading it to stop, and exit(2) means that a non-fatal error and/or corruption reports occurred somewhere during the processing. Otherwise, exit(0) means your database was successfully checked and is healthy.
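The v40 exit-code convention can be captured in a tiny helper. A sketch (the convention is as described above; the function itself is hypothetical, e.g. for a monitoring wrapper around pg_amcheck):

```python
def classify_pg_amcheck_exit(code):
    """Interpret a pg_amcheck exit status under the v40 convention."""
    if code == 0:
        return "healthy"          # all checks ran; no corruption, no errors
    if code == 1:
        return "aborted"          # a fatal error stopped the run early
    if code == 2:
        return "problems found"   # run completed, but corruption and/or
                                  # non-fatal errors were reported
    return "unexpected"
```

A wrapper script could, for example, alert urgently on status 2 (corruption reported) but treat status 1 (run aborted) as an operational failure to retry.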
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Feb 23, 2021 at 12:38 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
This is changed in v40 as you propose to exit on FATAL and PANIC level errors and on error to send a query. On lesser errors (which includes all corruption reports about btrees and some heap corruption related errors), the slot's connection is still useable, I think. Are there cases where the error is lower than FATAL and yet the connection needs to be reestablished? It does not seem so from the testing I have done, but perhaps I'm not thinking of the right sort of non-fatal error?
I think you should assume that if you get an ERROR you can - and
should - continue to use the connection, but still exit non-zero at
the end. Perhaps one can contrive some scenario where that's not the
case, but if the server does the equivalent of "ERROR: session
permanently borked" we should really change those to FATAL; I think
you can discount that possibility.
In v40, exit(1) means the program encountered fatal errors leading it to stop, and exit(2) means that a non-fatal error and/or corruption reports occurred somewhere during the processing. Otherwise, exit(0) means your database was successfully checked and is healthy.
wfm.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Feb 24, 2021, at 10:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Feb 23, 2021 at 12:38 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
This is changed in v40 as you propose to exit on FATAL and PANIC level errors and on error to send a query. On lesser errors (which includes all corruption reports about btrees and some heap corruption related errors), the slot's connection is still useable, I think. Are there cases where the error is lower than FATAL and yet the connection needs to be reestablished? It does not seem so from the testing I have done, but perhaps I'm not thinking of the right sort of non-fatal error?
I think you should assume that if you get an ERROR you can - and
should - continue to use the connection, but still exit non-zero at
the end. Perhaps one can contrive some scenario where that's not the
case, but if the server does the equivalent of "ERROR: session
permanently borked" we should really change those to FATAL; I think
you can discount that possibility.
Ok, that's how I had it, so no changes necessary.
In v40, exit(1) means the program encountered fatal errors leading it to stop, and exit(2) means that a non-fatal error and/or corruption reports occurred somewhere during the processing. Otherwise, exit(0) means your database was successfully checked and is healthy.
Other changes in v40 per our off-list discussions but not related to your on-list review comments:
Removed option --no-tables.
Removed option --no-dependents. This was a synonym for the combination of --exclude-toast and --exclude-indexes, but having such a synonym isn't all that helpful.
Renamed --exclude-toast to --no-toast-expansion and changed its behavior a bit. Likewise, renamed --exclude-indexes to --no-index-expansion and change behavior. The behavioral changes are that these options now only have the effect of not automatically expanding the list of relations to check to include toast or indexes associated with relations already in the list. The prior names didn't exclusively mean that, and the behavior didn't exclusively do that.
Updated the docs per your other review email.
Implemented --progress to behave much as it does in pg_basebackup.
Attachments:
v40-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch (application/octet-stream)
From 8c8fd04f48ff80cfbc7331a8a819555cab38f27b Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 22 Feb 2021 08:55:56 -0800
Subject: [PATCH v40 1/3] Reworking ParallelSlots for multiple DB use
The existing implementation of ParallelSlots is used by reindexdb
and vacuumdb to process tables in parallel in only one database at
a time. The ParallelSlots interface reflects this usage pattern.
The function to set up the slots assumes all slots should be
connected to the same database, and the function for getting the
next idle slot pays no attention to which database the slot may be
connected to.
In anticipation of pg_amcheck using parallel slots to process
multiple databases in parallel, this patch reworks the interface
while trying to keep it reasonably simple for reindexdb and
vacuumdb to use:
ParallelSlotsSetup() is replaced by two functions,
ParallelSlotsSetupOneDB() and ParallelSlotsSetupMinimal(). The
former establishes database connections for all slots much as the
old ParallelSlotsSetup() did, and the latter delays connecting to
databases until a slot is requested.
ParallelSlotsGetIdle() is extended to take arguments about the
database connection desired and to manage a heterogeneous set of
slots potentially containing slots connected to varying databases
and some slots not yet connected. The function will reuse an
existing connection or form a new connection as necessary.
The logic for determining whether a slot's connection is suitable
for reuse is unfortunately a little more complicated than I was
hoping, using a ConnParams struct to identify the desired database
rather than a simple database name.
Callers like reindexdb and vacuumdb pass NULL, and any existing
connection is considered suitable. This matches their historical
behavior and is simple.
Callers like pg_amcheck pass the cparams they want used to open a
new connection (if necessary) or to select an existing
connection if one matches. Byte-for-byte equality between the
connection parameter strings is used to determine if an existing
connection is suitable to satisfy a slot request. In practice,
there are multiple ways to format connection parameter strings that
would result in the same database/host/port/user, but the
implementation does not attempt to determine ConnParams equivalence
beyond bytewise equality.
---
src/bin/scripts/reindexdb.c | 5 +-
src/bin/scripts/vacuumdb.c | 35 +-
src/fe_utils/parallel_slot.c | 506 +++++++++++++++++++++------
src/include/fe_utils/parallel_slot.h | 18 +-
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 427 insertions(+), 138 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 9f072ac49a..712102f521 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -445,7 +445,8 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
Assert(process_list != NULL);
- slots = ParallelSlotsSetup(cparams, progname, echo, conn, concurrentCons);
+ slots = ParallelSlotsSetupOneDB(cparams, progname, echo, conn,
+ concurrentCons, NULL);
cell = process_list->head;
do
@@ -459,7 +460,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
- free_slot = ParallelSlotsGetIdle(slots, concurrentCons);
+ free_slot = ParallelSlotsGetIdle(slots, concurrentCons, cparams, progname, echo, NULL);
if (!free_slot)
{
failed = true;
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 602fd45c42..ddd104aa25 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -428,6 +428,7 @@ vacuum_one_database(const ConnParams *cparams,
bool failed = false;
bool tables_listed = false;
bool has_where = false;
+ const char *initcmd;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -684,26 +685,25 @@ vacuum_one_database(const ConnParams *cparams,
concurrentCons = 1;
/*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
*/
- slots = ParallelSlotsSetup(cparams, progname, echo, conn, concurrentCons);
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
/*
- * Prepare all the connections to run the appropriate analyze stage, if
- * caller requested that mode.
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
*/
- if (stage != ANALYZE_NO_STAGE)
- {
- int j;
-
- /* We already emitted the message above */
-
- for (j = 0; j < concurrentCons; j++)
- executeCommand((slots + j)->connection,
- stage_commands[stage], echo);
- }
+ slots = ParallelSlotsSetupOneDB(cparams, progname, echo, conn,
+ concurrentCons, initcmd);
initPQExpBuffer(&sql);
@@ -719,7 +719,8 @@ vacuum_one_database(const ConnParams *cparams,
goto finish;
}
- free_slot = ParallelSlotsGetIdle(slots, concurrentCons);
+ free_slot = ParallelSlotsGetIdle(slots, concurrentCons, cparams,
+ progname, echo, initcmd);
if (!free_slot)
{
failed = true;
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index b625deb254..e71104ff78 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -25,20 +25,91 @@
#include "common/logging.h"
#include "fe_utils/cancel.h"
#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#define ERRCODE_UNDEFINED_TABLE "42P01"
-static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
static bool processQueryResult(ParallelSlot *slot, PGresult *result);
+/*
+ * Copy the fields from a source ConnParams struct without sharing any state
+ * that would allow changes to the source to affect the copy.
+ */
static void
-init_slot(ParallelSlot *slot, PGconn *conn)
+deep_copy_cparams(ConnParams *dst, const ConnParams *src)
{
- slot->connection = conn;
- /* Initially assume connection is idle */
- slot->isFree = true;
- ParallelSlotClearHandler(slot);
+ memset(dst, 0, sizeof(*dst));
+ if (src == NULL)
+ return;
+ if (src->dbname)
+ dst->dbname = pstrdup(src->dbname);
+ if (src->pghost)
+ dst->pghost = pstrdup(src->pghost);
+ if (src->pgport)
+ dst->pgport = pstrdup(src->pgport);
+ if (src->pguser)
+ dst->pguser = pstrdup(src->pguser);
+ dst->prompt_password = src->prompt_password;
+ if (src->override_dbname)
+ dst->override_dbname = pstrdup(src->override_dbname);
+}
+
+/*
+ * Free a ConnParams struct that was made using deep_copy_cparams. Beware that
+ * ConnParams structs often contain const data from the command line. This
+ * function must only be used on copies where we own the data to be freed.
+ */
+static void
+free_cparams_copy(ConnParams *copy)
+{
+ /* We need to cast away const before freeing each field. */
+ if (copy->dbname)
+ pfree((void *) copy->dbname);
+ if (copy->pghost)
+ pfree((void *) copy->pghost);
+ if (copy->pgport)
+ pfree((void *) copy->pgport);
+ if (copy->pguser)
+ pfree((void *) copy->pguser);
+ if (copy->override_dbname)
+ pfree((void *) copy->override_dbname);
+}
+
+/* Macro for comparing string fields that might be NULL */
+#define equalstr(a, b) \
+ (((a) != NULL && (b) != NULL) ? (strcmp(a, b) == 0) : (a) == (b))
+
+/* Compare a field that is a pointer to a C string, or perhaps NULL */
+#define COMPARE_STRING_FIELD(fldname) \
+ do { \
+ if (!equalstr(a->fldname, b->fldname)) \
+ return false; \
+ } while (0)
+
+/* Compare a simple scalar field (int, float, bool, enum, etc) */
+#define COMPARE_SCALAR_FIELD(fldname) \
+ do { \
+ if (a->fldname != b->fldname) \
+ return false; \
+ } while (0)
+
+/*
+ * Return whether two ConnParams structs contain equal parameters.
+ */
+static bool
+equalConnParams(const ConnParams *a, const ConnParams *b)
+{
+ Assert(a != NULL);
+ Assert(b != NULL);
+
+ COMPARE_STRING_FIELD(dbname);
+ COMPARE_STRING_FIELD(pghost);
+ COMPARE_STRING_FIELD(pgport);
+ COMPARE_STRING_FIELD(pguser);
+ COMPARE_SCALAR_FIELD(prompt_password);
+ COMPARE_STRING_FIELD(override_dbname);
+ return true;
}
/*
@@ -50,6 +121,7 @@ static bool
processQueryResult(ParallelSlot *slot, PGresult *result)
{
Assert(slot->handler != NULL);
+ Assert(slot->connection != NULL);
/* On failure, the handler should return NULL after freeing the result */
if (!slot->handler(result, slot->connection, slot->handler_context))
@@ -71,6 +143,9 @@ consumeQueryResult(ParallelSlot *slot)
bool ok = true;
PGresult *result;
+ Assert(slot != NULL);
+ Assert(slot->connection != NULL);
+
SetCancelConn(slot->connection);
while ((result = PQgetResult(slot->connection)) != NULL)
{
@@ -82,10 +157,9 @@ consumeQueryResult(ParallelSlot *slot)
}
/*
- * Wait until a file descriptor from the given set becomes readable.
- *
- * Returns the number of ready descriptors, or -1 on failure (including
- * getting a cancel request).
+ * Wait until a file descriptor from the given set becomes readable. Returns
+ * the number of ready descriptors, or -1 on failure (including getting a
+ * cancel request).
*/
static int
select_loop(int maxFd, fd_set *workerset)
@@ -137,153 +211,352 @@ select_loop(int maxFd, fd_set *workerset)
}
/*
- * ParallelSlotsGetIdle
- * Return a connection slot that is ready to execute a command.
- *
- * This returns the first slot we find that is marked isFree, if one is;
- * otherwise, we loop on select() until one socket becomes available. When
- * this happens, we read the whole set and mark as free all sockets that
- * become available. If an error occurs, NULL is returned.
+ * Return the offset of a suitable idle slot, or -1 if none are available. If
+ * the given connection parameters are not null, only idle slots connected
+ * using equivalent parameters are considered suitable, otherwise all idle
+ * connected slots are considered suitable.
*/
-ParallelSlot *
-ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
+static int
+find_matching_idle_slot(const ParallelSlot *slots, int numslots,
+ const ConnParams *cparams)
{
int i;
- int firstFree = -1;
- /*
- * Look for any connection currently free. If there is one, mark it as
- * taken and let the caller know the slot to use.
- */
+ Assert(slots != NULL);
+
for (i = 0; i < numslots; i++)
{
- if (slots[i].isFree)
- {
- slots[i].isFree = false;
- return slots + i;
- }
+ if (slots[i].inUse)
+ continue;
+
+ if (slots[i].connection == NULL)
+ continue;
+
+ if (cparams == NULL || equalConnParams(&slots[i].cparams, cparams))
+ return i;
+ }
+ return -1;
+}
+
+/*
+ * Return the offset of the first slot without a database connection, or -1 if
+ * all slots are connected.
+ */
+static int
+find_unconnected_slot(const ParallelSlot *slots, int numslots)
+{
+ int i;
+
+ Assert(slots != NULL);
+
+ for (i = 0; i < numslots; i++)
+ {
+ if (slots[i].inUse)
+ continue;
+
+ if (slots[i].connection == NULL)
+ return i;
+ }
+
+ return -1;
+}
+
+/*
+ * Return the offset of the first idle slot, or -1 if all slots are busy.
+ */
+static int
+find_any_idle_slot(const ParallelSlot *slots, int numslots)
+{
+ int i;
+
+ Assert(slots != NULL);
+
+ for (i = 0; i < numslots; i++)
+ if (!slots[i].inUse)
+ return i;
+
+ return -1;
+}
+
+/*
+ * Wait for any slot's connection to have query results, consume the results,
+ * and update the slot's status as appropriate. Returns true on success,
+ * false on cancellation, on error, or if no slots are connected.
+ */
+static bool
+wait_on_slots(ParallelSlot *slots, int numslots, const char *progname)
+{
+ int i;
+ fd_set slotset;
+ int maxFd = 0;
+ PGconn *cancelconn = NULL;
+
+ Assert(slots != NULL);
+ Assert(progname != NULL);
+
+ /* We must reconstruct the fd_set for each call to select_loop */
+ FD_ZERO(&slotset);
+
+ for (i = 0; i < numslots; i++)
+ {
+ int sock;
+
+ /* We shouldn't get here if we still have slots without connections */
+ Assert(slots[i].connection != NULL);
+
+ sock = PQsocket(slots[i].connection);
+
+ /*
+ * We don't really expect any connections to lose their sockets after
+ * startup, but just in case, cope by ignoring them.
+ */
+ if (sock < 0)
+ continue;
+
+ /* Keep track of the first valid connection we see. */
+ if (cancelconn == NULL)
+ cancelconn = slots[i].connection;
+
+ FD_SET(sock, &slotset);
+ if (sock > maxFd)
+ maxFd = sock;
}
/*
- * No free slot found, so wait until one of the connections has finished
- * its task and return the available slot.
+ * If we get this far with no valid connections, processing cannot
+ * continue.
*/
- while (firstFree < 0)
+ if (cancelconn == NULL)
+ return false;
+
+ SetCancelConn(slots->connection);
+ i = select_loop(maxFd, &slotset);
+ ResetCancelConn();
+
+ /* failure? */
+ if (i < 0)
+ return false;
+
+ for (i = 0; i < numslots; i++)
{
- fd_set slotset;
- int maxFd = 0;
+ int sock;
- /* We must reconstruct the fd_set for each call to select_loop */
- FD_ZERO(&slotset);
+ sock = PQsocket(slots[i].connection);
- for (i = 0; i < numslots; i++)
+ if (sock >= 0 && FD_ISSET(sock, &slotset))
{
- int sock = PQsocket(slots[i].connection);
-
- /*
- * We don't really expect any connections to lose their sockets
- * after startup, but just in case, cope by ignoring them.
- */
- if (sock < 0)
- continue;
-
- FD_SET(sock, &slotset);
- if (sock > maxFd)
- maxFd = sock;
+ /* select() says input is available, so consume it */
+ PQconsumeInput(slots[i].connection);
}
- SetCancelConn(slots->connection);
- i = select_loop(maxFd, &slotset);
- ResetCancelConn();
-
- /* failure? */
- if (i < 0)
- return NULL;
-
- for (i = 0; i < numslots; i++)
+ /* Collect result(s) as long as any are available */
+ while (!PQisBusy(slots[i].connection))
{
- int sock = PQsocket(slots[i].connection);
+ PGresult *result = PQgetResult(slots[i].connection);
- if (sock >= 0 && FD_ISSET(sock, &slotset))
+ if (result != NULL)
{
- /* select() says input is available, so consume it */
- PQconsumeInput(slots[i].connection);
+ /* Handle and discard the command result */
+ if (!processQueryResult(slots + i, result))
+ return false;
}
-
- /* Collect result(s) as long as any are available */
- while (!PQisBusy(slots[i].connection))
+ else
{
- PGresult *result = PQgetResult(slots[i].connection);
-
- if (result != NULL)
- {
- /* Handle and discard the command result */
- if (!processQueryResult(slots + i, result))
- return NULL;
- }
- else
- {
- /* This connection has become idle */
- slots[i].isFree = true;
- ParallelSlotClearHandler(slots + i);
- if (firstFree < 0)
- firstFree = i;
- break;
- }
+ /* This connection has become idle */
+ slots[i].inUse = false;
+ ParallelSlotClearHandler(slots + i);
+ break;
}
}
}
+ return true;
+}
+
+/*
+ * Close a slot's database connection.
+ */
+static void
+disconnect_slot(ParallelSlot *slot)
+{
+ Assert(slot);
+ Assert(slot->connection);
- slots[firstFree].isFree = false;
- return slots + firstFree;
+ disconnectDatabase(slot->connection);
+ slot->connection = NULL;
+ free_cparams_copy(&slot->cparams);
+ memset(&slot->cparams, 0, sizeof(ConnParams));
}
/*
- * ParallelSlotsSetup
- * Prepare a set of parallel slots to use on a given database.
+ * Open a new database connection using the given connection parameters,
+ * execute an initial command if supplied, and associate the new connection
+ * with the given slot.
+ */
+static void
+connect_slot(ParallelSlot *slot, const ConnParams *cparams,
+ const char *progname, bool echo, const char *initcmd)
+{
+ Assert(slot);
+ Assert(slot->connection == NULL);
+
+ slot->connection = connectDatabase(cparams, progname, echo, false, true);
+ if (PQsocket(slot->connection) >= FD_SETSIZE)
+ {
+ pg_log_fatal("too many jobs for this platform");
+ exit(1);
+ }
+
+ /*
+ * The caller is at liberty to reuse the cparams struct, overwriting
+ * fields (in particular, override_dbname). We need our own deep copy of
+ * the parameter struct fields.
+ */
+ deep_copy_cparams(&slot->cparams, cparams);
+
+ /* Setup the connection using the supplied command, if any. */
+ if (initcmd)
+ executeCommand(slot->connection, initcmd, echo);
+}
+
+/*
+ * ParallelSlotsGetIdle
+ * Return a connection slot that is ready to execute a command.
+ *
+ * The slot returned is chosen as follows:
+ *
+ * If any idle slot already has an open connection, and if either cparams is
+ * null or the connection was formed using connection parameter string values
+ * identical to those in cparams, that slot will be returned allowing the
+ * connection to be reused.
+ *
+ * Otherwise, if cparams is not null, and if any idle slot is not yet connected
+ * to a database, the slot will be returned with its connection opened using
+ * the supplied cparams.
+ *
+ * Otherwise, if cparams is not null, and if any idle slot exists, an idle slot
+ * will be chosen and returned after having its connection disconnected and
+ * reconnected using the supplied cparams.
+ *
+ * Otherwise, if any slots have connections that are busy, we loop on select()
+ * until one socket becomes available. When this happens, we read the whole
+ * set and mark as free all sockets that become available. We then select a
+ * slot using the same rules as above.
+ *
+ * Otherwise, we cannot return a slot, which is an error, and NULL is returned.
+ *
+ * For any connection created, if "initcmd" is not null, it will be executed as
+ * a command on the newly formed connection before the slot is returned.
*
- * This creates and initializes a set of connections to the database
- * using the information given by the caller, marking all parallel slots
- * as free and ready to use. "conn" is an initial connection set up
- * by the caller and is associated with the first slot in the parallel
- * set.
+ * If an error occurs, NULL is returned.
*/
ParallelSlot *
-ParallelSlotsSetup(const ConnParams *cparams,
- const char *progname, bool echo,
- PGconn *conn, int numslots)
+ParallelSlotsGetIdle(ParallelSlot *slots, int numslots,
+ const ConnParams *cparams, const char *progname,
+ bool echo, const char *initcmd)
{
- ParallelSlot *slots;
- int i;
+ int offset;
- Assert(conn != NULL);
+ Assert(slots);
+ Assert(numslots > 0);
+ Assert(cparams);
+ Assert(progname);
- slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * numslots);
- init_slot(slots, conn);
- if (numslots > 1)
+ while (1)
{
- for (i = 1; i < numslots; i++)
+ /* First choice: a slot already connected to the desired database. */
+ offset = find_matching_idle_slot(slots, numslots, cparams);
+ if (offset >= 0)
{
- conn = connectDatabase(cparams, progname, echo, false, true);
-
- /*
- * Fail and exit immediately if trying to use a socket in an
- * unsupported range. POSIX requires open(2) to use the lowest
- * unused file descriptor and the hint given relies on that.
- */
- if (PQsocket(conn) >= FD_SETSIZE)
- {
- pg_log_fatal("too many jobs for this platform -- try %d", i);
- exit(1);
- }
+ slots[offset].inUse = true;
+ return slots + offset;
+ }
- init_slot(slots + i, conn);
+ /* Second choice: a slot not connected to any database. */
+ offset = find_unconnected_slot(slots, numslots);
+ if (offset >= 0)
+ {
+ connect_slot(slots + offset, cparams, progname, echo, initcmd);
+ slots[offset].inUse = true;
+ return slots + offset;
+ }
+
+ /* Third choice: a slot connected to the wrong database. */
+ offset = find_any_idle_slot(slots, numslots);
+ if (offset >= 0)
+ {
+ disconnect_slot(slots + offset);
+ connect_slot(slots + offset, cparams, progname, echo, initcmd);
+ slots[offset].inUse = true;
+ return slots + offset;
}
+
+ /*
+ * Fourth choice: block until one or more slots become available. If
+ * any slot has hit a fatal error, we'll find out about that here and
+ * return NULL.
+ */
+ if (!wait_on_slots(slots, numslots, progname))
+ return NULL;
+ }
+}
+
+/*
+ * ParallelSlotsSetupMinimal
+ * Prepare a set of parallel slots but do not connect to any database.
+ *
+ * This creates and initializes a set of slots, marking all parallel slots
+ * as free and ready to use. Establishing connections is delayed until
+ * requesting a free slot, but in the event that an existing connection is
+ * provided in "conn", that connection will be associated with the first
+ * slot and saved for reuse. In this case, "cparams" must contain the
+ * parameters that were used for opening "conn".
+ */
+ParallelSlot *
+ParallelSlotsSetupMinimal(int numslots, PGconn *conn,
+ const ConnParams *cparams)
+{
+ ParallelSlot *slots;
+
+ Assert(numslots > 0);
+
+ slots = (ParallelSlot *) palloc0(sizeof(ParallelSlot) * numslots);
+ if (conn != NULL)
+ {
+ slots[0].connection = conn;
+ deep_copy_cparams(&slots[0].cparams, cparams);
}
return slots;
}
+/*
+ * ParallelSlotsSetupOneDB
+ * Prepare a set of parallel slots to use on a given database.
+ *
+ * This creates and initializes a set of connections to the database using the
+ * information given by the caller, marking all parallel slots as free and
+ * ready to use. If not null, "conn" is an initial connection set up by the
+ * caller and is associated with the first slot in the parallel set. "cparams"
+ * is used to form the remaining connections, and must be the same as was used
+ * for creating the initial connection "conn". If not null, "initcmd" is run
+ * on each connection opened, not including "conn".
+ */
+ParallelSlot *
+ParallelSlotsSetupOneDB(const ConnParams *cparams, const char *progname,
+ bool echo, PGconn *conn, int numslots,
+ const char *initcmd)
+{
+ int i = 0;
+ ParallelSlot *slots = ParallelSlotsSetupMinimal(numslots, conn, cparams);
+
+ if (conn)
+ i++; /* first slot already assigned "conn" */
+ for (; i < numslots; i++)
+ connect_slot(&slots[i], cparams, progname, echo, initcmd);
+
+ return slots;
+}
+
/*
* ParallelSlotsTerminate
* Clean up a set of parallel slots
@@ -320,6 +593,8 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
for (i = 0; i < numslots; i++)
{
+ if (slots[i].connection == NULL)
+ continue;
if (!consumeQueryResult(slots + i))
return false;
}
@@ -350,6 +625,9 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
bool
TableCommandResultHandler(PGresult *res, PGconn *conn, void *context)
{
+ Assert(res != NULL);
+ Assert(conn != NULL);
+
/*
* If it's an error, report it. Errors about a missing table are harmless
* so we continue processing; but die for other errors.
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 8902f8d4f4..d42bc33c87 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -21,7 +21,8 @@ typedef bool (*ParallelSlotResultHandler) (PGresult *res, PGconn *conn,
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
- bool isFree; /* Is it known to be idle? */
+ ConnParams cparams; /* Params used to form connection */
+ bool inUse; /* Is the slot being used? */
/*
* Prior to issuing a command or query on 'connection', a handler callback
@@ -48,11 +49,18 @@ ParallelSlotClearHandler(ParallelSlot *slot)
slot->handler_context = NULL;
}
-extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
+extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots,
+ const ConnParams *cparams,
+ const char *progname, bool echo,
+ const char *initcmd);
-extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
- const char *progname, bool echo,
- PGconn *conn, int numslots);
+extern ParallelSlot *ParallelSlotsSetupMinimal(int numslots, PGconn *conn,
+ const ConnParams *cparams);
+
+extern ParallelSlot *ParallelSlotsSetupOneDB(const ConnParams *cparams,
+ const char *progname, bool echo,
+ PGconn *conn, int numslots,
+ const char *initcmd);
extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bab4f3adb3..caae8cbd5b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
--
2.21.1 (Apple Git-122.3)
v40-0002-Adding-contrib-module-pg_amcheck.patch (application/octet-stream)
From 25b05c67cee51e8299f45a7743071de5132be32d Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Mon, 22 Feb 2021 12:08:18 -0800
Subject: [PATCH v40 2/3] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1891 ++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 213 +++
contrib/pg_amcheck/t/003_check.pl | 520 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 496 +++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 54 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 670 +++++++
src/tools/msvc/Install.pm | 2 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 3 +
15 files changed, 3895 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..a72dcf7304 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..613bdf5d02
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1891 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "catalog/pg_am_d.h"
+#include "catalog/pg_namespace_d.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "pgtime.h"
+#include "storage/block.h"
+
+/* pg_amcheck command line options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool allrel;
+ bool excludetbl;
+ bool excludeidx;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool no_indexes;
+ bool no_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ bool strict_names;
+ bool show_progress;
+ const char *skip;
+ int jobs;
+ long startblock;
+ long endblock;
+ SimplePtrList include; /* list of PatternInfo structs */
+ SimplePtrList exclude; /* list of PatternInfo structs */
+} amcheckOptions;
+
+static amcheckOptions opts = {
+ .alldb = false,
+ .allrel = true,
+ .excludetbl = false,
+ .excludeidx = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .no_indexes = false,
+ .no_toast = false,
+ .reconcile_toast = true,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .strict_names = true,
+ .show_progress = false,
+ .skip = "none",
+ .jobs = 1,
+ .startblock = -1,
+ .endblock = -1,
+ .include = {NULL, NULL},
+ .exclude = {NULL, NULL},
+};
+
+static const char *progname = NULL;
+
+typedef struct PatternInfo
+{
+ int pattern_id; /* Unique ID of this pattern */
+ const char *pattern; /* Unaltered pattern from the command line */
+ char *dbrgx; /* Database regexp parsed from pattern, or
+ * NULL */
+ char *nsprgx; /* Schema regexp parsed from pattern, or NULL */
+ char *relrgx; /* Relation regexp parsed from pattern, or
+ * NULL */
+ bool tblonly; /* true if relrgx should only match tables */
+ bool idxonly; /* true if relrgx should only match indexes */
+ bool matched; /* true if the pattern matched in any database */
+} PatternInfo;
+
+/* Unique pattern id counter */
+static int next_id = 1;
+
+/* Whether all relations have so far passed their corruption checks */
+static bool all_checks_pass = true;
+
+/* Time last progress report was displayed */
+static pg_time_t last_progress_report = 0;
+
+typedef struct DatabaseInfo
+{
+ char *datname;
+ char *amcheck_schema; /* escaped, quoted literal */
+} DatabaseInfo;
+
+typedef struct RelationInfo
+{
+ const DatabaseInfo *datinfo; /* shared by other relinfos */
+ Oid reloid;
+ bool is_table; /* true if heap, false if btree */
+} RelationInfo;
+
+/*
+ * Query for determining if contrib's amcheck is installed. If so, selects the
+ * namespace name where amcheck's functions can be found.
+ */
+static const char *amcheck_sql =
+"SELECT n.nspname, x.extversion"
+"\nFROM pg_catalog.pg_extension x"
+"\nJOIN pg_catalog.pg_namespace n"
+"\nON x.extnamespace OPERATOR(pg_catalog.=) n.oid"
+"\nWHERE x.extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+static void prepare_table_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void prepare_btree_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void run_command(ParallelSlot *slot, const char *sql,
+ ConnParams *cparams);
+static bool VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+static bool VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context);
+static void help(const char *progname);
+static void progress_report(uint64 relations_total, uint64 relations_checked,
+ const char *datname, bool force, bool finished);
+
+static void appendDatabasePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendSchemaPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendRelationPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendTablePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendIndexPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void compileDatabaseList(PGconn *conn, SimplePtrList *databases);
+static void compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *datinfo);
+
+int
+main(int argc, char *argv[])
+{
+ PGconn *conn;
+ SimplePtrListCell *cell;
+ SimplePtrList databases = {NULL, NULL};
+ SimplePtrList relations = {NULL, NULL};
+ bool failed = false;
+ const char *latest_datname;
+ int parallel_workers;
+ ParallelSlot *slots;
+ PQExpBufferData sql;
+	uint64		reltotal;
+	uint64		relprogress;
+
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"no-index-expansion", no_argument, NULL, 2},
+ {"no-toast-expansion", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-strict-names", no_argument, NULL, 10},
+ {"progress", no_argument, NULL, 11},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ /*
+ * If a maintenance database is specified, that will be used for the
+ * initial connection. Failing that, the first plain argument (without a
+	 * flag) will be used.  If neither of those is given, the first database
+	 * specified with -d will be used.
+ */
+ const char *primary_db = NULL;
+ const char *secondary_db = NULL;
+ const char *tertiary_db = NULL;
+
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+ ConnParams cparams;
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ opts.alldb = true;
+ break;
+ case 'd':
+ if (tertiary_db == NULL)
+ tertiary_db = optarg;
+ appendDatabasePattern(&opts.include, optarg, encoding);
+ break;
+ case 'D':
+ appendDatabasePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'e':
+ opts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ opts.heapallindexed = true;
+ break;
+ case 'i':
+ opts.allrel = false;
+ appendIndexPattern(&opts.include, optarg, encoding);
+ break;
+ case 'I':
+ opts.excludeidx = true;
+ appendIndexPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'j':
+ opts.jobs = atoi(optarg);
+ if (opts.jobs < 1)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ opts.parent_check = true;
+ break;
+ case 'q':
+ opts.quiet = true;
+ break;
+ case 'r':
+ opts.allrel = false;
+ appendRelationPattern(&opts.include, optarg, encoding);
+ break;
+ case 'R':
+ opts.excludeidx = true;
+ opts.excludetbl = true;
+ appendRelationPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 's':
+ opts.allrel = false;
+ appendSchemaPattern(&opts.include, optarg, encoding);
+ break;
+ case 'S':
+ appendSchemaPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 't':
+ opts.allrel = false;
+ appendTablePattern(&opts.include, optarg, encoding);
+ break;
+ case 'T':
+ opts.excludetbl = true;
+ appendTablePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ opts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ primary_db = pg_strdup(optarg);
+ break;
+ case 2:
+ opts.no_indexes = true;
+ break;
+ case 3:
+ opts.no_toast = true;
+ break;
+ case 4:
+ opts.reconcile_toast = false;
+ break;
+ case 5:
+ opts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ opts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ opts.skip = "all frozen";
+ else
+ {
+					fprintf(stderr, "invalid skip option\n");
+ exit(1);
+ }
+ break;
+ case 7:
+ opts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+						"relation starting block argument contains garbage characters\n");
+ exit(1);
+ }
+ if (opts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+						"relation starting block argument out of bounds\n");
+ exit(1);
+ }
+ break;
+ case 8:
+ opts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+						"relation ending block argument contains garbage characters\n");
+ exit(1);
+ }
+				if (opts.endblock > (long) MaxBlockNumber)
+				{
+					fprintf(stderr,
+							"relation ending block argument out of bounds\n");
+ exit(1);
+ }
+ break;
+ case 9:
+ opts.rootdescend = true;
+ opts.parent_check = true;
+ break;
+ case 10:
+ opts.strict_names = false;
+ break;
+ case 11:
+ opts.show_progress = true;
+ break;
+ default:
+ fprintf(stderr,
+ "Try \"%s --help\" for more information.\n",
+ progname);
+ exit(1);
+ }
+ }
+
+ if (opts.endblock >= 0 && opts.endblock < opts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (secondary_db == NULL)
+ secondary_db = argv[optind];
+ appendDatabasePattern(&opts.include, argv[optind], encoding);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (primary_db)
+ cparams.dbname = primary_db;
+ else if (secondary_db != NULL)
+ cparams.dbname = secondary_db;
+ else if (tertiary_db != NULL)
+ cparams.dbname = tertiary_db;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ /*
+ * Users expect the database name inferred from the environment to get
+ * checked, not just get used for the initial connection.
+ */
+ appendDatabasePattern(&opts.include, default_db, encoding);
+
+ cparams.dbname = default_db;
+ }
+
+ conn = connectMaintenanceDatabase(&cparams, progname, opts.echo);
+ compileDatabaseList(conn, &databases);
+ disconnectDatabase(conn);
+
+ if (databases.head == NULL)
+ {
+ fprintf(stderr, "%s: no databases to check\n", progname);
+ exit(0);
+ }
+
+ /*
+ * Compile a list of all relations spanning all databases to be checked.
+ */
+ for (cell = databases.head; cell; cell = cell->next)
+ {
+ PGresult *result;
+ int ntups;
+ const char *amcheck_schema = NULL;
+ DatabaseInfo *dat = (DatabaseInfo *) cell->ptr;
+
+ cparams.override_dbname = dat->datname;
+
+		/*
+		 * Connect to this database so that we can determine whether amcheck
+		 * is installed and compile the list of relations to be checked.
+		 */
+ conn = connectDatabase(&cparams, progname, opts.echo, false, true);
+
+ /*
+ * Verify that amcheck is installed for this next database. User
+ * error could result in a database not having amcheck that should
+ * have it, but we also could be iterating over multiple databases
+ * where not all of them have amcheck installed (for example,
+ * 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, opts.echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+			pg_log_error("database \"%s\": %s",
+						 PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("query was: %s", amcheck_sql);
+ PQclear(result);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ fprintf(stderr,
+ "%s: skipping database \"%s\": amcheck is not installed\n",
+ progname, PQdb(conn));
+ disconnectDatabase(conn);
+ continue;
+ }
+ amcheck_schema = PQgetvalue(result, 0, 0);
+ if (opts.verbose)
+ fprintf(stderr,
+ "%s: in database \"%s\": using amcheck version \"%s\" in schema \"%s\"\n",
+ progname, PQdb(conn), PQgetvalue(result, 0, 1),
+ amcheck_schema);
+ dat->amcheck_schema = PQescapeIdentifier(conn, amcheck_schema,
+ strlen(amcheck_schema));
+ PQclear(result);
+
+ compileRelationListOneDb(conn, &relations, dat);
+ disconnectDatabase(conn);
+ }
+
+ /*
+ * Check that all inclusion patterns matched at least one schema or
+ * relation that we can check.
+ */
+ for (failed = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *pat = (PatternInfo *) cell->ptr;
+
+ if (!pat->matched && (pat->nsprgx != NULL || pat->relrgx != NULL))
+ {
+ failed = opts.strict_names;
+
+ if (!opts.quiet)
+ {
+ if (pat->tblonly)
+ fprintf(stderr, "%s: no tables to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->idxonly)
+ fprintf(stderr, "%s: no btree indexes to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->relrgx == NULL)
+ fprintf(stderr, "%s: no relations to check in schemas for \"%s\"\n",
+ progname, pat->pattern);
+ else
+ fprintf(stderr, "%s: no relations to check for \"%s\"\n",
+ progname, pat->pattern);
+ }
+ }
+ }
+
+ if (failed)
+ exit(1);
+
+ /*
+ * Set parallel_workers to the lesser of opts.jobs and the number of
+ * relations.
+ */
+ reltotal = 0;
+ parallel_workers = 0;
+ for (cell = relations.head; cell; cell = cell->next)
+ {
+ reltotal++;
+ if (parallel_workers < opts.jobs)
+ parallel_workers++;
+ }
+
+ if (reltotal == 0)
+ {
+		fprintf(stderr, "%s: no relations to check\n", progname);
+ exit(1);
+ }
+ progress_report(reltotal, 0, NULL, true, false);
+
+ /*
+ * ParallelSlots based event loop follows.
+ *
+	 * We open up to parallel_workers connections and check that many relations
+	 * in parallel.  The relations list was computed in database order, which
+	 * minimizes the number of connects and disconnects as we process the list.
+ */
+ latest_datname = NULL;
+ failed = false;
+ slots = ParallelSlotsSetupMinimal(parallel_workers, NULL, NULL);
+ initPQExpBuffer(&sql);
+ for (relprogress = 0, cell = relations.head; cell; cell = cell->next)
+ {
+ ParallelSlot *free_slot;
+ RelationInfo *rel;
+
+ rel = (RelationInfo *) cell->ptr;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * The relations list is in database sorted order. If this next
+ * relation is in a different database than the last one seen, we are
+ * about to start checking this database. Note that other slots may
+ * still be working on relations from prior databases.
+ */
+ latest_datname = rel->datinfo->datname;
+
+ progress_report(reltotal, relprogress, latest_datname, false, false);
+
+ relprogress++;
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ cparams.override_dbname = rel->datinfo->datname;
+ free_slot = ParallelSlotsGetIdle(slots, parallel_workers, &cparams,
+ progname, opts.echo, NULL);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * Execute the appropriate amcheck command for this relation using our
+ * slot's database connection. We do not wait for the command to
+ * complete, nor do we perform any error checking, as that is done by
+ * the parallel slots and our handler callback functions.
+ */
+ if (rel->is_table)
+ {
+ prepare_table_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler,
+ sql.data);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ else
+ {
+ prepare_btree_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ }
+ termPQExpBuffer(&sql);
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (slots && !ParallelSlotsWaitCompletion(slots, parallel_workers))
+ failed = true;
+
+ progress_report(reltotal, relprogress, NULL, true, true);
+
+finish:
+ if (slots)
+ {
+ ParallelSlotsTerminate(slots, parallel_workers);
+ pg_free(slots);
+ }
+
+ if (failed)
+ exit(1);
+
+ if (!all_checks_pass)
+ exit(2);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * reloid: relation of the table to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_table_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, "
+ "v.attnum, v.msg"
+ "\nFROM %s.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ amcheck_schema,
+ reloid,
+ opts.on_error_stop ? "true" : "false",
+ opts.reconcile_toast ? "true" : "false",
+ opts.skip);
+ if (opts.startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", opts.startblock);
+ if (opts.endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", opts.endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the table checking command will be written
+ * reloid: relation of the table to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ if (opts.parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"),
+ (opts.rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"));
+}
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * If reconnecting to the database is necessary, the cparams argument may be
+ * modified.
+ *
+ * slot: slot with connection to the server we should use for the command
+ * sql: query to send
+ * cparams: connection parameters in case the slot needs to be reconnected
+ */
+static void
+run_command(ParallelSlot *slot, const char *sql, ConnParams *cparams)
+{
+ if (opts.echo)
+ printf("%s\n", sql);
+
+ if (PQsendQuery(slot->connection, sql) == 0)
+ {
+ pg_log_error("error sending command to database \"%s\": %s",
+ PQdb(slot->connection),
+ PQerrorMessage(slot->connection));
+ pg_log_error("command was: %s", sql);
+ exit(1);
+ }
+}
+
+/*
+ * should_processing_continue
+ *
+ * Checks a query result returned from a query (presumably issued on a slot's
+ * connection) to determine if parallel slots should continue issuing further
+ * commands.
+ *
+ * Note: Heap relation corruption is reported by verify_heapam() in its
+ * result set rather than by raising errors, but running verify_heapam() on a
+ * corrupted table may still result in an error being returned from the server
+ * due to missing relation files, bad checksums, etc.  The btree checking
+ * functions always report corruption by raising errors.  Either way, we
+ * cannot simply abort processing because we got a mere ERROR.
+ *
+ * res: result from an executed sql query
+ */
+static bool
+should_processing_continue(PGresult *res)
+{
+ const char *severity;
+
+ switch (PQresultStatus(res))
+ {
+ /* These are expected and ok */
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_NONFATAL_ERROR:
+ break;
+
+ /* This is expected but requires closer scrutiny */
+ case PGRES_FATAL_ERROR:
+ severity = PQresultErrorField(res, PG_DIAG_SEVERITY_NONLOCALIZED);
+ if (strcmp(severity, "FATAL") == 0)
+ return false;
+ if (strcmp(severity, "PANIC") == 0)
+ return false;
+ break;
+
+ /* These are unexpected */
+ case PGRES_BAD_RESPONSE:
+ case PGRES_EMPTY_QUERY:
+ case PGRES_COPY_OUT:
+ case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
+ case PGRES_SINGLE_TUPLE:
+			return false;
+ }
+ return true;
+}
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
+ */
+static bool
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+ int ntups = PQntuples(res);
+
+ if (ntups > 0)
+ all_checks_pass = false;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s.%s, block %s, offset %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s.%s, block %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s.%s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+				printf("%s: %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+	else
+ {
+ all_checks_pass = false;
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ printf("%s: query was: %s\n", PQdb(conn), (const char *) context);
+ }
+
+ return should_processing_continue(res);
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user.  The result
+ * set from the btree checking command is expected to be empty; when the
+ * command instead fails, the useful information about the corruption is
+ * found in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static bool
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ all_checks_pass = false;
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ }
+
+ return should_processing_continue(res);
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf("%s checks objects in a PostgreSQL database for corruption.\n\n", progname);
+ printf("Usage:\n");
+ printf(" %s [OPTION]... [DBNAME]\n", progname);
+	printf("\nTarget options:\n");
+ printf(" -a, --all check all databases\n");
+ printf(" -d, --dbname=DBNAME check specific database(s)\n");
+ printf(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n");
+ printf(" -i, --index=INDEX check specific index(es)\n");
+ printf(" -I, --exclude-index=INDEX do NOT check specific index(es)\n");
+ printf(" -r, --relation=RELNAME check specific relation(s)\n");
+ printf(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n");
+ printf(" -s, --schema=SCHEMA check specific schema(s)\n");
+ printf(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n");
+ printf(" -t, --table=TABLE check specific table(s)\n");
+ printf(" -T, --exclude-table=TABLE do NOT check specific table(s)\n");
+ printf(" --no-index-expansion do NOT expand list of relations to include indexes\n");
+ printf(" --no-toast-expansion do NOT expand list of relations to include toast\n");
+ printf(" --no-strict-names do NOT require patterns to match objects\n");
+	printf("\nIndex checking options:\n");
+ printf(" -H, --heapallindexed check all heap tuples are found within indexes\n");
+ printf(" -P, --parent-check check index parent/child relationships\n");
+ printf(" --rootdescend search from root page to refind tuples\n");
+	printf("\nTable checking options:\n");
+ printf(" --exclude-toast-pointers do NOT follow relation toast pointers\n");
+ printf(" --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf(" --startblock=BLOCK begin checking table(s) at the given block number\n");
+ printf(" --endblock=BLOCK check table(s) only up to the given block number\n");
+ printf("\nConnection options:\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME user name to connect as\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt\n");
+ printf(" --maintenance-db=DBNAME alternate maintenance database\n");
+	printf("\nOther options:\n");
+ printf(" -e, --echo show the commands being sent to the server\n");
+ printf(" -j, --jobs=NUM use this many concurrent connections to the server\n");
+ printf(" -q, --quiet don't write any messages\n");
+ printf(" -v, --verbose write a lot of output\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" --progress show progress information\n");
+ printf(" -?, --help show this help, then exit\n");
+
+ printf("\nRead the description of the amcheck contrib module for details.\n");
+ printf("\nReport bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * Print a progress report based on the global variables.  If verbose output
+ * is enabled, also print the name of the database currently being checked.
+ *
+ * The progress report is written at most once per second, unless the force
+ * parameter is set to true.
+ *
+ * If finished is set to true, this is the last progress report. The cursor
+ * is moved to the next line.
+ */
+static void
+progress_report(uint64 relations_total, uint64 relations_checked,
+ const char *datname, bool force, bool finished)
+{
+ int percent = 0;
+ char checked_str[32];
+ char total_str[32];
+ pg_time_t now;
+
+ if (!opts.show_progress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force && !finished)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ if (relations_total)
+ percent = (int) (relations_checked * 100 / relations_total);
+
+ /*
+ * Separate step to keep platform-dependent format code out of fprintf
+ * calls. We only test for INT64_FORMAT availability in snprintf, not
+ * fprintf.
+ */
+ snprintf(checked_str, sizeof(checked_str), INT64_FORMAT, relations_checked);
+ snprintf(total_str, sizeof(total_str), INT64_FORMAT, relations_total);
+
+#define VERBOSE_DATNAME_LENGTH 35
+ if (opts.verbose)
+ {
+ if (!datname)
+
+ /*
+ * No datname given, so clear the status line (used for first and
+ * last call)
+ */
+ fprintf(stderr,
+ "%*s/%s (%d%%) %*s",
+ (int) strlen(total_str),
+ checked_str, total_str, percent,
+ VERBOSE_DATNAME_LENGTH + 2, "");
+ else
+ {
+ bool truncate = (strlen(datname) > VERBOSE_DATNAME_LENGTH);
+
+ fprintf(stderr,
+ "%*s/%s (%d%%), (%s%-*.*s)",
+ (int) strlen(total_str),
+ checked_str, total_str, percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_DATNAME_LENGTH - 3 : VERBOSE_DATNAME_LENGTH,
+ truncate ? VERBOSE_DATNAME_LENGTH - 3 : VERBOSE_DATNAME_LENGTH,
+ /* Truncate datname at beginning if it's too long */
+ truncate ? datname + strlen(datname) - VERBOSE_DATNAME_LENGTH + 3 : datname);
+ }
+ }
+ else
+ fprintf(stderr,
+ "%*s/%s (%d%%)",
+ (int) strlen(total_str),
+ checked_str, total_str, percent);
+
+ /*
+ * Stay on the same line if reporting to a terminal and we're not done
+ * yet.
+ */
+ fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
+}
+
+/*
+ * appendDatabasePattern
+ *
+ * Adds to a list the given pattern interpreted as a database name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the database name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendDatabasePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->dbrgx = pstrdup(buf.data);
+
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendSchemaPattern
+ *
+ * Adds to a list the given pattern interpreted as a schema name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the schema name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendSchemaPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->nsprgx = pstrdup(buf.data);
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPatternHelper
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ * tblonly: whether the pattern should only be matched against heap tables
+ * idxonly: whether the pattern should only be matched against btree indexes
+ */
+static void
+appendRelationPatternHelper(SimplePtrList *list, const char *pattern,
+ int encoding, bool tblonly, bool idxonly)
+{
+ PQExpBufferData dbbuf;
+ PQExpBufferData nspbuf;
+ PQExpBufferData relbuf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&dbbuf);
+ initPQExpBuffer(&nspbuf);
+ initPQExpBuffer(&relbuf);
+
+ patternToSQLRegex(encoding, &dbbuf, &nspbuf, &relbuf, pattern, false);
+ info->pattern = pattern;
+ if (dbbuf.data[0])
+ info->dbrgx = pstrdup(dbbuf.data);
+ if (nspbuf.data[0])
+ info->nsprgx = pstrdup(nspbuf.data);
+ if (relbuf.data[0])
+ info->relrgx = pstrdup(relbuf.data);
+
+ termPQExpBuffer(&dbbuf);
+ termPQExpBuffer(&nspbuf);
+ termPQExpBuffer(&relbuf);
+
+ info->tblonly = tblonly;
+ info->idxonly = idxonly;
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched against both tables and indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendRelationPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, false);
+}
+
+/*
+ * appendTablePattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against tables.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendTablePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, true, false);
+}
+
+/*
+ * appendIndexPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendIndexPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, true);
+}
+
+/*
+ * appendDbPatternCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the database portions filtered from the list of patterns expressed as three
+ * columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user specified pattern from the command line
+ * rgx: the database regular expression parsed from the pattern
+ *
+ * Patterns without a database portion are skipped. Patterns with more than
+ * just a database portion are optionally skipped, depending on argument
+ * 'inclusive'.
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ * inclusive: whether to include patterns with schema and/or relation parts
+ */
+static void
+appendDbPatternCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn, bool inclusive)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->dbrgx != NULL &&
+ (inclusive || (info->nsprgx == NULL && info->relrgx == NULL)))
+ {
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, ", ");
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nSELECT NULL, NULL, NULL WHERE false");
+}
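
As a non-patch illustration: assuming the pattern-parsing code elsewhere in the patch turns `db1` into a regex like `^(db1)$` (the exact regex form is assumed here, not quoted from the patch), the CTE body this function emits for the patterns `db1` and `mydb.s1.t1` with inclusive = true would look roughly like:

```sql
-- Hypothetical output for two command-line patterns; the second carries
-- schema/relation parts and so is kept only because inclusive = true.
VALUES
(1, 'db1', '^(db1)$'),
(2, 'mydb.s1.t1', '^(mydb)$')
```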
+
+/*
+ * compileDatabaseList
+ *
+ * Compiles a list of databases to check based on the user supplied options,
+ * sorted to preserve the order they were specified on the command line. In
+ * the event that multiple databases match a single command line pattern, they
+ * are secondarily sorted by name.
+ *
+ * conn: connection to the initial database
+ * databases: the list onto which databases should be appended
+ */
+static void
+compileDatabaseList(PGconn *conn, SimplePtrList *databases)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ bool fatal;
+
+ initPQExpBuffer(&sql);
+
+ /* Append the include patterns CTE. */
+ appendPQExpBufferStr(&sql, "WITH include_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.include, conn, true);
+
+ /* Append the exclude patterns CTE. */
+ appendPQExpBufferStr(&sql, "\n),\nexclude_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.exclude, conn, false);
+ appendPQExpBufferStr(&sql, "\n),");
+
+ /*
+ * Append the database CTE, which includes whether each database is
+ * connectable and also joins against exclude_raw to determine whether
+ * each database is excluded.
+ */
+ appendPQExpBufferStr(&sql,
+ "\ndatabase (datname) AS ("
+ "\nSELECT d.datname"
+ "\nFROM pg_catalog.pg_database d"
+ "\nLEFT OUTER JOIN exclude_raw e"
+ "\nON d.datname ~ e.rgx"
+ "\nWHERE d.datallowconn"
+ "\nAND e.id IS NULL"
+ "\n),"
+
+ /*
+ * Append the include_pat CTE, which joins the include_raw CTE against the
+ * database CTE to determine whether all the inclusion patterns had matches,
+ * and whether each matched pattern had the misfortune of only matching
+ * excluded or unconnectable databases.
+ */
+ "\ninclude_pat (id, pat, checkable) AS ("
+ "\nSELECT i.id, i.pat,"
+ "\nCOUNT(*) FILTER ("
+ "\nWHERE d IS NOT NULL"
+ "\n) AS checkable"
+ "\nFROM include_raw i"
+ "\nLEFT OUTER JOIN database d"
+ "\nON d.datname ~ i.rgx"
+ "\nGROUP BY i.id, i.pat"
+ "\n),"
+
+ /*
+ * Append the filtered_databases CTE, which selects from the database CTE
+ * optionally joined against the include_raw CTE to only select databases
+ * that match an inclusion pattern. This appears to duplicate what the
+ * include_pat CTE already did above, but here we want only databases, and
+ * there we wanted patterns.
+ */
+ "\nfiltered_databases (datname) AS ("
+ "\nSELECT DISTINCT d.datname"
+ "\nFROM database d");
+ if (!opts.alldb)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN include_raw i"
+ "\nON d.datname ~ i.rgx");
+ appendPQExpBufferStr(&sql,
+ "\n)"
+
+ /*
+ * Select the checkable databases and the unmatched inclusion patterns.
+ */
+ "\nSELECT pat, datname"
+ "\nFROM ("
+ "\nSELECT id, pat, NULL::TEXT AS datname"
+ "\nFROM include_pat"
+ "\nWHERE checkable = 0"
+ "\nUNION ALL"
+ "\nSELECT NULL, NULL, datname"
+ "\nFROM filtered_databases"
+ "\n) AS combined_records"
+ "\nORDER BY id NULLS LAST, datname");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ ntups = PQntuples(res);
+ for (fatal = false, i = 0; i < ntups; i++)
+ {
+ const char *pat = NULL;
+ const char *datname = NULL;
+
+ if (!PQgetisnull(res, i, 0))
+ pat = PQgetvalue(res, i, 0);
+ if (!PQgetisnull(res, i, 1))
+ datname = PQgetvalue(res, i, 1);
+
+ if (pat != NULL)
+ {
+ /*
+ * Current record pertains to an inclusion pattern that matched no
+ * checkable databases.
+ */
+ fatal = opts.strict_names;
+ fprintf(stderr, "%s: no checkable database: \"%s\"\n",
+ progname, pat);
+ }
+ else
+ {
+ /* Current record pertains to a database */
+ Assert(datname != NULL);
+
+ DatabaseInfo *dat = (DatabaseInfo *) palloc0(sizeof(DatabaseInfo));
+
+ /* This database is included. Add to list */
+ if (opts.verbose)
+ fprintf(stderr, "%s: including database: \"%s\"\n", progname,
+ datname);
+
+ dat->datname = pstrdup(datname);
+ simple_ptr_list_append(databases, dat);
+ }
+ }
+ PQclear(res);
+
+ if (fatal)
+ {
+ disconnectDatabase(conn);
+ exit(1);
+ }
+}
+
+/*
+ * appendRelPatternRawCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the patterns from the given list as seven columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user-specified pattern from the command line
+ * dbrgx: the database regexp parsed from the pattern, or NULL if the
+ * pattern had no database part
+ * nsprgx: the namespace regexp parsed from the pattern, or NULL if the
+ * pattern had no namespace part
+ * relrgx: the relname regexp parsed from the pattern, or NULL if the
+ * pattern had no relname part
+ * tbl: true if the pattern applies only to tables (not indexes)
+ * idx: true if the pattern applies only to indexes (not tables)
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ */
+static void
+appendRelPatternRawCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d::INTEGER, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->dbrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->nsprgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->nsprgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->relrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->relrgx, conn);
+ if (info->tblonly)
+ appendPQExpBufferStr(buf, "::TEXT, true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, "::TEXT, false::BOOLEAN");
+ if (info->idxonly)
+ appendPQExpBufferStr(buf, ", true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, ", false::BOOLEAN");
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf,
+ "\nSELECT NULL::INTEGER, NULL::TEXT, NULL::TEXT,"
+ "\nNULL::TEXT, NULL::TEXT, NULL::BOOLEAN,"
+ "\nNULL::BOOLEAN"
+ "\nWHERE false");
+}
+
+/*
+ * appendRelPatternFilteredCTE
+ *
+ * Appends to the buffer a Common Table Expression (CTE) which selects
+ * all patterns from the named raw CTE, filtered by database. All patterns
+ * which have no database portion or whose database portion matches our
+ * connection's database name are selected, with other patterns excluded.
+ *
+ * The basic idea here is that if we're connected to database "foo" and we have
+ * patterns "foo.bar.baz", "alpha.beta" and "one.two.three", we only want to
+ * use the first two while processing relations in this database, as the third
+ * one is not relevant.
+ *
+ * buf: the buffer to be appended
+ * raw: the name of the CTE to select from
+ * filtered: the name of the CTE to create
+ * conn: the database connection
+ */
+static void
+appendRelPatternFilteredCTE(PQExpBuffer buf, const char *raw,
+ const char *filtered, PGconn *conn)
+{
+ appendPQExpBuffer(buf,
+ "\n%s (id, pat, nsprgx, relrgx, tbl, idx) AS ("
+ "\nSELECT id, pat, nsprgx, relrgx, tbl, idx"
+ "\nFROM %s r"
+ "\nWHERE (r.dbrgx IS NULL"
+ "\nOR ",
+ filtered, raw);
+ appendStringLiteralConn(buf, PQdb(conn), conn);
+ appendPQExpBufferStr(buf, " ~ r.dbrgx)");
+ appendPQExpBufferStr(buf,
+ "\nAND (r.nsprgx IS NOT NULL"
+ "\nOR r.relrgx IS NOT NULL)"
+ "\n),");
+}
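
For example (illustrative only, following the format strings above), when connected to database "foo" this function emits a CTE of roughly the following shape, discarding both patterns whose database part does not match "foo" and patterns that name only a database:

```sql
-- Sketch of the emitted CTE with raw = include_raw, filtered = include_pat,
-- connected to database "foo".
include_pat (id, pat, nsprgx, relrgx, tbl, idx) AS (
SELECT id, pat, nsprgx, relrgx, tbl, idx
FROM include_raw r
WHERE (r.dbrgx IS NULL
OR 'foo' ~ r.dbrgx)
AND (r.nsprgx IS NOT NULL
OR r.relrgx IS NOT NULL)
),
```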
+
+/*
+ * compileRelationListOneDb
+ *
+ * Compiles a list of relations to check within the currently connected
+ * database based on the user supplied options, sorted by descending size,
+ * and appends them to the given relations list.
+ *
+ * The cells of the constructed list contain all information about the relation
+ * necessary to connect to the database and check the object, including which
+ * database to connect to, where contrib/amcheck is installed, and the Oid and
+ * type of object (table vs. index). Rather than duplicating the database
+ * details per relation, the relation structs use references to the same
+ * database object, provided by the caller.
+ *
+ * conn: connection to this next database, which should be the same as in 'dat'
+ * relations: list onto which the relations information should be appended
+ * dat: the database info struct for use by each relation
+ */
+static void
+compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *dat)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ const char *datname;
+
+ initPQExpBuffer(&sql);
+ appendPQExpBufferStr(&sql, "WITH");
+
+ /* Append CTEs for the relation inclusion patterns, if any */
+ if (!opts.allrel)
+ {
+ appendPQExpBufferStr(&sql,
+ "\ninclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.include, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "include_raw", "include_pat", conn);
+ }
+
+ /* Append CTEs for the relation exclusion patterns, if any */
+ if (opts.excludetbl || opts.excludeidx)
+ {
+ appendPQExpBufferStr(&sql,
+ "\nexclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.exclude, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "exclude_raw", "exclude_pat", conn);
+ }
+
+ /* Append the relation CTE. */
+ appendPQExpBufferStr(&sql,
+ "\nrelation (id, pat, oid, reltoastrelid, relpages, tbl, idx) AS ("
+ "\nSELECT DISTINCT ON (c.oid");
+ if (!opts.allrel)
+ appendPQExpBufferStr(&sql, ", ip.id) ip.id, ip.pat,");
+ else
+ appendPQExpBufferStr(&sql, ") NULL::INTEGER AS id, NULL::TEXT AS pat,");
+ appendPQExpBuffer(&sql,
+ "\nc.oid, c.reltoastrelid, c.relpages,"
+ "\nc.relam = %u AS tbl,"
+ "\nc.relam = %u AS idx"
+ "\nFROM pg_catalog.pg_class c"
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (!opts.allrel)
+ appendPQExpBuffer(&sql,
+ "\nINNER JOIN include_pat ip"
+ "\nON (n.nspname ~ ip.nsprgx OR ip.nsprgx IS NULL)"
+ "\nAND (c.relname ~ ip.relrgx OR ip.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT ip.tbl)"
+ "\nAND (c.relam = %u OR NOT ip.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBuffer(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT e.tbl)"
+ "\nAND (c.relam = %u OR NOT e.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBufferStr(&sql, "\nWHERE e.pat IS NULL");
+ else
+ appendPQExpBufferStr(&sql, "\nWHERE true");
+
+ /*
+ * We need to be careful not to break the --no-toast-expansion and
+ * --no-index-expansion options. By default, the indexes, toast tables,
+ * and toast table indexes associated with primary tables are included,
+ * using their own CTEs below. We implement the --exclude-* options by not
+ * creating those CTEs, but that's no use if we've already selected the
+ * toast and indexes here. On the other hand, we want inclusion patterns
+ * that match indexes or toast tables to be honored. So, if inclusion
+ * patterns were given, we want to select all tables, toast tables, or
+ * indexes that match the patterns. But if no inclusion patterns were
+ * given, and we're simply matching all relations, then we only want to
+ * match the primary tables here.
+ */
+ if (opts.allrel)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind IN ('r', 'm', 't')"
+ "\nAND c.relnamespace != %u",
+ HEAP_TABLE_AM_OID, PG_TOAST_NAMESPACE);
+ else
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam IN (%u, %u)"
+ "\nAND c.relkind IN ('r', 'm', 't', 'i')"
+ "\nAND ((c.relam = %u AND c.relkind IN ('r', 'm', 't')) OR"
+ "\n(c.relam = %u AND c.relkind = 'i'))",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID,
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ appendPQExpBufferStr(&sql,
+ "\nORDER BY c.oid"
+ "\n)");
+
+ if (!opts.no_toast)
+ {
+ /*
+ * Include a CTE for toast tables associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * toast table names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\ntoast (oid, relpages) AS ("
+ "\nSELECT t.oid, t.relpages"
+ "\nFROM pg_catalog.pg_class t"
+ "\nINNER JOIN relation r"
+ "\nON r.reltoastrelid = t.oid");
+ if (opts.excludetbl)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (t.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.tbl"
+ "\nWHERE e.id IS NULL");
+ appendPQExpBufferStr(&sql,
+ "\n)");
+ }
+ if (!opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * btree index names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\nindex (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM relation r"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON r.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid"
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'",
+ BTREE_AM_OID);
+ if (opts.no_toast)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relnamespace != %u",
+ PG_TOAST_NAMESPACE);
+ appendPQExpBufferStr(&sql, "\n)");
+ }
+
+ if (!opts.no_toast && !opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with toast tables of
+ * primary tables selected above, filtering by exclusion patterns (if
+ * any) that match the toast index names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\ntoast_index (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM toast t"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON t.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'"
+ "\n)",
+ BTREE_AM_OID);
+ }
+
+ /*
+ * Roll-up distinct rows from CTEs.
+ *
+ * Relations that match more than one pattern may occur more than once in
+ * the list, and indexes and toast for primary relations may also have
+ * matched in their own right, so we rely on UNION to deduplicate the
+ * list.
+ */
+ appendPQExpBufferStr(&sql,
+ "\nSELECT id, tbl, idx, oid"
+ "\nFROM (");
+ appendPQExpBufferStr(&sql,
+ /* Inclusion patterns that failed to match */
+ "\nSELECT id, tbl, idx,"
+ "\nNULL::OID AS oid,"
+ "\nNULL::INTEGER AS relpages"
+ "\nFROM relation"
+ "\nWHERE id IS NOT NULL"
+ "\nUNION"
+ /* Primary relations */
+ "\nSELECT NULL::INTEGER AS id,"
+ "\ntbl, idx,"
+ "\noid, relpages"
+ "\nFROM relation");
+ if (!opts.no_toast)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Toast tables for primary relations */
+ "\nSELECT NULL::INTEGER AS id, TRUE AS tbl,"
+ "\nFALSE AS idx, oid, relpages"
+ "\nFROM toast");
+ if (!opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for primary relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM index");
+ if (!opts.no_toast && !opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for toast relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM toast_index");
+ appendPQExpBufferStr(&sql,
+ "\n) AS combined_records"
+ "\nORDER BY relpages DESC NULLS FIRST, oid");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ ntups = PQntuples(res);
+ for (i = 0; i < ntups; i++)
+ {
+ int pattern_id = 0;
+ bool tbl = false;
+ bool idx = false;
+ Oid oid = InvalidOid;
+
+ if (!PQgetisnull(res, i, 0))
+ pattern_id = atoi(PQgetvalue(res, i, 0));
+ if (!PQgetisnull(res, i, 1))
+ tbl = (PQgetvalue(res, i, 1)[0] == 't');
+ if (!PQgetisnull(res, i, 2))
+ idx = (PQgetvalue(res, i, 2)[0] == 't');
+ if (!PQgetisnull(res, i, 3))
+ oid = atooid(PQgetvalue(res, i, 3));
+
+ if (pattern_id > 0)
+ {
+ /*
+ * Current record pertains to an inclusion pattern. Find the
+ * pattern in the list and record that it matched. If we expected
+ * a large number of command-line inclusion pattern arguments, the
+ * data structure here might need to be more efficient, but we
+ * expect the list to be short.
+ */
+
+ SimplePtrListCell *cell;
+ bool found;
+
+ for (found = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->pattern_id == pattern_id)
+ {
+ info->matched = true;
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ {
+ pg_log_error("internal error: received unexpected pattern_id %d",
+ pattern_id);
+ exit(1);
+ }
+ }
+ else
+ {
+ /* Current record pertains to a relation */
+
+ RelationInfo *rel = (RelationInfo *) palloc0(sizeof(RelationInfo));
+
+ Assert(OidIsValid(oid));
+ Assert(!(tbl && idx));
+
+ rel->datinfo = dat;
+ rel->reloid = oid;
+ rel->is_table = tbl;
+
+ simple_ptr_list_append(relations, rel);
+ }
+ }
+ PQclear(res);
+}
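
To summarize the assembly above: when inclusion patterns are given and neither --no-toast-expansion nor --no-index-expansion is in effect, the full query compileRelationListOneDb sends has roughly this shape (CTE bodies elided, column lists abbreviated):

```sql
-- Abbreviated sketch; see the code above for the exact column lists
-- and pattern-join conditions.
WITH include_raw (...) AS (...),
     include_pat (...) AS (...),
     relation (...) AS (...),
     toast (oid, relpages) AS (...),
     index (oid, relpages) AS (...),
     toast_index (oid, relpages) AS (...)
SELECT id, tbl, idx, oid
FROM (
    SELECT ... FROM relation WHERE id IS NOT NULL  -- unmatched patterns
    UNION SELECT ... FROM relation                 -- primary relations
    UNION SELECT ... FROM toast
    UNION SELECT ... FROM index
    UNION SELECT ... FROM toast_index
) AS combined_records
ORDER BY relpages DESC NULLS FIRST, oid;
```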
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..8c6e267ee9
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,213 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 60;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+# Failing to connect to the initial database is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+# Failing to resolve a secondary database name is also an error, though since
+# the string is treated as a pattern, the error message looks different.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'qqq' ],
+ qr/pg_amcheck: no checkable database: "qqq"/,
+ 'checking a non-existent database');
+
+# Failing to connect to the initial database is still an error when using
+# --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database with --no-strict-names');
+
+# But failing to resolve secondary database names is not an error when using
+# --no-strict-names. We should still see the message, but as a non-fatal
+# warning
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-d', 'no_such_database', 'postgres', 'qqq' ],
+ 0,
+ [ ],
+ [ qr/no checkable database: "qqq"/ ],
+ 'checking a non-existent secondary database with --no-strict-names');
+
+# Check that a substring of an existent database name does not get interpreted
+# as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'post' ],
+ qr/database "post" does not exist/,
+ 'checking a non-existent primary database (substring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'post' ],
+ qr/pg_amcheck: no checkable database: "post"/,
+ 'checking a non-existent secondary database (substring of existent database)');
+
+# Likewise, check that a superstring of an existent database name does not get
+# interpreted as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgresql' ],
+ qr/database "postgresql" does not exist/,
+ 'checking a non-existent primary database (superstring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'postgresql' ],
+ qr/pg_amcheck: no checkable database: "postgresql"/,
+ 'checking a non-existent secondary database (superstring of existent database)');
+
+#########################################
+# Test connecting with a non-existent user
+
+# Failing to connect to the initial database due to bad username is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+# Failing to connect to the initial database due to bad username is still an
+# error when using --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user, --no-strict-names');
+
+#########################################
+# Test checking databases without amcheck installed
+
+# Attempting to check a database by name where amcheck is not installed should
+# raise a warning. If all databases are skipped, having no relations to check
+# raises an error.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ 1,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ qr/pg_amcheck: no relations to check/ ],
+ 'checking a database by name without amcheck installed');
+
+# Likewise, but by database pattern rather than by name, such that some
+# databases with amcheck installed are included, and so checking occurs and
+# only a warning is raised.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '-d', '*', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by dbname implication without amcheck installed');
+
+# And again, but by checking all databases.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '--all', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by --all implication without amcheck installed');
+
+#########################################
+# Test unreasonable patterns
+
+# Check three-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '..' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.\."/ ],
+ 'checking table pattern ".."');
+
+# Again, but with non-trivial schema and relation parts
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.foo.bar' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.foo\.bar"/ ],
+ 'checking table pattern ".foo.bar"');
+
+# Check two-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no tables to check for "\."/ ],
+ 'checking table pattern "."');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ qr/pg_amcheck: no relations to check in schemas for "no_such_schema"/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no_such_schema' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent schema with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ qr/pg_amcheck: no tables to check for "no_such_table"/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no_such_table' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent table with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ qr/pg_amcheck: no btree indexes to check for "no_such_index"/,
+ 'checking a non-existent index');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no_such_index' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent index with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ qr/pg_amcheck: no relations to check in schemas for "no\*such\*schema\*"/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no*such*schema*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching schemas with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ qr/pg_amcheck: no tables to check for "no\*such\*table\*"/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no*such*table*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching tables with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index*' ],
+ qr/pg_amcheck: no btree indexes to check for "no\*such\*index\*"/,
+ 'no matching indexes');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no*such*index*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching indexes with --no-strict-names -v');
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..f985273e83
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,520 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" if !defined $rel || $rel eq '';
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath) or die "open $relpath failed: $!";
+ binmode $fh;
+ seek($fh, 32, 0) or die "seek failed: $!";
+ # Double quotes are required here: single-quoted '\x77' would write
+ # literal backslash escapes rather than 0x77 bytes.
+ syswrite($fh, "\x77" x 500) or die "syswrite failed: $!";
+ close($fh);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath) or die "unlink $relpath failed: $!";
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends. Put the
+ # extension in an unexpected location to test that pg_amcheck finds it
+ # correctly. Create tables with names that look like pg_catalog names to
+ # check that pg_amcheck does not get confused by them. Create functions in
+ # schema public that look like amcheck functions to check that pg_amcheck
+ # does not use them.
+ $node->safe_psql($dbname, q(
+ CREATE SCHEMA amcheck_schema;
+ CREATE EXTENSION amcheck WITH SCHEMA amcheck_schema;
+ CREATE TABLE amcheck_schema.pg_database (junk text);
+ CREATE TABLE amcheck_schema.pg_namespace (junk text);
+ CREATE TABLE amcheck_schema.pg_class (junk text);
+ CREATE TABLE amcheck_schema.pg_operator (junk text);
+ CREATE TABLE amcheck_schema.pg_proc (junk text);
+ CREATE TABLE amcheck_schema.pg_tablespace (junk text);
+
+ CREATE FUNCTION public.bt_index_check(index regclass,
+ heapallindexed boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.bt_index_parent_check(index regclass,
+ heapallindexed boolean default false,
+ rootdescend boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_parent_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ RETURNS SETOF record AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong verify_heapam!';
+ END;
+ $$ LANGUAGE plpgsql;
+ ));
+
+ # Create tables, sequences, views, materialized views, partitions,
+ # and indexes in five separate schemas. The schemas start out
+ # identical, but we will corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ CREATE TABLE $schema.p1 (a int, b int) PARTITION BY LIST (a);
+ CREATE TABLE $schema.p2 (a int, b int) PARTITION BY LIST (a);
+
+ CREATE TABLE $schema.p1_1 PARTITION OF $schema.p1 FOR VALUES IN (1, 2, 3);
+ CREATE TABLE $schema.p1_2 PARTITION OF $schema.p1 FOR VALUES IN (4, 5, 6);
+ CREATE TABLE $schema.p2_1 PARTITION OF $schema.p2 FOR VALUES IN (1, 2, 3);
+ CREATE TABLE $schema.p2_2 PARTITION OF $schema.p2 FOR VALUES IN (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt the toast table in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck.
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return exit status = 2, because tables
+# and indexes are corrupt. Exit status = 1 would mean the pg_amcheck command
+# itself failed, for example because a connection to the database could not be
+# established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_checks_all(
+ [ @cmd, '--all' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 2, [], [],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-i', 's*.t*', '-i', 's*.*btree*' ],
+ 2, [], [],
+ 'pg_amcheck all indexes with qualified names matching /s*.t*/ or /s*.*btree*/');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--no-index-expansion', 'db1', '-r', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all relations with qualified names matching /s*.t1/');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--no-index-expansion', 'db1', '-t', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 2, [], [],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 2, [], [],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+
+$node->command_checks_all(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/pg_amcheck: skipping database "postgres": amcheck is not installed/ ],
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ qr/^$/ ],
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 's1.*', '--no-index-expansion' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# Checking db2.s1 should show table corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db2', '-t', 's1.*', '--no-index-expansion' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db2.s1 excluding indexes');
+
+# Checking db3.s1 and db2.s1 together should still show db2's table
+# corruption if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db3', 'db2', '-t', 's1.*', '--no-index-expansion' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db3.s1 and db2.s1, excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of 2.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's4' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's5' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 's1.*', '--no-index-expansion' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
+
+# Check bt_index_parent_check alternates. We don't create any index corruption
+# that would behave differently under these modes, so just smoke test that the
+# arguments are handled sensibly.
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..cae0e90dbe
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,496 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 21;
+
+# This regression test demonstrates that the pg_amcheck binary supplied with
+# the pg_amcheck contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# PostgreSQL lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and
+# insert rows of the table, that give predictable sizes and locations within
+# the table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using Perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = "Unsigned 32-bit Long",
+# S = "Unsigned 16-bit Short",
+# C = "Unsigned 8-bit Octet",
+# c = "signed 8-bit octet",
+# q = "signed 64-bit quadword"
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
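As a cross-check of the layout comment and the two constants above (an editorial aside, not part of the patch), the same record layout can be expressed in Python's struct notation; the Perl-to-Python code mapping below is my own assumption, and only the 58-byte total is what the test itself depends on:

```python
import struct

# Mapping assumed here: Perl L -> 'L' (u32), S -> 'H' (u16), C -> 'B' (u8),
# c -> 'b' (i8), q -> 'q' (i64). The '<' prefix disables struct padding, so
# field offsets match the tuple layout comment: 'a' at 24, 'b' at 32, 'c' at 40.
TUPLE_FMT = '<' + 'LLL' + 'HHHHH' + 'BB' + 'q' + 'B' + 'b' * 7 + 'H' * 9

# 3 u32 + 5 u16 + 2 u8 + 1 i64 + (1 u8 + 7 i8) + 9 u16 = 58 bytes,
# matching HEAPTUPLE_PACK_LENGTH in the test file.
print(struct.calcsize(TUPLE_FMT))  # 58
```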
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
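The corruptions written at offsets 14 through 16 below rely on flag pairings that can never occur together legitimately. A hypothetical sketch (my own illustration, not the server's C implementation) of one such cross-check, reusing the constants above:

```python
# Infomask bits copied from the constants above.
HEAP_XMAX_LOCK_ONLY = 0x0080
HEAP_KEYS_UPDATED = 0x2000

def lock_only_contradiction(t_infomask, t_infomask2):
    """A tuple cannot be merely locked while also claiming its key
    columns were updated; verify_heapam reports this pairing."""
    return bool(t_infomask & HEAP_XMAX_LOCK_ONLY) and \
           bool(t_infomask2 & HEAP_KEYS_UPDATED)

print(lock_only_contradiction(0x0080, 0x2000))  # True: the offnum-14 case
print(lock_only_contradiction(0x0080, 0x0000))  # False: merely locked is fine
```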
+
+# Helper function to generate a regular expression matching the header we
+# expect verify_heapam() to return given which fields we expect to be non-null.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation postgres\.public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation postgres\.public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ if ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${$header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_LOCK_ONLY and HEAP_KEYS_UPDATED
+ $tup->{t_infomask} |= HEAP_XMAX_LOCK_ONLY;
+ $tup->{t_infomask2} |= HEAP_KEYS_UPDATED;
+
+ push @expected,
+ qr/${header}tuple is marked as only locked, but also claims key columns were updated/;
+ }
+ elsif ($offnum == 15)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 16) # Last offnum must equal ROWCOUNT
+ {
+ # As above, but with a multixact xmax that fails the relminmxid check
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--no-index-expansion', '-p', $port, 'postgres'],
+ 2,
+ [ @expected ],
+ [ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
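Several of the corruptions above (offsets 3, 4, and 16) depend on transaction IDs being compared circularly: with epoch 0, a raw value such as 4026531839 reads as lying in the future rather than the distant past. A simplified model of that classification (my own sketch, ignoring frozen and bootstrap special cases beyond FirstNormalTransactionId):

```python
FIRST_NORMAL_XID = 3  # FirstNormalTransactionId

def classify_xmin(xmin, next_full_xid, relfrozenxid, epoch=0):
    """Return the complaint verify_heapam would raise for this xmin,
    or None. Simplified: assumes epoch 0 and normal xids only."""
    full_xmin = (epoch << 32) | xmin
    if xmin >= FIRST_NORMAL_XID and full_xmin >= next_full_xid:
        return "equals or exceeds next valid transaction ID"
    if xmin < relfrozenxid:
        return "precedes relation freeze threshold"
    return None

# The offnum-3 corruption: an xmin far larger than any assigned xid.
print(classify_xmin(4026531839, next_full_xid=1000, relfrozenxid=500))
```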
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..eba8ea9cae
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,54 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 5;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ 2,
+ [ qr/item order invariant violated for index "fickleidx"/ ],
+ [ ],
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
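An editorial aside on why this test works: a btree's item-order invariant holds only relative to the comparator the index was built with, so swapping in an opposite-order comparator (as the UPDATE of pg_catalog.pg_amproc does above) makes well-ordered pages look corrupt. A sketch of that idea in Python:

```python
# Illustration only, not part of the patch: model an index page's item
# ordering and check it under two comparators with opposite sort orders.

def asc_cmp(a, b):
    return (a > b) - (a < b)

def desc_cmp(a, b):
    return (a < b) - (a > b)

def order_invariant_holds(items, cmp):
    # Every adjacent pair must be non-decreasing under the comparator.
    return all(cmp(x, y) <= 0 for x, y in zip(items, items[1:]))

index_items = list(range(1, 1001))                   # built under asc_cmp
print(order_invariant_holds(index_items, asc_cmp))   # True
print(order_invariant_holds(index_items, desc_cmp))  # False: looks corrupt
```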
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..76e5a0e511
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,670 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>check for corruption in one or more
+ <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or
+ more databases, with options to select which schemas, tables and indexes
+ to check, which kinds of checking to perform, and whether to perform the
+ checks in parallel and, if so, how many parallel connections to establish.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-dbname</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects
+ in the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>--maintenance-db</option> argument is given and no
+ database name is given as a command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-dbname</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-dbname</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-dbname</option>, the database will
+ be excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-dbname=america</literal></member>
+ <member><literal>--exclude-dbname="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, but note that
+ unless <option>--exclude-toast-pointers</option> is given, toast
+ pointers found in the main table will be followed into the toast table
+ without regard for the location in the toast table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer is
+ slow and may be improved in a future release. Some users may wish to
+ disable this check to save time.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-index-expansion</option></term>
+ <listitem>
+ <para>
+ When including a table relation in the list of relations to check, do
+ not automatically include btree indexes associated with the table.
+ </para>
+ <para>
+ By default, all tables to be checked will also have checks performed on
+ their associated btree indexes, if any. If this option is given, only
+ those indexes which match a <option>--relation</option> or
+ <option>--index</option> pattern will be checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-strict-names</option></term>
+ <listitem>
+ <para>
+ When calculating the list of databases to check, and the objects within
+ those databases to be checked, do not raise an error for database,
+ schema, relation, table, or index inclusion patterns that match no
+ corresponding objects.
+ </para>
+ <para>
+ Exclusion patterns are never required to match any objects, but by
+ default an unmatched inclusion pattern raises an error. This includes
+ inclusion patterns that fail to match only because an exclusion pattern
+ prevented them from matching an existing object, and patterns that fail
+ to match a database because it does not accept connections
+ (<literal>datallowconn</literal> is false).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-toast-expansion</option></term>
+ <listitem>
+ <para>
+ When including a table relation in the list of relations to check, do
+ not automatically include toast tables associated with the table.
+ </para>
+ <para>
+ By default, all tables to be checked will also have checks performed on
+ their associated toast tables, if any. If this option is given, only
+ those toast tables which match a <option>--relation</option> or
+ <option>--table</option> pattern will be checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page.
+ This option therefore affects only the checking of table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly selects
+ <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--progress</option></term>
+ <listitem>
+ <para>
+ Show progress information about how many relations have been checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not quiet any output specifically due to the use of
+ the <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_table</literal></member>
+ <member><literal>--relation=accounting_department.accounts_table</literal></member>
+ <member><literal>--relation=corporate_database.accounting_department.*_table</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing a
+ new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ Note that both tables and indexes are included using this option, which
+ might not be what you want if you are also using
+ <option>--no-index-expansion</option>. To specify all tables in a schema
+ without also specifying all indexes, <option>--table</option> can be
+ used with a pattern that specifies the schema. For example, to check
+ all tables in schema <literal>corp</literal>, the option
+ <literal>--table="corp.*"</literal> may be used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will
+ be excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>all-frozen</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>all-visible</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped. The default behavior can also be
+ selected explicitly as <literal>none</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, but note
+ that unless <option>--exclude-toast-pointers</option> is given, toast
+ pointers found in the main table will be followed into the toast table
+ without regard for the location in the toast table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increase the message verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
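As a toy model of the page-selection behavior documented above, the following Python sketch shows how the <option>--startblock</option>, <option>--endblock</option>, and <option>--skip</option> options narrow the set of table pages that get checked. This is illustrative only, not pg_amcheck's actual implementation; the `pages_to_check` function and the `page_flags` representation of visibility-map state are assumptions made for the example.

```python
# Toy model (NOT pg_amcheck's implementation) of how --startblock,
# --endblock, and --skip narrow the set of table pages to check.
# page_flags maps a block number to its visibility-map flags; an
# all-frozen page in PostgreSQL is also marked all-visible, which
# callers of this sketch would model by setting both flags.
def pages_to_check(nblocks, startblock=None, endblock=None,
                   skip=None, page_flags=None):
    page_flags = page_flags or {}
    pages = []
    for blk in range(nblocks):
        if startblock is not None and blk < startblock:
            continue  # --startblock: skip pages before the starting block
        if endblock is not None and blk > endblock:
            continue  # --endblock: skip pages after the ending block
        if skip is not None and skip in page_flags.get(blk, set()):
            continue  # --skip=all-frozen / --skip=all-visible
        pages.append(blk)
    return pages
```

Note that, as the documentation warns, block-range options apply to toast tables as well, and toast pointers are still followed without regard for these ranges unless <option>--exclude-toast-pointers</option> is given.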
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..49ad558b74 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,7 +18,7 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 49614106dc..f680544e07 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index caae8cbd5b..531b9e2a00 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -499,6 +499,7 @@ DSA
DWORD
DataDumperPtr
DataPageDeleteStack
+DatabaseInfo
DateADT
Datum
DatumTupleFields
@@ -2083,6 +2084,7 @@ RelToCluster
RelabelType
Relation
RelationData
+RelationInfo
RelationPtr
RelationSyncEntry
RelcacheCallbackFunction
@@ -2847,6 +2849,7 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
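The selection rules spelled out in the pgamcheck.sgml hunks above (exclusion options such as <option>-D</option>, <option>-S</option>, and <option>-R</option> always taking precedence over the corresponding inclusion options) can be sketched as a small Python model. The `select_objects` helper and the use of `fnmatch` glob matching are illustrative assumptions; pg_amcheck's real pattern handling follows psql-style patterns and is implemented in C.

```python
# Toy model (NOT pg_amcheck's actual pattern code) of include/exclude
# precedence: an object is checked only if some include pattern matches
# it AND no exclude pattern matches it.
import fnmatch

def select_objects(names, include=("*",), exclude=()):
    """Return names matched by an include pattern and no exclude pattern."""
    selected = []
    for name in names:
        if not any(fnmatch.fnmatch(name, pat) for pat in include):
            continue
        if any(fnmatch.fnmatch(name, pat) for pat in exclude):
            continue  # exclusion wins even when an include pattern matched
        selected.append(name)
    return selected

if __name__ == "__main__":
    dbs = ["africa", "asia", "europe", "america", "pacific"]
    # Roughly: --dbname="a*" --exclude-dbname=america
    print(select_objects(dbs, include=["a*"], exclude=["america"]))
    # → ['africa', 'asia']
```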
Attachment: v40-0003-Extending-PostgresNode-to-test-corruption.patch (application/octet-stream)
From 6785e8ed573c4e30f8be5628d971612b4cd84c61 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:37:58 -0800
Subject: [PATCH v40 3/3] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 145 ++++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 261 ++++++++++++++++++
5 files changed, 506 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..45ad223531
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,145 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 22;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef if !defined $rel || $rel eq ''; # safe_psql returns '' when there is no toast table
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ 2,
+ [ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/ ],
+ [ ],
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ 2,
+ [ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/ ],
+ [ ],
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options. When
+# checking the toast pointers, we may get errors produced by verify_heapam, but
+# we may also get errors from failure to read toast blocks that are beyond the
+# end of the toast table, of the form /ERROR: could not read block/. To avoid
+# having a brittle test, we accept any error message.
+#
+$node->start;
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ 0,
+ [ qr/^$/ ],
+ [ ],
+ 'pg_amcheck reverted toast table');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ 0,
+ [ qr/^$/ ],
+ [ ],
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ ],
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..c92d1702b4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This should be a partial
+# and corrupt reverting of the update.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d470af93c5 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,267 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relation.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($self, $snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($self, $snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files outright with
+the saved files in B<$snapname>; unlike with defined values, this means a
+file may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
+ File::Copy::copy($srcpart, "$dstdir/$part")
+ or die "could not copy $srcpart to $dstdir/$part: $!";
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
+ my $buffer;
+ if ($bytes < 0)
+ {
+ $bytes *= -1; # Easier to use positive value
+ my $srcsize = (stat($srcfh))[7];
+ my $offset = $srcsize - $bytes;
+ sysseek($srcfh, $offset, 0);
+ sysseek($dstfh, $offset, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+ else
+ {
+ sysseek($srcfh, 0, 0);
+ sysseek($dstfh, 0, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
[ new patches ]
Regarding 0001:
There seem to be whitespace-only changes to the comment for select_loop().
I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
changes could be simpler. First idea: Suppose you had
ParallelSlotsSetup(numslots) that just creates the slot array with 0
connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
want to make it own an existing connection. That seems like it might
be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB()
altogether, and just let ParallelSlotsGetIdle() connect the other
slots as required? Preconnecting all slots before we do anything is
good because ... of what?
I also wonder if things might be simplified by introducing a wrapper
object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
number of slots (num_slots), the array of actual PGconn objects, and
the ConnParams to be used for new connections, and the initcmd to be
used for new connections. Maybe also the progname. This seems like it
would avoid a bunch of repeated parameter passing: you could just
create the ParallelSlotArray with the right contents, and then pass it
around everywhere, instead of having to keep passing the same stuff
in. If you want to switch to connecting to a different DB, you tweak
the ConnParams - maybe using an accessor function - and the system
figures the rest out.
I wonder if it's really useful to generalize this to a point of caring
about all the ConnParams fields, too. Like, if you only provide
ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
that can change so you don't need to care about the others. And maybe
you also don't really need to keep the ConnParams fields in every
slot, either. Like, couldn't you just say something like: if
(strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
can't reuse without a reconnect }? I know sometimes a dbname is really
a whole connection string, but perhaps we could try to fix that by
using PQconninfoParse() in the right place, so that what ends up in
the cparams is just the db name, not a whole connection string.
This is just based on a relatively short amount of time spent studying
the patch, so I might well be off-base here. What do you think?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mar 1, 2021, at 1:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
[ new patches ]
Regarding 0001:
There seem to be whitespace-only changes to the comment for select_loop().
I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
changes could be simpler. First idea: Suppose you had
ParallelSlotsSetup(numslots) that just creates the slot array with 0
connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
want to make it own an existing connection. That seems like it might
be cleaner. Second idea: Why not get rid of ParallelSlotsSetupOneDB()
altogether, and just let ParallelSlotsGetIdle() connect the other
slots as required? Preconnecting all slots before we do anything is
good because ... of what?
Mostly because, if --jobs is set too high, you get an error before launching any work. I don't know that it's really a big deal if vacuumdb or reindexdb have a bunch of tasks kicked off prior to exit(1) due to not being able to open connections for all the slots, but it is a behavioral change.
I also wonder if things might be simplified by introducing a wrapper
object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
number of slots (num_slots), the array of actual PGconn objects, and
the ConnParams to be used for new connections, and the initcmd to be
used for new connections. Maybe also the progname. This seems like it
would avoid a bunch of repeated parameter passing: you could just
create the ParallelSlotArray with the right contents, and then pass it
around everywhere, instead of having to keep passing the same stuff
in. If you want to switch to connecting to a different DB, you tweak
the ConnParams - maybe using an accessor function - and the system
figures the rest out.
The existing version of parallel slots (before any of my changes) could already have been written that way, but the author chose not to. I thought about making the sort of change you suggest, and decided against, mostly on the principle of stare decisis. But the idea is very appealing, and since you're on board, I think I'll go make that change.
I wonder if it's really useful to generalize this to a point of caring
about all the ConnParams fields, too. Like, if you only provide
ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
that can change so you don't need to care about the others. And maybe
you also don't really need to keep the ConnParams fields in every
slot, either. Like, couldn't you just say something like: if
(strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
can't reuse without a reconnect }? I know sometimes a dbname is really
a whole connection string, but perhaps we could try to fix that by
using PQconninfoParse() in the right place, so that what ends up in
the cparams is just the db name, not a whole connection string.
I went a little out of my way to avoid that, as I didn't want the next application that uses parallel slots to have to refactor it again, if for example they want to process in parallel databases listening on different ports, or to process commands issued under different roles.
This is just based on a relatively short amount of time spent studying
the patch, so I might well be off-base here. What do you think?
I like the ParallelSlotArray idea, and will go do that now. I'm happy to defer to your judgement on the other stuff, too, but will wait to hear back from you.
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mar 1, 2021, at 1:57 PM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Mar 1, 2021, at 1:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 24, 2021 at 1:55 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
[ new patches ]
Regarding 0001:
There seem to be whitespace-only changes to the comment for select_loop().
I believe this is fixed in the attached patch series.
I wonder if the ParallelSlotsSetupOneDB()/ParallelSlotsSetupMinimal()
changes could be simpler. First idea: Suppose you had
ParallelSlotsSetup(numslots) that just creates the slot array with 0
connections, and then ParallelSlotsAdopt(slots, conn, cparams) if you
want to make it own an existing connection. That seems like it might
be cleaner.
I used this idea. The functions are ParallelSlotsSetup(numslots, cparams) and ParallelSlotsAdoptConn(sa, conn)
Second idea: Why not get rid of ParallelSlotsSetupOneDB()
altogether, and just let ParallelSlotsGetIdle() connect the other
slots as required?
I did this also.
Preconnecting all slots before we do anything is
good because ... of what?
Mostly because, if --jobs is set too high, you get an error before launching any work. I don't know that it's really a big deal if vacuumdb or reindexdb have a bunch of tasks kicked off prior to exit(1) due to not being able to open connections for all the slots, but it is a behavioral change.
On further reflection, I decided to implement these changes and not worry about the behavioral change.
I also wonder if things might be simplified by introducing a wrapper
object, e.g. ParallelSlotArray. Suppose a ParallelSlotArray stores the
number of slots (num_slots), the array of actual PGconn objects, and
the ConnParams to be used for new connections
I did this.
, and the initcmd to be
used for new connections.
I skipped this part. The initcmd argument is only handed to ParallelSlotsGetIdle(). Doing as you suggest would not really be simpler; it would just move that argument to ParallelSlotsSetup(). But I don't feel strongly about it, so I can move this, too, if you like.
Maybe also the progname.
I didn't do this either, and for the same reason. It's just a parameter to ParallelSlotsGetIdle(), so nothing is really gained by moving it to ParallelSlotsSetup().
This seems like it
would avoid a bunch of repeated parameter passing: you could just
create the ParallelSlotArray with the right contents, and then pass it
around everywhere, instead of having to keep passing the same stuff
in. If you want to switch to connecting to a different DB, you tweak
the ConnParams - maybe using an accessor function - and the system
figures the rest out.
Rather than having the slots' user tweak the slot's ConnParams directly, ParallelSlotsGetIdle() now takes a dbname argument and uses it as ConnParams->override_dbname.
The existing version of parallel slots (before any of my changes) could already have been written that way, but the author chose not to. I thought about making the sort of change you suggest, and decided against, mostly on the principle of stare decisis. But the idea is very appealing, and since you're on board, I think I'll go make that change.
I wonder if it's really useful to generalize this to a point of caring
about all the ConnParams fields, too. Like, if you only provide
ParallelSlotUpdateDB(slotarray, dbname), then that's the only field
that can change so you don't need to care about the others. And maybe
you also don't really need to keep the ConnParams fields in every
slot, either. Like, couldn't you just say something like: if
(strcmp(PQdb(conn) , slotarray->cparams->dbname) != 0) { wrong DB,
can't reuse without a reconnect }? I know sometimes a dbname is really
a whole connection string, but perhaps we could try to fix that by
using PQconninfoParse() in the right place, so that what ends up in
the cparams is just the db name, not a whole connection string.
I went a little out of my way to avoid that, as I didn't want the next application that uses parallel slots to have to refactor it again, if for example they want to process in parallel databases listening on different ports, or to process commands issued under different roles.
This next version has a single ConnParams for the slots array and only contemplates the dbname changing from one connection to another.
This is just based on a relatively short amount of time spent studying
the patch, so I might well be off-base here. What do you think?
I like the ParallelSlotArray idea, and will go do that now. I'm happy to defer to your judgement on the other stuff, too, but will wait to hear back from you.
Rather than waiting to hear back from you, I decided to implement these ideas as separate commits in my development environment, so I can roll some of them back if you don't like them. The full patch set is attached:
Attachments:
v41-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patchapplication/octet-stream; name=v41-0001-Reworking-ParallelSlots-for-mutliple-DB-use.patch; x-unix-mode=0644Download
From c5e25f8380f52bfcf23d7d93b5ca41c861bdb68e Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Mar 2021 08:28:03 -0800
Subject: [PATCH v41 1/3] Reworking ParallelSlots for multiple DB use
The existing implementation of ParallelSlots is used by reindexdb
and vacuumdb to process tables in parallel in only one database at
a time. The ParallelSlots interface reflects this usage pattern.
The function to set up the slots assumes all slots should be
connected to the same database, and the function for getting the
next idle slot pays no attention to which database the slot may be
connected to.
In anticipation of pg_amcheck using parallel slots to process
multiple databases in parallel, reworking the interface while
trying to remain reasonably simple for reindexdb and vacuumdb to
use:
ParallelSlotsSetup() no longer creates or receives database
connections. It takes a ConnParams argument that it stores for use
in subsequent operations when a connection needs to be formed.
For callers who already have a connection and want to reuse it can
give it to the parallel slots using a new function,
ParallelSlotsAdoptConn(). Both reindexdb and vacuumdb use this.
ParallelSlotsGetIdle() is extended to take arguments about the
database connection desired and to manage a heterogeneous set of
slots potentially containing slots connected to varying databases
and some slots not yet connected. The function will reuse an
existing connection or form a new connection as necessary.
The logic for determining whether a slot's connection is suitable
for reuse is based on the database the slot's connection is
connected to, and whether that matches the database desired. Other
connection parameters (user, host, port, etc.) are assumed not to
change from slot to slot.
---
src/bin/scripts/reindexdb.c | 17 +-
src/bin/scripts/vacuumdb.c | 46 +--
src/fe_utils/parallel_slot.c | 430 +++++++++++++++++++--------
src/include/fe_utils/parallel_slot.h | 25 +-
src/tools/pgindent/typedefs.list | 2 +
5 files changed, 360 insertions(+), 160 deletions(-)
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 9f072ac49a..62cd2f789a 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -36,7 +36,7 @@ static SimpleStringList *get_parallel_object_list(PGconn *conn,
ReindexType type,
SimpleStringList *user_list,
bool echo);
-static void reindex_one_database(const ConnParams *cparams, ReindexType type,
+static void reindex_one_database(ConnParams *cparams, ReindexType type,
SimpleStringList *user_list,
const char *progname,
bool echo, bool verbose, bool concurrently,
@@ -324,7 +324,7 @@ main(int argc, char *argv[])
}
static void
-reindex_one_database(const ConnParams *cparams, ReindexType type,
+reindex_one_database(ConnParams *cparams, ReindexType type,
SimpleStringList *user_list,
const char *progname, bool echo,
bool verbose, bool concurrently, int concurrentCons)
@@ -334,7 +334,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
bool parallel = concurrentCons > 1;
SimpleStringList *process_list = user_list;
ReindexType process_type = type;
- ParallelSlot *slots;
+ ParallelSlotArray *sa;
bool failed = false;
int items_count = 0;
@@ -445,7 +445,8 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
Assert(process_list != NULL);
- slots = ParallelSlotsSetup(cparams, progname, echo, conn, concurrentCons);
+ sa = ParallelSlotsSetup(concurrentCons, cparams);
+ ParallelSlotsAdoptConn(sa, conn);
cell = process_list->head;
do
@@ -459,7 +460,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
goto finish;
}
- free_slot = ParallelSlotsGetIdle(slots, concurrentCons);
+ free_slot = ParallelSlotsGetIdle(sa, NULL, progname, echo, NULL);
if (!free_slot)
{
failed = true;
@@ -473,7 +474,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
cell = cell->next;
} while (cell != NULL);
- if (!ParallelSlotsWaitCompletion(slots, concurrentCons))
+ if (!ParallelSlotsWaitCompletion(sa))
failed = true;
finish:
@@ -483,8 +484,8 @@ finish:
pg_free(process_list);
}
- ParallelSlotsTerminate(slots, concurrentCons);
- pfree(slots);
+ ParallelSlotsTerminate(sa);
+ pfree(sa);
if (failed)
exit(1);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 602fd45c42..4f947eb62d 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -45,7 +45,7 @@ typedef struct vacuumingOptions
} vacuumingOptions;
-static void vacuum_one_database(const ConnParams *cparams,
+static void vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *tables,
@@ -408,7 +408,7 @@ main(int argc, char *argv[])
* a list of tables from the database.
*/
static void
-vacuum_one_database(const ConnParams *cparams,
+vacuum_one_database(ConnParams *cparams,
vacuumingOptions *vacopts,
int stage,
SimpleStringList *tables,
@@ -421,13 +421,14 @@ vacuum_one_database(const ConnParams *cparams,
PGresult *res;
PGconn *conn;
SimpleStringListCell *cell;
- ParallelSlot *slots;
+ ParallelSlotArray *sa;
SimpleStringList dbtables = {NULL, NULL};
int i;
int ntups;
bool failed = false;
bool tables_listed = false;
bool has_where = false;
+ const char *initcmd;
const char *stage_commands[] = {
"SET default_statistics_target=1; SET vacuum_cost_delay=0;",
"SET default_statistics_target=10; RESET vacuum_cost_delay;",
@@ -684,26 +685,25 @@ vacuum_one_database(const ConnParams *cparams,
concurrentCons = 1;
/*
- * Setup the database connections. We reuse the connection we already have
- * for the first slot. If not in parallel mode, the first slot in the
- * array contains the connection.
+ * All slots need to be prepared to run the appropriate analyze stage, if
+ * caller requested that mode. We have to prepare the initial connection
+ * ourselves before setting up the slots.
*/
- slots = ParallelSlotsSetup(cparams, progname, echo, conn, concurrentCons);
+ if (stage == ANALYZE_NO_STAGE)
+ initcmd = NULL;
+ else
+ {
+ initcmd = stage_commands[stage];
+ executeCommand(conn, initcmd, echo);
+ }
/*
- * Prepare all the connections to run the appropriate analyze stage, if
- * caller requested that mode.
+ * Setup the database connections. We reuse the connection we already have
+ * for the first slot. If not in parallel mode, the first slot in the
+ * array contains the connection.
*/
- if (stage != ANALYZE_NO_STAGE)
- {
- int j;
-
- /* We already emitted the message above */
-
- for (j = 0; j < concurrentCons; j++)
- executeCommand((slots + j)->connection,
- stage_commands[stage], echo);
- }
+ sa = ParallelSlotsSetup(concurrentCons, cparams);
+ ParallelSlotsAdoptConn(sa, conn);
initPQExpBuffer(&sql);
@@ -719,7 +719,7 @@ vacuum_one_database(const ConnParams *cparams,
goto finish;
}
- free_slot = ParallelSlotsGetIdle(slots, concurrentCons);
+ free_slot = ParallelSlotsGetIdle(sa, NULL, progname, echo, initcmd);
if (!free_slot)
{
failed = true;
@@ -740,12 +740,12 @@ vacuum_one_database(const ConnParams *cparams,
cell = cell->next;
} while (cell != NULL);
- if (!ParallelSlotsWaitCompletion(slots, concurrentCons))
+ if (!ParallelSlotsWaitCompletion(sa))
failed = true;
finish:
- ParallelSlotsTerminate(slots, concurrentCons);
- pg_free(slots);
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
termPQExpBuffer(&sql);
diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index b625deb254..42ae2d71ed 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -25,22 +25,13 @@
#include "common/logging.h"
#include "fe_utils/cancel.h"
#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
#define ERRCODE_UNDEFINED_TABLE "42P01"
-static void init_slot(ParallelSlot *slot, PGconn *conn);
static int select_loop(int maxFd, fd_set *workerset);
static bool processQueryResult(ParallelSlot *slot, PGresult *result);
-static void
-init_slot(ParallelSlot *slot, PGconn *conn)
-{
- slot->connection = conn;
- /* Initially assume connection is idle */
- slot->isFree = true;
- ParallelSlotClearHandler(slot);
-}
-
/*
* Process (and delete) a query result. Returns true if there's no problem,
+ * false otherwise. It's up to the handler to decide what constitutes a
@@ -50,6 +41,7 @@ static bool
processQueryResult(ParallelSlot *slot, PGresult *result)
{
Assert(slot->handler != NULL);
+ Assert(slot->connection != NULL);
/* On failure, the handler should return NULL after freeing the result */
if (!slot->handler(result, slot->connection, slot->handler_context))
@@ -71,6 +63,9 @@ consumeQueryResult(ParallelSlot *slot)
bool ok = true;
PGresult *result;
+ Assert(slot != NULL);
+ Assert(slot->connection != NULL);
+
SetCancelConn(slot->connection);
while ((result = PQgetResult(slot->connection)) != NULL)
{
@@ -137,151 +132,337 @@ select_loop(int maxFd, fd_set *workerset)
}
/*
- * ParallelSlotsGetIdle
- * Return a connection slot that is ready to execute a command.
- *
- * This returns the first slot we find that is marked isFree, if one is;
- * otherwise, we loop on select() until one socket becomes available. When
- * this happens, we read the whole set and mark as free all sockets that
- * become available. If an error occurs, NULL is returned.
+ * Return the offset of a suitable idle slot, or -1 if none are available. If
+ * the given connection parameters are not null, only idle slots connected
+ * using equivalent parameters are considered suitable, otherwise all idle
+ * connected slots are considered suitable.
*/
-ParallelSlot *
-ParallelSlotsGetIdle(ParallelSlot *slots, int numslots)
+static int
+find_matching_idle_slot(const ParallelSlotArray *sa, const char *dbname)
{
int i;
- int firstFree = -1;
- /*
- * Look for any connection currently free. If there is one, mark it as
- * taken and let the caller know the slot to use.
- */
- for (i = 0; i < numslots; i++)
+ Assert(sa != NULL);
+
+ for (i = 0; i < sa->numslots; i++)
{
- if (slots[i].isFree)
- {
- slots[i].isFree = false;
- return slots + i;
- }
+ if (sa->slots[i].inUse)
+ continue;
+
+ if (sa->slots[i].connection == NULL)
+ continue;
+
+ if (dbname == NULL ||
+ strcmp(PQdb(sa->slots[i].connection), dbname) == 0)
+ return i;
+ }
+ return -1;
+}
+
+/*
+ * Return the offset of the first slot without a database connection, or -1 if
+ * all slots are connected.
+ */
+static int
+find_unconnected_slot(const ParallelSlotArray *sa)
+{
+ int i;
+
+ Assert(sa != NULL);
+
+ for (i = 0; i < sa->numslots; i++)
+ {
+ if (sa->slots[i].inUse)
+ continue;
+
+ if (sa->slots[i].connection == NULL)
+ return i;
+ }
+
+ return -1;
+}
+
+/*
+ * Return the offset of the first idle slot, or -1 if all slots are busy.
+ */
+static int
+find_any_idle_slot(const ParallelSlotArray *sa)
+{
+ int i;
+
+ Assert(sa != NULL);
+
+ for (i = 0; i < sa->numslots; i++)
+ if (!sa->slots[i].inUse)
+ return i;
+
+ return -1;
+}
+
+/*
+ * Wait for any slot's connection to have query results, consume the results,
+ * and update the slot's status as appropriate. Returns true on success,
+ * false on cancellation, on error, or if no slots are connected.
+ */
+static bool
+wait_on_slots(ParallelSlotArray *sa, const char *progname)
+{
+ int i;
+ fd_set slotset;
+ int maxFd = 0;
+ PGconn *cancelconn = NULL;
+
+ Assert(sa != NULL);
+ Assert(progname != NULL);
+
+ /* We must reconstruct the fd_set for each call to select_loop */
+ FD_ZERO(&slotset);
+
+ for (i = 0; i < sa->numslots; i++)
+ {
+ int sock;
+
+ /* We shouldn't get here if we still have slots without connections */
+ Assert(sa->slots[i].connection != NULL);
+
+ sock = PQsocket(sa->slots[i].connection);
+
+ /*
+ * We don't really expect any connections to lose their sockets after
+ * startup, but just in case, cope by ignoring them.
+ */
+ if (sock < 0)
+ continue;
+
+ /* Keep track of the first valid connection we see. */
+ if (cancelconn == NULL)
+ cancelconn = sa->slots[i].connection;
+
+ FD_SET(sock, &slotset);
+ if (sock > maxFd)
+ maxFd = sock;
}
/*
- * No free slot found, so wait until one of the connections has finished
- * its task and return the available slot.
+ * If we get this far with no valid connections, processing cannot
+ * continue.
*/
- while (firstFree < 0)
+ if (cancelconn == NULL)
+ return false;
+
+ SetCancelConn(cancelconn);
+ i = select_loop(maxFd, &slotset);
+ ResetCancelConn();
+
+ /* failure? */
+ if (i < 0)
+ return false;
+
+ for (i = 0; i < sa->numslots; i++)
{
- fd_set slotset;
- int maxFd = 0;
+ int sock;
- /* We must reconstruct the fd_set for each call to select_loop */
- FD_ZERO(&slotset);
+ sock = PQsocket(sa->slots[i].connection);
- for (i = 0; i < numslots; i++)
+ if (sock >= 0 && FD_ISSET(sock, &slotset))
{
- int sock = PQsocket(slots[i].connection);
-
- /*
- * We don't really expect any connections to lose their sockets
- * after startup, but just in case, cope by ignoring them.
- */
- if (sock < 0)
- continue;
-
- FD_SET(sock, &slotset);
- if (sock > maxFd)
- maxFd = sock;
+ /* select() says input is available, so consume it */
+ PQconsumeInput(sa->slots[i].connection);
}
- SetCancelConn(slots->connection);
- i = select_loop(maxFd, &slotset);
- ResetCancelConn();
-
- /* failure? */
- if (i < 0)
- return NULL;
-
- for (i = 0; i < numslots; i++)
+ /* Collect result(s) as long as any are available */
+ while (!PQisBusy(sa->slots[i].connection))
{
- int sock = PQsocket(slots[i].connection);
+ PGresult *result = PQgetResult(sa->slots[i].connection);
- if (sock >= 0 && FD_ISSET(sock, &slotset))
+ if (result != NULL)
{
- /* select() says input is available, so consume it */
- PQconsumeInput(slots[i].connection);
+ /* Handle and discard the command result */
+ if (!processQueryResult(&sa->slots[i], result))
+ return false;
}
-
- /* Collect result(s) as long as any are available */
- while (!PQisBusy(slots[i].connection))
+ else
{
- PGresult *result = PQgetResult(slots[i].connection);
-
- if (result != NULL)
- {
- /* Handle and discard the command result */
- if (!processQueryResult(slots + i, result))
- return NULL;
- }
- else
- {
- /* This connection has become idle */
- slots[i].isFree = true;
- ParallelSlotClearHandler(slots + i);
- if (firstFree < 0)
- firstFree = i;
- break;
- }
+ /* This connection has become idle */
+ sa->slots[i].inUse = false;
+ ParallelSlotClearHandler(&sa->slots[i]);
+ break;
}
}
}
+ return true;
+}
- slots[firstFree].isFree = false;
- return slots + firstFree;
+/*
+ * Close a slot's database connection.
+ */
+static void
+disconnect_slot(ParallelSlot *slot)
+{
+ Assert(slot);
+ Assert(slot->connection);
+
+ disconnectDatabase(slot->connection);
+ slot->connection = NULL;
}
/*
- * ParallelSlotsSetup
- * Prepare a set of parallel slots to use on a given database.
+ * Open a new database connection using the given connection parameters,
+ * execute an initial command if supplied, and associate the new connection
+ * with the given slot.
+ */
+static void
+connect_slot(ParallelSlot *slot, ConnParams *cparams, const char *dbname,
+ const char *progname, bool echo, const char *initcmd)
+{
+ const char *old_override;
+ Assert(slot);
+ Assert(slot->connection == NULL);
+
+ old_override = cparams->override_dbname;
+ if (dbname)
+ cparams->override_dbname = dbname;
+ slot->connection = connectDatabase(cparams, progname, echo, false, true);
+ cparams->override_dbname = old_override;
+
+ if (PQsocket(slot->connection) >= FD_SETSIZE)
+ {
+ pg_log_fatal("too many jobs for this platform");
+ exit(1);
+ }
+
+ /* Set up the connection using the supplied command, if any. */
+ if (initcmd)
+ executeCommand(slot->connection, initcmd, echo);
+}
+
+/*
+ * ParallelSlotsGetIdle
+ * Return a connection slot that is ready to execute a command.
*
- * This creates and initializes a set of connections to the database
- * using the information given by the caller, marking all parallel slots
- * as free and ready to use. "conn" is an initial connection set up
- * by the caller and is associated with the first slot in the parallel
- * set.
+ * The slot returned is chosen as follows:
+ *
+ * If any idle slot already has an open connection, and if either cparams is
+ * null or the connection was formed using connection parameter string values
+ * identical to those in cparams, that slot will be returned allowing the
+ * connection to be reused.
+ *
+ * Otherwise, if cparams is not null, and if any idle slot is not yet connected
+ * to a database, the slot will be returned with its connection opened using
+ * the supplied cparams.
+ *
+ * Otherwise, if cparams is not null, and if any idle slot exists, an idle slot
+ * will be chosen and returned after having its connection disconnected and
+ * reconnected using the supplied cparams.
+ *
+ * Otherwise, if any slots have busy connections, we loop on select() until
+ * input arrives on at least one of their sockets. We then consume the
+ * available input, mark as idle any slots whose commands have completed,
+ * and choose a slot using the same rules as above.
+ *
+ * Otherwise, we cannot return a slot, which is an error, and NULL is returned.
+ *
+ * For any connection created, if "initcmd" is not null, it will be executed as
+ * a command on the newly formed connection before the slot is returned.
+ *
+ * If an error occurs, NULL is returned.
*/
ParallelSlot *
-ParallelSlotsSetup(const ConnParams *cparams,
- const char *progname, bool echo,
- PGconn *conn, int numslots)
+ParallelSlotsGetIdle(ParallelSlotArray *sa, const char *dbname,
+ const char *progname, bool echo, const char *initcmd)
{
- ParallelSlot *slots;
- int i;
+ int offset;
- Assert(conn != NULL);
+ Assert(sa);
+ Assert(sa->numslots > 0);
+ Assert(progname);
- slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * numslots);
- init_slot(slots, conn);
- if (numslots > 1)
+ while (1)
{
- for (i = 1; i < numslots; i++)
+ /* First choice: a slot already connected to the desired database. */
+ offset = find_matching_idle_slot(sa, dbname);
+ if (offset >= 0)
{
- conn = connectDatabase(cparams, progname, echo, false, true);
-
- /*
- * Fail and exit immediately if trying to use a socket in an
- * unsupported range. POSIX requires open(2) to use the lowest
- * unused file descriptor and the hint given relies on that.
- */
- if (PQsocket(conn) >= FD_SETSIZE)
- {
- pg_log_fatal("too many jobs for this platform -- try %d", i);
- exit(1);
- }
+ sa->slots[offset].inUse = true;
+ return &sa->slots[offset];
+ }
- init_slot(slots + i, conn);
+ /* Second choice: a slot not connected to any database. */
+ offset = find_unconnected_slot(sa);
+ if (offset >= 0)
+ {
+ connect_slot(&sa->slots[offset], sa->cparams, dbname, progname, echo, initcmd);
+ sa->slots[offset].inUse = true;
+ return &sa->slots[offset];
}
+
+ /* Third choice: a slot connected to the wrong database. */
+ offset = find_any_idle_slot(sa);
+ if (offset >= 0)
+ {
+ disconnect_slot(&sa->slots[offset]);
+ connect_slot(&sa->slots[offset], sa->cparams, dbname, progname, echo, initcmd);
+ sa->slots[offset].inUse = true;
+ return &sa->slots[offset];
+ }
+
+ /*
+ * Fourth choice: block until one or more slots become available. If
+ * any slot has hit a fatal error, we'll find out about that here and
+ * return NULL.
+ */
+ if (!wait_on_slots(sa, progname))
+ return NULL;
}
+}
- return slots;
+/*
+ * ParallelSlotsSetup
+ * Prepare a set of parallel slots but do not connect to any database.
+ *
+ * This creates and initializes a set of slots, marking all parallel slots
+ * as free and ready to use. Establishing connections is delayed until
+ * requesting a free slot, but in the event that an existing connection is
+ * provided in "conn", that connection will be associated with the first
+ * slot and saved for reuse. In this case, "cparams" must contain the
+ * parameters that were used for opening "conn".
+ */
+ParallelSlotArray *
+ParallelSlotsSetup(int numslots, ConnParams *cparams)
+{
+ ParallelSlotArray *sa;
+
+ Assert(numslots > 0);
+ Assert(cparams != NULL);
+
+ sa = (ParallelSlotArray *) palloc0(sizeof(ParallelSlotArray) +
+ numslots * sizeof(ParallelSlot));
+ sa->numslots = numslots;
+ sa->cparams = cparams;
+
+ return sa;
+}
+
+/*
+ * ParallelSlotsAdoptConn
+ * Assign an open connection to the slots array for reuse.
+ *
+ * This turns over ownership of an open connection to a slots array. The
+ * caller should not further use or close the connection. All the connection's
+ * parameters (user, host, port, etc.) except possibly dbname should match
+ * those of the slots array's cparams, as given in ParallelSlotsSetup. If
+ * these parameters differ, subsequent behavior is undefined.
+ */
+void
+ParallelSlotsAdoptConn(ParallelSlotArray *sa, PGconn *conn)
+{
+ int offset;
+
+ offset = find_unconnected_slot(sa);
+ if (offset >= 0)
+ sa->slots[offset].connection = conn;
+ else
+ disconnectDatabase(conn);
}
/*
@@ -292,13 +473,13 @@ ParallelSlotsSetup(const ConnParams *cparams,
* terminate all connections.
*/
void
-ParallelSlotsTerminate(ParallelSlot *slots, int numslots)
+ParallelSlotsTerminate(ParallelSlotArray *sa)
{
int i;
- for (i = 0; i < numslots; i++)
+ for (i = 0; i < sa->numslots; i++)
{
- PGconn *conn = slots[i].connection;
+ PGconn *conn = sa->slots[i].connection;
if (conn == NULL)
continue;
@@ -314,13 +495,15 @@ ParallelSlotsTerminate(ParallelSlot *slots, int numslots)
* error has been found on the way.
*/
bool
-ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
+ParallelSlotsWaitCompletion(ParallelSlotArray *sa)
{
int i;
- for (i = 0; i < numslots; i++)
+ for (i = 0; i < sa->numslots; i++)
{
- if (!consumeQueryResult(slots + i))
+ if (sa->slots[i].connection == NULL)
+ continue;
+ if (!consumeQueryResult(&sa->slots[i]))
return false;
}
@@ -350,6 +533,9 @@ ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots)
bool
TableCommandResultHandler(PGresult *res, PGconn *conn, void *context)
{
+ Assert(res != NULL);
+ Assert(conn != NULL);
+
/*
* If it's an error, report it. Errors about a missing table are harmless
* so we continue processing; but die for other errors.
diff --git a/src/include/fe_utils/parallel_slot.h b/src/include/fe_utils/parallel_slot.h
index 8902f8d4f4..6d128ea71c 100644
--- a/src/include/fe_utils/parallel_slot.h
+++ b/src/include/fe_utils/parallel_slot.h
@@ -21,7 +21,7 @@ typedef bool (*ParallelSlotResultHandler) (PGresult *res, PGconn *conn,
typedef struct ParallelSlot
{
PGconn *connection; /* One connection */
- bool isFree; /* Is it known to be idle? */
+ bool inUse; /* Is the slot being used? */
/*
* Prior to issuing a command or query on 'connection', a handler callback
@@ -33,6 +33,13 @@ typedef struct ParallelSlot
void *handler_context;
} ParallelSlot;
+typedef struct ParallelSlotArray
+{
+ int numslots;
+ ConnParams *cparams;
+ ParallelSlot slots[FLEXIBLE_ARRAY_MEMBER];
+} ParallelSlotArray;
+
static inline void
ParallelSlotSetHandler(ParallelSlot *slot, ParallelSlotResultHandler handler,
void *context)
@@ -48,15 +55,19 @@ ParallelSlotClearHandler(ParallelSlot *slot)
slot->handler_context = NULL;
}
-extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlot *slots, int numslots);
+extern ParallelSlot *ParallelSlotsGetIdle(ParallelSlotArray *slots,
+ const char *dbname,
+ const char *progname, bool echo,
+ const char *initcmd);
+
+extern ParallelSlotArray *ParallelSlotsSetup(int numslots,
+ ConnParams *cparams);
-extern ParallelSlot *ParallelSlotsSetup(const ConnParams *cparams,
- const char *progname, bool echo,
- PGconn *conn, int numslots);
+extern void ParallelSlotsAdoptConn(ParallelSlotArray *sa, PGconn *conn);
-extern void ParallelSlotsTerminate(ParallelSlot *slots, int numslots);
+extern void ParallelSlotsTerminate(ParallelSlotArray *sa);
-extern bool ParallelSlotsWaitCompletion(ParallelSlot *slots, int numslots);
+extern bool ParallelSlotsWaitCompletion(ParallelSlotArray *sa);
extern bool TableCommandResultHandler(PGresult *res, PGconn *conn,
void *context);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bab4f3adb3..08776f41ca 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -403,6 +403,7 @@ ConfigData
ConfigVariable
ConnCacheEntry
ConnCacheKey
+ConnParams
ConnStatusType
ConnType
ConnectionStateEnum
@@ -1729,6 +1730,7 @@ ParallelHashJoinState
ParallelIndexScanDesc
ParallelReadyList
ParallelSlot
+ParallelSlotArray
ParallelState
ParallelTableScanDesc
ParallelTableScanDescData
--
2.21.1 (Apple Git-122.3)
v41-0002-Adding-contrib-module-pg_amcheck.patchapplication/octet-stream; name=v41-0002-Adding-contrib-module-pg_amcheck.patch; x-unix-mode=0644Download
From 1ae194cb9ec4271a73d18e122be98b27364c484c Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Mar 2021 08:34:40 -0800
Subject: [PATCH v41 2/3] Adding contrib module pg_amcheck
Adding new contrib module pg_amcheck, which is a command line
interface for running amcheck's verifications against tables and
indexes.
---
contrib/Makefile | 1 +
contrib/pg_amcheck/.gitignore | 3 +
contrib/pg_amcheck/Makefile | 29 +
contrib/pg_amcheck/pg_amcheck.c | 1889 ++++++++++++++++++++
contrib/pg_amcheck/t/001_basic.pl | 9 +
contrib/pg_amcheck/t/002_nonesuch.pl | 213 +++
contrib/pg_amcheck/t/003_check.pl | 520 ++++++
contrib/pg_amcheck/t/004_verify_heapam.pl | 487 +++++
contrib/pg_amcheck/t/005_opclass_damage.pl | 54 +
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgamcheck.sgml | 670 +++++++
src/tools/msvc/Install.pm | 2 +-
src/tools/msvc/Mkvcbuild.pm | 6 +-
src/tools/pgindent/typedefs.list | 3 +
15 files changed, 3884 insertions(+), 4 deletions(-)
create mode 100644 contrib/pg_amcheck/.gitignore
create mode 100644 contrib/pg_amcheck/Makefile
create mode 100644 contrib/pg_amcheck/pg_amcheck.c
create mode 100644 contrib/pg_amcheck/t/001_basic.pl
create mode 100644 contrib/pg_amcheck/t/002_nonesuch.pl
create mode 100644 contrib/pg_amcheck/t/003_check.pl
create mode 100644 contrib/pg_amcheck/t/004_verify_heapam.pl
create mode 100644 contrib/pg_amcheck/t/005_opclass_damage.pl
create mode 100644 doc/src/sgml/pgamcheck.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..a72dcf7304 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -30,6 +30,7 @@ SUBDIRS = \
old_snapshot \
pageinspect \
passwordcheck \
+ pg_amcheck \
pg_buffercache \
pg_freespacemap \
pg_prewarm \
diff --git a/contrib/pg_amcheck/.gitignore b/contrib/pg_amcheck/.gitignore
new file mode 100644
index 0000000000..c21a14de31
--- /dev/null
+++ b/contrib/pg_amcheck/.gitignore
@@ -0,0 +1,3 @@
+pg_amcheck
+
+/tmp_check/
diff --git a/contrib/pg_amcheck/Makefile b/contrib/pg_amcheck/Makefile
new file mode 100644
index 0000000000..bc61ee7970
--- /dev/null
+++ b/contrib/pg_amcheck/Makefile
@@ -0,0 +1,29 @@
+# contrib/pg_amcheck/Makefile
+
+PGFILEDESC = "pg_amcheck - detects corruption within database relations"
+PGAPPICON = win32
+
+PROGRAM = pg_amcheck
+OBJS = \
+ $(WIN32RES) \
+ pg_amcheck.o
+
+REGRESS_OPTS += --load-extension=amcheck --load-extension=pageinspect
+EXTRA_INSTALL += contrib/amcheck contrib/pageinspect
+
+TAP_TESTS = 1
+
+PG_CPPFLAGS = -I$(libpq_srcdir)
+PG_LIBS_INTERNAL = -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+SHLIB_PREREQS = submake-libpq
+subdir = contrib/pg_amcheck
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_amcheck/pg_amcheck.c b/contrib/pg_amcheck/pg_amcheck.c
new file mode 100644
index 0000000000..a9724d33bb
--- /dev/null
+++ b/contrib/pg_amcheck/pg_amcheck.c
@@ -0,0 +1,1889 @@
+/*-------------------------------------------------------------------------
+ *
+ * pg_amcheck.c
+ * Detects corruption within database relations.
+ *
+ * Copyright (c) 2017-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_amcheck/pg_amcheck.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <time.h>
+
+#include "catalog/pg_am_d.h"
+#include "catalog/pg_namespace_d.h"
+#include "common/logging.h"
+#include "common/username.h"
+#include "fe_utils/cancel.h"
+#include "fe_utils/option_utils.h"
+#include "fe_utils/parallel_slot.h"
+#include "fe_utils/query_utils.h"
+#include "fe_utils/simple_list.h"
+#include "fe_utils/string_utils.h"
+#include "getopt_long.h" /* pgrminclude ignore */
+#include "pgtime.h"
+#include "storage/block.h"
+
+/* pg_amcheck command line options controlled by user flags */
+typedef struct amcheckOptions
+{
+ bool alldb;
+ bool allrel;
+ bool excludetbl;
+ bool excludeidx;
+ bool echo;
+ bool quiet;
+ bool verbose;
+ bool no_indexes;
+ bool no_toast;
+ bool reconcile_toast;
+ bool on_error_stop;
+ bool parent_check;
+ bool rootdescend;
+ bool heapallindexed;
+ bool strict_names;
+ bool show_progress;
+ const char *skip;
+ int jobs;
+ long startblock;
+ long endblock;
+ SimplePtrList include; /* list of PatternInfo structs */
+ SimplePtrList exclude; /* list of PatternInfo structs */
+} amcheckOptions;
+
+static amcheckOptions opts = {
+ .alldb = false,
+ .allrel = true,
+ .excludetbl = false,
+ .excludeidx = false,
+ .echo = false,
+ .quiet = false,
+ .verbose = false,
+ .no_indexes = false,
+ .no_toast = false,
+ .reconcile_toast = true,
+ .on_error_stop = false,
+ .parent_check = false,
+ .rootdescend = false,
+ .heapallindexed = false,
+ .strict_names = true,
+ .show_progress = false,
+ .skip = "none",
+ .jobs = 1,
+ .startblock = -1,
+ .endblock = -1,
+ .include = {NULL, NULL},
+ .exclude = {NULL, NULL},
+};
+
+static const char *progname = NULL;
+
+typedef struct PatternInfo
+{
+ int pattern_id; /* Unique ID of this pattern */
+ const char *pattern; /* Unaltered pattern from the command line */
+ char *dbrgx; /* Database regexp parsed from pattern, or
+ * NULL */
+ char *nsprgx; /* Schema regexp parsed from pattern, or NULL */
+ char *relrgx; /* Relation regexp parsed from pattern, or
+ * NULL */
+ bool tblonly; /* true if relrgx should only match tables */
+ bool idxonly; /* true if relrgx should only match indexes */
+ bool matched; /* true if the pattern matched in any database */
+} PatternInfo;
+
+/* Unique pattern id counter */
+static int next_id = 1;
+
+/* Whether all relations have so far passed their corruption checks */
+static bool all_checks_pass = true;
+
+/* Time last progress report was displayed */
+static pg_time_t last_progress_report = 0;
+
+typedef struct DatabaseInfo
+{
+ char *datname;
+ char *amcheck_schema; /* escaped, quoted literal */
+} DatabaseInfo;
+
+typedef struct RelationInfo
+{
+ const DatabaseInfo *datinfo; /* shared by other relinfos */
+ Oid reloid;
+ bool is_table; /* true if heap, false if btree */
+} RelationInfo;
+
+/*
+ * Query for determining if contrib's amcheck is installed. If so, selects the
+ * namespace name where amcheck's functions can be found.
+ */
+static const char *amcheck_sql =
+"SELECT n.nspname, x.extversion"
+"\nFROM pg_catalog.pg_extension x"
+"\nJOIN pg_catalog.pg_namespace n"
+"\nON x.extnamespace OPERATOR(pg_catalog.=) n.oid"
+"\nWHERE x.extname OPERATOR(pg_catalog.=) 'amcheck'";
+
+static void prepare_table_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void prepare_btree_command(PQExpBuffer sql, Oid reloid,
+ const char *nspname);
+static void run_command(ParallelSlot *slot, const char *sql,
+ ConnParams *cparams);
+static bool VerifyHeapamSlotHandler(PGresult *res, PGconn *conn,
+ void *context);
+static bool VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context);
+static void help(const char *progname);
+static void progress_report(uint64 relations_total, uint64 relations_checked,
+ const char *datname, bool force, bool finished);
+
+static void appendDatabasePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendSchemaPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendRelationPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendTablePattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void appendIndexPattern(SimplePtrList *list, const char *pattern,
+ int encoding);
+static void compileDatabaseList(PGconn *conn, SimplePtrList *databases);
+static void compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *datinfo);
+
+int
+main(int argc, char *argv[])
+{
+ PGconn *conn;
+ SimplePtrListCell *cell;
+ SimplePtrList databases = {NULL, NULL};
+ SimplePtrList relations = {NULL, NULL};
+ bool failed = false;
+ const char *latest_datname;
+ int parallel_workers;
+ ParallelSlotArray *sa;
+ PQExpBufferData sql;
+ long long int reltotal;
+ long long int relprogress;
+
+ static struct option long_options[] = {
+ /* Connection options */
+ {"host", required_argument, NULL, 'h'},
+ {"port", required_argument, NULL, 'p'},
+ {"username", required_argument, NULL, 'U'},
+ {"no-password", no_argument, NULL, 'w'},
+ {"password", no_argument, NULL, 'W'},
+ {"maintenance-db", required_argument, NULL, 1},
+
+ /* check options */
+ {"all", no_argument, NULL, 'a'},
+ {"dbname", required_argument, NULL, 'd'},
+ {"exclude-dbname", required_argument, NULL, 'D'},
+ {"echo", no_argument, NULL, 'e'},
+ {"heapallindexed", no_argument, NULL, 'H'},
+ {"index", required_argument, NULL, 'i'},
+ {"exclude-index", required_argument, NULL, 'I'},
+ {"jobs", required_argument, NULL, 'j'},
+ {"parent-check", no_argument, NULL, 'P'},
+ {"quiet", no_argument, NULL, 'q'},
+ {"relation", required_argument, NULL, 'r'},
+ {"exclude-relation", required_argument, NULL, 'R'},
+ {"schema", required_argument, NULL, 's'},
+ {"exclude-schema", required_argument, NULL, 'S'},
+ {"table", required_argument, NULL, 't'},
+ {"exclude-table", required_argument, NULL, 'T'},
+ {"verbose", no_argument, NULL, 'v'},
+ {"no-index-expansion", no_argument, NULL, 2},
+ {"no-toast-expansion", no_argument, NULL, 3},
+ {"exclude-toast-pointers", no_argument, NULL, 4},
+ {"on-error-stop", no_argument, NULL, 5},
+ {"skip", required_argument, NULL, 6},
+ {"startblock", required_argument, NULL, 7},
+ {"endblock", required_argument, NULL, 8},
+ {"rootdescend", no_argument, NULL, 9},
+ {"no-strict-names", no_argument, NULL, 10},
+ {"progress", no_argument, NULL, 11},
+
+ {NULL, 0, NULL, 0}
+ };
+
+ int optindex;
+ int c;
+
+ /*
+ * If a maintenance database is specified, that will be used for the
+ * initial connection. Failing that, the first plain argument (without a
+ * flag) will be used. If neither of those are given, the first database
+ * specified with -d.
+ */
+ const char *primary_db = NULL;
+ const char *secondary_db = NULL;
+ const char *tertiary_db = NULL;
+
+ const char *host = NULL;
+ const char *port = NULL;
+ const char *username = NULL;
+ enum trivalue prompt_password = TRI_DEFAULT;
+ int encoding = pg_get_encoding_from_locale(NULL, false);
+ ConnParams cparams;
+
+ pg_logging_init(argv[0]);
+ progname = get_progname(argv[0]);
+ set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("contrib"));
+
+ handle_help_version_opts(argc, argv, progname, help);
+
+ /* process command-line options */
+ while ((c = getopt_long(argc, argv, "ad:D:eh:Hi:I:j:p:Pqr:R:s:S:t:T:U:wWv",
+ long_options, &optindex)) != -1)
+ {
+ char *endptr;
+
+ switch (c)
+ {
+ case 'a':
+ opts.alldb = true;
+ break;
+ case 'd':
+ if (tertiary_db == NULL)
+ tertiary_db = optarg;
+ appendDatabasePattern(&opts.include, optarg, encoding);
+ break;
+ case 'D':
+ appendDatabasePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'e':
+ opts.echo = true;
+ break;
+ case 'h':
+ host = pg_strdup(optarg);
+ break;
+ case 'H':
+ opts.heapallindexed = true;
+ break;
+ case 'i':
+ opts.allrel = false;
+ appendIndexPattern(&opts.include, optarg, encoding);
+ break;
+ case 'I':
+ opts.excludeidx = true;
+ appendIndexPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'j':
+ opts.jobs = atoi(optarg);
+ if (opts.jobs < 1)
+ {
+ pg_log_error("number of parallel jobs must be at least 1");
+ exit(1);
+ }
+ break;
+ case 'p':
+ port = pg_strdup(optarg);
+ break;
+ case 'P':
+ opts.parent_check = true;
+ break;
+ case 'q':
+ opts.quiet = true;
+ break;
+ case 'r':
+ opts.allrel = false;
+ appendRelationPattern(&opts.include, optarg, encoding);
+ break;
+ case 'R':
+ opts.excludeidx = true;
+ opts.excludetbl = true;
+ appendRelationPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 's':
+ opts.allrel = false;
+ appendSchemaPattern(&opts.include, optarg, encoding);
+ break;
+ case 'S':
+ appendSchemaPattern(&opts.exclude, optarg, encoding);
+ break;
+ case 't':
+ opts.allrel = false;
+ appendTablePattern(&opts.include, optarg, encoding);
+ break;
+ case 'T':
+ opts.excludetbl = true;
+ appendTablePattern(&opts.exclude, optarg, encoding);
+ break;
+ case 'U':
+ username = pg_strdup(optarg);
+ break;
+ case 'w':
+ prompt_password = TRI_NO;
+ break;
+ case 'W':
+ prompt_password = TRI_YES;
+ break;
+ case 'v':
+ opts.verbose = true;
+ pg_logging_increase_verbosity();
+ break;
+ case 1:
+ primary_db = pg_strdup(optarg);
+ break;
+ case 2:
+ opts.no_indexes = true;
+ break;
+ case 3:
+ opts.no_toast = true;
+ break;
+ case 4:
+ opts.reconcile_toast = false;
+ break;
+ case 5:
+ opts.on_error_stop = true;
+ break;
+ case 6:
+ if (pg_strcasecmp(optarg, "all-visible") == 0)
+ opts.skip = "all visible";
+ else if (pg_strcasecmp(optarg, "all-frozen") == 0)
+ opts.skip = "all frozen";
+ else
+ {
+ fprintf(stderr, "invalid skip option\n");
+ exit(1);
+ }
+ break;
+ case 7:
+ opts.startblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ "relation starting block argument contains garbage characters\n");
+ exit(1);
+ }
+ if (opts.startblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ "relation starting block argument out of bounds\n");
+ exit(1);
+ }
+ break;
+ case 8:
+ opts.endblock = strtol(optarg, &endptr, 10);
+ if (*endptr != '\0')
+ {
+ fprintf(stderr,
+ "relation ending block argument contains garbage characters\n");
+ exit(1);
+ }
+ if (opts.endblock > (long) MaxBlockNumber)
+ {
+ fprintf(stderr,
+ "relation ending block argument out of bounds\n");
+ exit(1);
+ }
+ break;
+ case 9:
+ opts.rootdescend = true;
+ opts.parent_check = true;
+ break;
+ case 10:
+ opts.strict_names = false;
+ break;
+ case 11:
+ opts.show_progress = true;
+ break;
+ default:
+ fprintf(stderr,
+ "Try \"%s --help\" for more information.\n",
+ progname);
+ exit(1);
+ }
+ }
+
+ if (opts.endblock >= 0 && opts.endblock < opts.startblock)
+ {
+ pg_log_error("relation ending block argument precedes starting block argument");
+ exit(1);
+ }
+
+ /* non-option arguments specify database names */
+ while (optind < argc)
+ {
+ if (secondary_db == NULL)
+ secondary_db = argv[optind];
+ appendDatabasePattern(&opts.include, argv[optind], encoding);
+ optind++;
+ }
+
+ /* fill cparams except for dbname, which is set below */
+ cparams.pghost = host;
+ cparams.pgport = port;
+ cparams.pguser = username;
+ cparams.prompt_password = prompt_password;
+ cparams.override_dbname = NULL;
+
+ setup_cancel_handler(NULL);
+
+ /* choose the database for our initial connection */
+ if (primary_db)
+ cparams.dbname = primary_db;
+ else if (secondary_db != NULL)
+ cparams.dbname = secondary_db;
+ else if (tertiary_db != NULL)
+ cparams.dbname = tertiary_db;
+ else
+ {
+ const char *default_db;
+
+ if (getenv("PGDATABASE"))
+ default_db = getenv("PGDATABASE");
+ else if (getenv("PGUSER"))
+ default_db = getenv("PGUSER");
+ else
+ default_db = get_user_name_or_exit(progname);
+
+ /*
+ * Users expect the database name inferred from the environment to get
+ * checked, not just get used for the initial connection.
+ */
+ appendDatabasePattern(&opts.include, default_db, encoding);
+
+ cparams.dbname = default_db;
+ }
+
+ conn = connectMaintenanceDatabase(&cparams, progname, opts.echo);
+ compileDatabaseList(conn, &databases);
+ disconnectDatabase(conn);
+
+ if (databases.head == NULL)
+ {
+ fprintf(stderr, "%s: no databases to check\n", progname);
+ exit(0);
+ }
+
+ /*
+ * Compile a list of all relations spanning all databases to be checked.
+ */
+ for (cell = databases.head; cell; cell = cell->next)
+ {
+ PGresult *result;
+ int ntups;
+ const char *amcheck_schema = NULL;
+ DatabaseInfo *dat = (DatabaseInfo *) cell->ptr;
+
+ cparams.override_dbname = dat->datname;
+
+ /*
+ * Connect to the next database so that we can determine which amcheck
+ * schema, if any, it provides, and compile its list of relations to
+ * check.
+ */
+ conn = connectDatabase(&cparams, progname, opts.echo, false, true);
+
+ /*
+ * Verify that amcheck is installed in this database. User
+ * error could result in a database not having amcheck that should
+ * have it, but we also could be iterating over multiple databases
+ * where not all of them have amcheck installed (for example,
+ * 'template1').
+ */
+ result = executeQuery(conn, amcheck_sql, opts.echo);
+ if (PQresultStatus(result) != PGRES_TUPLES_OK)
+ {
+ /* Querying the catalog failed. */
+ pg_log_error("database \"%s\": %s",
+ PQdb(conn), PQerrorMessage(conn));
+ pg_log_error("query was: %s", amcheck_sql);
+ PQclear(result);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ ntups = PQntuples(result);
+ if (ntups == 0)
+ {
+ /* Querying the catalog succeeded, but amcheck is missing. */
+ fprintf(stderr,
+ "%s: skipping database \"%s\": amcheck is not installed\n",
+ progname, PQdb(conn));
+ disconnectDatabase(conn);
+ continue;
+ }
+ amcheck_schema = PQgetvalue(result, 0, 0);
+ if (opts.verbose)
+ fprintf(stderr,
+ "%s: in database \"%s\": using amcheck version \"%s\" in schema \"%s\"\n",
+ progname, PQdb(conn), PQgetvalue(result, 0, 1),
+ amcheck_schema);
+ dat->amcheck_schema = PQescapeIdentifier(conn, amcheck_schema,
+ strlen(amcheck_schema));
+ PQclear(result);
+
+ compileRelationListOneDb(conn, &relations, dat);
+ disconnectDatabase(conn);
+ }
+
+ /*
+ * Check that all inclusion patterns matched at least one schema or
+ * relation that we can check.
+ */
+ for (failed = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *pat = (PatternInfo *) cell->ptr;
+
+ if (!pat->matched && (pat->nsprgx != NULL || pat->relrgx != NULL))
+ {
+ failed = opts.strict_names;
+
+ if (!opts.quiet)
+ {
+ if (pat->tblonly)
+ fprintf(stderr, "%s: no tables to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->idxonly)
+ fprintf(stderr, "%s: no btree indexes to check for \"%s\"\n",
+ progname, pat->pattern);
+ else if (pat->relrgx == NULL)
+ fprintf(stderr, "%s: no relations to check in schemas for \"%s\"\n",
+ progname, pat->pattern);
+ else
+ fprintf(stderr, "%s: no relations to check for \"%s\"\n",
+ progname, pat->pattern);
+ }
+ }
+ }
+
+ if (failed)
+ exit(1);
+
+ /*
+ * Set parallel_workers to the lesser of opts.jobs and the number of
+ * relations.
+ */
+ reltotal = 0;
+ parallel_workers = 0;
+ for (cell = relations.head; cell; cell = cell->next)
+ {
+ reltotal++;
+ if (parallel_workers < opts.jobs)
+ parallel_workers++;
+ }
+
+ if (reltotal == 0)
+ {
+ fprintf(stderr, "%s: no relations to check\n", progname);
+ exit(1);
+ }
+ progress_report(reltotal, 0, NULL, true, false);
+
+ /*
+ * ParallelSlots based event loop follows.
+ *
+ * We use server-side parallelism to check up to parallel_workers relations
+ * in parallel. The relations list was computed in database order, which
+ * minimizes the number of connects and disconnects as we process the list.
+ */
+ latest_datname = NULL;
+ failed = false;
+ sa = ParallelSlotsSetup(parallel_workers, &cparams);
+ initPQExpBuffer(&sql);
+ for (relprogress = 0, cell = relations.head; cell; cell = cell->next)
+ {
+ ParallelSlot *free_slot;
+ RelationInfo *rel;
+
+ rel = (RelationInfo *) cell->ptr;
+
+ if (CancelRequested)
+ {
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * The relations list is in database sorted order. If this next
+ * relation is in a different database than the last one seen, we are
+ * about to start checking this database. Note that other slots may
+ * still be working on relations from prior databases.
+ */
+ latest_datname = rel->datinfo->datname;
+
+ progress_report(reltotal, relprogress, latest_datname, false, false);
+
+ relprogress++;
+
+ /*
+ * Get a parallel slot for the next amcheck command, blocking if
+ * necessary until one is available, or until a previously issued slot
+ * command fails, indicating that we should abort checking the
+ * remaining objects.
+ */
+ free_slot = ParallelSlotsGetIdle(sa, rel->datinfo->datname, progname, opts.echo, NULL);
+ if (!free_slot)
+ {
+ /*
+ * Something failed. We don't need to know what it was, because
+ * the handler should already have emitted the necessary error
+ * messages.
+ */
+ failed = true;
+ goto finish;
+ }
+
+ /*
+ * Execute the appropriate amcheck command for this relation using our
+ * slot's database connection. We do not wait for the command to
+ * complete, nor do we perform any error checking, as that is done by
+ * the parallel slots and our handler callback functions.
+ */
+ if (rel->is_table)
+ {
+ prepare_table_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyHeapamSlotHandler,
+ sql.data);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ else
+ {
+ prepare_btree_command(&sql, rel->reloid,
+ rel->datinfo->amcheck_schema);
+ ParallelSlotSetHandler(free_slot, VerifyBtreeSlotHandler, NULL);
+ run_command(free_slot, sql.data, &cparams);
+ }
+ }
+ termPQExpBuffer(&sql);
+
+ /*
+ * Wait for all slots to complete, or for one to indicate that an error
+ * occurred. Like above, we rely on the handler emitting the necessary
+ * error messages.
+ */
+ if (sa && !ParallelSlotsWaitCompletion(sa))
+ failed = true;
+
+ progress_report(reltotal, relprogress, NULL, true, true);
+
+finish:
+ if (sa)
+ {
+ ParallelSlotsTerminate(sa);
+ pg_free(sa);
+ }
+
+ if (failed)
+ exit(1);
+
+ if (!all_checks_pass)
+ exit(2);
+}
+
+/*
+ * prepare_table_command
+ *
+ * Creates a SQL command for running amcheck checking on the given heap
+ * relation. The command is phrased as a SQL query, with column order and
+ * names matching the expectations of VerifyHeapamSlotHandler, which will
+ * receive and handle each row returned from the verify_heapam() function.
+ *
+ * sql: buffer into which the table checking command will be written
+ * reloid: oid of the table to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_table_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ appendPQExpBuffer(sql,
+ "SELECT n.nspname, c.relname, v.blkno, v.offnum, "
+ "v.attnum, v.msg"
+ "\nFROM %s.verify_heapam("
+ "\nrelation := %u,"
+ "\non_error_stop := %s,"
+ "\ncheck_toast := %s,"
+ "\nskip := '%s'",
+ amcheck_schema,
+ reloid,
+ opts.on_error_stop ? "true" : "false",
+ opts.reconcile_toast ? "true" : "false",
+ opts.skip);
+ if (opts.startblock >= 0)
+ appendPQExpBuffer(sql, ",\nstartblock := %ld", opts.startblock);
+ if (opts.endblock >= 0)
+ appendPQExpBuffer(sql, ",\nendblock := %ld", opts.endblock);
+ appendPQExpBuffer(sql, "\n) v,"
+ "\npg_catalog.pg_class c"
+ "\nJOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace OPERATOR(pg_catalog.=) n.oid"
+ "\nWHERE c.oid OPERATOR(pg_catalog.=) %u",
+ reloid);
+}
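For reference, with default options the generated command takes roughly the following shape. The schema name "amcheck", the oid 16384, and the skip value 'none' are illustrative, not taken from the patch:

```sql
SELECT n.nspname, c.relname, v.blkno, v.offnum, v.attnum, v.msg
FROM amcheck.verify_heapam(
relation := 16384,
on_error_stop := false,
check_toast := true,
skip := 'none'
) v,
pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n
ON c.relnamespace OPERATOR(pg_catalog.=) n.oid
WHERE c.oid OPERATOR(pg_catalog.=) 16384
```

The OPERATOR(pg_catalog.=) spelling guards against a malicious or broken search_path shadowing the equality operator.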
+
+/*
+ * prepare_btree_command
+ *
+ * Creates a SQL command for running amcheck checking on the given btree index
+ * relation. The command does not select any columns, as btree checking
+ * functions do not return any, but rather return corruption information by
+ * raising errors, which VerifyBtreeSlotHandler expects.
+ *
+ * sql: buffer into which the index checking command will be written
+ * reloid: oid of the index to be checked
+ * amcheck_schema: escaped and quoted name of schema in which amcheck contrib
+ * module is installed
+ */
+static void
+prepare_btree_command(PQExpBuffer sql, Oid reloid, const char *amcheck_schema)
+{
+ resetPQExpBuffer(sql);
+ if (opts.parent_check)
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_parent_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s,"
+ "\nrootdescend := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"),
+ (opts.rootdescend ? "true" : "false"));
+ else
+ appendPQExpBuffer(sql,
+ "SELECT %s.bt_index_check("
+ "\nindex := '%u'::regclass,"
+ "\nheapallindexed := %s)",
+ amcheck_schema,
+ reloid,
+ (opts.heapallindexed ? "true" : "false"));
+}
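For comparison, when --parent-check is given the generated command looks roughly like this (again, schema name and oid are illustrative):

```sql
SELECT amcheck.bt_index_parent_check(
index := '16385'::regclass,
heapallindexed := false,
rootdescend := false)
```

Without --parent-check, bt_index_check() is called instead, which takes no rootdescend argument.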
+
+/*
+ * run_command
+ *
+ * Sends a command to the server without waiting for the command to complete.
+ * Logs an error if the command cannot be sent, but otherwise any errors are
+ * expected to be handled by a ParallelSlotHandler.
+ *
+ * If reconnecting to the database is necessary, the cparams argument may be
+ * modified.
+ *
+ * slot: slot with connection to the server we should use for the command
+ * sql: query to send
+ * cparams: connection parameters in case the slot needs to be reconnected
+ */
+static void
+run_command(ParallelSlot *slot, const char *sql, ConnParams *cparams)
+{
+ if (opts.echo)
+ printf("%s\n", sql);
+
+ if (PQsendQuery(slot->connection, sql) == 0)
+ {
+ pg_log_error("error sending command to database \"%s\": %s",
+ PQdb(slot->connection),
+ PQerrorMessage(slot->connection));
+ pg_log_error("command was: %s", sql);
+ exit(1);
+ }
+}
+
+/*
+ * should_processing_continue
+ *
+ * Checks a query result returned from a query (presumably issued on a slot's
+ * connection) to determine if parallel slots should continue issuing further
+ * commands.
+ *
+ * Note: verify_heapam() reports heap relation corruption by returning rows
+ * rather than by raising errors, but running verify_heapam() on a corrupted
+ * table may still result in an error from the server, for example due to
+ * missing relation files or bad checksums. The btree checking functions
+ * always report corruption by raising errors. We therefore cannot abort
+ * processing merely because we received an ERROR.
+ *
+ * res: result from an executed sql query
+ */
+static bool
+should_processing_continue(PGresult *res)
+{
+ const char *severity;
+
+ switch (PQresultStatus(res))
+ {
+ /* These are expected and ok */
+ case PGRES_COMMAND_OK:
+ case PGRES_TUPLES_OK:
+ case PGRES_NONFATAL_ERROR:
+ break;
+
+ /* This is expected but requires closer scrutiny */
+ case PGRES_FATAL_ERROR:
+ severity = PQresultErrorField(res, PG_DIAG_SEVERITY_NONLOCALIZED);
+ if (severity == NULL)
+ return false; /* libpq internal failure */
+ if (strcmp(severity, "FATAL") == 0)
+ return false;
+ if (strcmp(severity, "PANIC") == 0)
+ return false;
+ break;
+
+ /* These are unexpected */
+ case PGRES_BAD_RESPONSE:
+ case PGRES_EMPTY_QUERY:
+ case PGRES_COPY_OUT:
+ case PGRES_COPY_IN:
+ case PGRES_COPY_BOTH:
+ case PGRES_SINGLE_TUPLE:
+ return false;
+ }
+ return true;
+}
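The PGRES_FATAL_ERROR branch above reduces to a pure string check on the severity field; a minimal sketch (the helper name severity_aborts_processing is ours, not part of the patch):

```c
#include <string.h>

/*
 * Classify a result's severity field the way should_processing_continue
 * treats PGRES_FATAL_ERROR results: only FATAL and PANIC indicate the
 * connection is unusable; ERROR-level results (such as btree corruption
 * reports) do not stop processing.
 */
static int
severity_aborts_processing(const char *severity)
{
	if (severity == NULL)
		return 1;				/* be conservative if the field is absent */
	return strcmp(severity, "FATAL") == 0 || strcmp(severity, "PANIC") == 0;
}
```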
+
+/*
+ * VerifyHeapamSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a table checking command
+ * created by prepare_table_command and outputs the results for the user.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: the sql query being handled, as a cstring
+ */
+static bool
+VerifyHeapamSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) == PGRES_TUPLES_OK)
+ {
+ int i;
+ int ntups = PQntuples(res);
+
+ if (ntups > 0)
+ all_checks_pass = false;
+
+ for (i = 0; i < ntups; i++)
+ {
+ if (!PQgetisnull(res, i, 4))
+ printf("relation %s.%s.%s, block %s, offset %s, attribute %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ PQgetvalue(res, i, 4), /* attnum */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 3))
+ printf("relation %s.%s.%s, block %s, offset %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ PQgetvalue(res, i, 3), /* offnum */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 2))
+ printf("relation %s.%s.%s, block %s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ PQgetvalue(res, i, 2), /* blkno */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else if (!PQgetisnull(res, i, 1))
+ printf("relation %s.%s.%s\n %s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 0), /* schema */
+ PQgetvalue(res, i, 1), /* relname */
+ /* blkno is null: 2 */
+ /* offnum is null: 3 */
+ /* attnum is null: 4 */
+ PQgetvalue(res, i, 5)); /* msg */
+
+ else
+ printf("%s.%s\n",
+ PQdb(conn),
+ PQgetvalue(res, i, 5)); /* msg */
+ }
+ }
+ else
+ {
+ all_checks_pass = false;
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ printf("%s: query was: %s\n", PQdb(conn), (const char *) context);
+ }
+
+ return should_processing_continue(res);
+}
+
+/*
+ * VerifyBtreeSlotHandler
+ *
+ * ParallelSlotHandler that receives results from a btree checking command
+ * created by prepare_btree_command and outputs them for the user. The result
+ * set from the btree checking command is expected to be empty; when the
+ * command instead fails with an error, the useful information about the
+ * corruption is expected in the connection's error message.
+ *
+ * res: result from an executed sql query
+ * conn: connection on which the sql query was executed
+ * context: unused
+ */
+static bool
+VerifyBtreeSlotHandler(PGresult *res, PGconn *conn, void *context)
+{
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ all_checks_pass = false;
+ printf("%s: %s\n", PQdb(conn), PQerrorMessage(conn));
+ }
+
+ return should_processing_continue(res);
+}
+
+/*
+ * help
+ *
+ * Prints help page for the program
+ *
+ * progname: the name of the executed program, such as "pg_amcheck"
+ */
+static void
+help(const char *progname)
+{
+ printf("%s checks objects in a PostgreSQL database for corruption.\n\n", progname);
+ printf("Usage:\n");
+ printf(" %s [OPTION]... [DBNAME]\n", progname);
+ printf("\nTarget Options:\n");
+ printf(" -a, --all check all databases\n");
+ printf(" -d, --dbname=DBNAME check specific database(s)\n");
+ printf(" -D, --exclude-dbname=DBNAME do NOT check specific database(s)\n");
+ printf(" -i, --index=INDEX check specific index(es)\n");
+ printf(" -I, --exclude-index=INDEX do NOT check specific index(es)\n");
+ printf(" -r, --relation=RELNAME check specific relation(s)\n");
+ printf(" -R, --exclude-relation=RELNAME do NOT check specific relation(s)\n");
+ printf(" -s, --schema=SCHEMA check specific schema(s)\n");
+ printf(" -S, --exclude-schema=SCHEMA do NOT check specific schema(s)\n");
+ printf(" -t, --table=TABLE check specific table(s)\n");
+ printf(" -T, --exclude-table=TABLE do NOT check specific table(s)\n");
+ printf(" --no-index-expansion do NOT expand list of relations to include indexes\n");
+ printf(" --no-toast-expansion do NOT expand list of relations to include toast\n");
+ printf(" --no-strict-names do NOT require patterns to match objects\n");
+ printf("\nIndex Checking Options:\n");
+ printf(" -H, --heapallindexed check all heap tuples are found within indexes\n");
+ printf(" -P, --parent-check check index parent/child relationships\n");
+ printf(" --rootdescend search from root page to refind tuples\n");
+ printf("\nTable Checking Options:\n");
+ printf(" --exclude-toast-pointers do NOT follow relation toast pointers\n");
+ printf(" --on-error-stop stop checking at end of first corrupt page\n");
+ printf(" --skip=OPTION do NOT check \"all-frozen\" or \"all-visible\" blocks\n");
+ printf(" --startblock=BLOCK begin checking table(s) at the given block number\n");
+ printf(" --endblock=BLOCK check table(s) only up to the given block number\n");
+ printf("\nConnection options:\n");
+ printf(" -h, --host=HOSTNAME database server host or socket directory\n");
+ printf(" -p, --port=PORT database server port\n");
+ printf(" -U, --username=USERNAME user name to connect as\n");
+ printf(" -w, --no-password never prompt for password\n");
+ printf(" -W, --password force password prompt\n");
+ printf(" --maintenance-db=DBNAME alternate maintenance database\n");
+ printf("\nOther Options:\n");
+ printf(" -e, --echo show the commands being sent to the server\n");
+ printf(" -j, --jobs=NUM use this many concurrent connections to the server\n");
+ printf(" -q, --quiet don't write any messages\n");
+ printf(" -v, --verbose write a lot of output\n");
+ printf(" -V, --version output version information, then exit\n");
+ printf(" --progress show progress information\n");
+ printf(" -?, --help show this help, then exit\n");
+
+ printf("\nRead the description of the amcheck contrib module for details.\n");
+ printf("\nReport bugs to <%s>.\n", PACKAGE_BUGREPORT);
+ printf("%s home page: <%s>\n", PACKAGE_NAME, PACKAGE_URL);
+}
+
+/*
+ * Print a progress report based on the global variables. If verbose output
+ * is enabled, also print the current database name.
+ *
+ * The progress report is written at most once per second, unless the force
+ * parameter is set to true.
+ *
+ * If finished is set to true, this is the last progress report. The cursor
+ * is moved to the next line.
+ */
+static void
+progress_report(uint64 relations_total, uint64 relations_checked,
+ const char *datname, bool force, bool finished)
+{
+ int percent = 0;
+ char checked_str[32];
+ char total_str[32];
+ pg_time_t now;
+
+ if (!opts.show_progress)
+ return;
+
+ now = time(NULL);
+ if (now == last_progress_report && !force && !finished)
+ return; /* Max once per second */
+
+ last_progress_report = now;
+ if (relations_total)
+ percent = (int) (relations_checked * 100 / relations_total);
+
+ /*
+ * Separate step to keep platform-dependent format code out of fprintf
+ * calls. We only test for INT64_FORMAT availability in snprintf, not
+ * fprintf.
+ */
+ snprintf(checked_str, sizeof(checked_str), INT64_FORMAT, relations_checked);
+ snprintf(total_str, sizeof(total_str), INT64_FORMAT, relations_total);
+
+#define VERBOSE_DATNAME_LENGTH 35
+ if (opts.verbose)
+ {
+ if (!datname)
+
+ /*
+ * No datname given, so clear the status line (used for first and
+ * last call)
+ */
+ fprintf(stderr,
+ "%*s/%s (%d%%) %*s",
+ (int) strlen(total_str),
+ checked_str, total_str, percent,
+ VERBOSE_DATNAME_LENGTH + 2, "");
+ else
+ {
+ bool truncate = (strlen(datname) > VERBOSE_DATNAME_LENGTH);
+
+ fprintf(stderr,
+ "%*s/%s (%d%%), (%s%-*.*s)",
+ (int) strlen(total_str),
+ checked_str, total_str, percent,
+ /* Prefix with "..." if we do leading truncation */
+ truncate ? "..." : "",
+ truncate ? VERBOSE_DATNAME_LENGTH - 3 : VERBOSE_DATNAME_LENGTH,
+ truncate ? VERBOSE_DATNAME_LENGTH - 3 : VERBOSE_DATNAME_LENGTH,
+ /* Truncate datname at beginning if it's too long */
+ truncate ? datname + strlen(datname) - VERBOSE_DATNAME_LENGTH + 3 : datname);
+ }
+ }
+ else
+ fprintf(stderr,
+ "%*s/%s (%d%%)",
+ (int) strlen(total_str),
+ checked_str, total_str, percent);
+
+ /*
+ * Stay on the same line if reporting to a terminal and we're not done
+ * yet.
+ */
+ fputc((!finished && isatty(fileno(stderr))) ? '\r' : '\n', stderr);
+}
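The leading-truncation formatting used in verbose mode can be exercised in isolation; a minimal sketch, where the helper name format_datname is ours and DATNAME_WIDTH mirrors VERBOSE_DATNAME_LENGTH:

```c
#include <stdio.h>
#include <string.h>

#define DATNAME_WIDTH 35		/* mirrors VERBOSE_DATNAME_LENGTH */

/*
 * Format a database name into a fixed-width field the way progress_report
 * does in verbose mode: names longer than the field are truncated at the
 * beginning and prefixed with "...", keeping the distinguishing suffix.
 */
static void
format_datname(char *out, size_t outlen, const char *datname)
{
	int			truncate = strlen(datname) > DATNAME_WIDTH;

	snprintf(out, outlen, "%s%-*.*s",
			 truncate ? "..." : "",
			 truncate ? DATNAME_WIDTH - 3 : DATNAME_WIDTH,
			 truncate ? DATNAME_WIDTH - 3 : DATNAME_WIDTH,
			 truncate ? datname + strlen(datname) - DATNAME_WIDTH + 3 : datname);
}
```

Short names are space-padded to the full field width; long names are replaced by "..." plus their trailing 32 characters, so every formatted name occupies exactly 35 columns.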
+
+/*
+ * appendDatabasePattern
+ *
+ * Adds to a list the given pattern interpreted as a database name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the database name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendDatabasePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->dbrgx = pstrdup(buf.data);
+
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendSchemaPattern
+ *
+ * Adds to a list the given pattern interpreted as a schema name pattern.
+ *
+ * list: the list to be appended
+ * pattern: the schema name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendSchemaPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ PQExpBufferData buf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&buf);
+ patternToSQLRegex(encoding, NULL, NULL, &buf, pattern, false);
+ info->pattern = pattern;
+ info->nsprgx = pstrdup(buf.data);
+ termPQExpBuffer(&buf);
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPatternHelper
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ * tblonly: whether the pattern should only be matched against heap tables
+ * idxonly: whether the pattern should only be matched against btree indexes
+ */
+static void
+appendRelationPatternHelper(SimplePtrList *list, const char *pattern,
+ int encoding, bool tblonly, bool idxonly)
+{
+ PQExpBufferData dbbuf;
+ PQExpBufferData nspbuf;
+ PQExpBufferData relbuf;
+ PatternInfo *info = (PatternInfo *) palloc0(sizeof(PatternInfo));
+
+ info->pattern_id = next_id++;
+
+ initPQExpBuffer(&dbbuf);
+ initPQExpBuffer(&nspbuf);
+ initPQExpBuffer(&relbuf);
+
+ patternToSQLRegex(encoding, &dbbuf, &nspbuf, &relbuf, pattern, false);
+ info->pattern = pattern;
+ if (dbbuf.data[0])
+ info->dbrgx = pstrdup(dbbuf.data);
+ if (nspbuf.data[0])
+ info->nsprgx = pstrdup(nspbuf.data);
+ if (relbuf.data[0])
+ info->relrgx = pstrdup(relbuf.data);
+
+ termPQExpBuffer(&dbbuf);
+ termPQExpBuffer(&nspbuf);
+ termPQExpBuffer(&relbuf);
+
+ info->tblonly = tblonly;
+ info->idxonly = idxonly;
+
+ simple_ptr_list_append(list, info);
+}
+
+/*
+ * appendRelationPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched against both tables and indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendRelationPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, false);
+}
+
+/*
+ * appendTablePattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against tables.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendTablePattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, true, false);
+}
+
+/*
+ * appendIndexPattern
+ *
+ * Adds to a list the given pattern interpreted as a relation pattern, to be
+ * matched only against indexes.
+ *
+ * list: the list to be appended
+ * pattern: the relation name pattern
+ * encoding: client encoding for parsing the pattern
+ */
+static void
+appendIndexPattern(SimplePtrList *list, const char *pattern, int encoding)
+{
+ appendRelationPatternHelper(list, pattern, encoding, false, true);
+}
+
+/*
+ * appendDbPatternCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the database portions filtered from the list of patterns expressed as three
+ * columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user specified pattern from the command line
+ * rgx: the database regular expression parsed from the pattern
+ *
+ * Patterns without a database portion are skipped. Patterns with more than
+ * just a database portion are optionally skipped, depending on argument
+ * 'inclusive'.
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ * inclusive: whether to include patterns with schema and/or relation parts
+ */
+static void
+appendDbPatternCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn, bool inclusive)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->dbrgx != NULL &&
+ (inclusive || (info->nsprgx == NULL && info->relrgx == NULL)))
+ {
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, ", ");
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nSELECT NULL, NULL, NULL WHERE false");
+}
+
+/*
+ * compileDatabaseList
+ *
+ * Compiles a list of databases to check based on the user supplied options,
+ * sorted to preserve the order they were specified on the command line. In
+ * the event that multiple databases match a single command line pattern, they
+ * are secondarily sorted by name.
+ *
+ * conn: connection to the initial database
+ * databases: the list onto which databases should be appended
+ */
+static void
+compileDatabaseList(PGconn *conn, SimplePtrList *databases)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ bool fatal;
+
+ initPQExpBuffer(&sql);
+
+ /* Append the include patterns CTE. */
+ appendPQExpBufferStr(&sql, "WITH include_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.include, conn, true);
+
+ /* Append the exclude patterns CTE. */
+ appendPQExpBufferStr(&sql, "\n),\nexclude_raw (id, pat, rgx) AS (");
+ appendDbPatternCTE(&sql, &opts.exclude, conn, false);
+ appendPQExpBufferStr(&sql, "\n),");
+
+ /*
+ * Append the database CTE, which includes whether each database is
+ * connectable and also joins against exclude_raw to determine whether
+ * each database is excluded.
+ */
+ appendPQExpBufferStr(&sql,
+ "\ndatabase (datname) AS ("
+ "\nSELECT d.datname"
+ "\nFROM pg_catalog.pg_database d"
+ "\nLEFT OUTER JOIN exclude_raw e"
+ "\nON d.datname ~ e.rgx"
+ "\nWHERE d.datallowconn"
+ "\nAND e.id IS NULL"
+ "\n),"
+
+ /*
+ * Append the include_pat CTE, which joins the include_raw CTE against the
+ * databases CTE to determine if all the inclusion patterns had matches,
+ * and whether each matched pattern had the misfortune of only matching
+ * excluded or unconnectable databases.
+ */
+ "\ninclude_pat (id, pat, checkable) AS ("
+ "\nSELECT i.id, i.pat,"
+ "\nCOUNT(*) FILTER ("
+ "\nWHERE d IS NOT NULL"
+ "\n) AS checkable"
+ "\nFROM include_raw i"
+ "\nLEFT OUTER JOIN database d"
+ "\nON d.datname ~ i.rgx"
+ "\nGROUP BY i.id, i.pat"
+ "\n),"
+
+ /*
+ * Append the filtered_databases CTE, which selects from the database CTE
+ * optionally joined against the include_raw CTE to only select databases
+ * that match an inclusion pattern. This appears to duplicate what the
+ * include_pat CTE already did above, but here we want only databases, and
+ * there we wanted patterns.
+ */
+ "\nfiltered_databases (datname) AS ("
+ "\nSELECT DISTINCT d.datname"
+ "\nFROM database d");
+ if (!opts.alldb)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN include_raw i"
+ "\nON d.datname ~ i.rgx");
+ appendPQExpBufferStr(&sql,
+ "\n)"
+
+ /*
+ * Select the checkable databases and the unmatched inclusion patterns.
+ */
+ "\nSELECT pat, datname"
+ "\nFROM ("
+ "\nSELECT id, pat, NULL::TEXT AS datname"
+ "\nFROM include_pat"
+ "\nWHERE checkable = 0"
+ "\nUNION ALL"
+ "\nSELECT NULL, NULL, datname"
+ "\nFROM filtered_databases"
+ "\n) AS combined_records"
+ "\nORDER BY id NULLS LAST, datname");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ ntups = PQntuples(res);
+ for (fatal = false, i = 0; i < ntups; i++)
+ {
+ const char *pat = NULL;
+ const char *datname = NULL;
+
+ if (!PQgetisnull(res, i, 0))
+ pat = PQgetvalue(res, i, 0);
+ if (!PQgetisnull(res, i, 1))
+ datname = PQgetvalue(res, i, 1);
+
+ if (pat != NULL)
+ {
+ /*
+ * Current record pertains to an inclusion pattern that matched no
+ * checkable databases.
+ */
+ fatal = opts.strict_names;
+ fprintf(stderr, "%s: no checkable database: \"%s\"\n",
+ progname, pat);
+ }
+ else
+ {
+ /* Current record pertains to a database */
+ Assert(datname != NULL);
+
+ DatabaseInfo *dat = (DatabaseInfo *) palloc0(sizeof(DatabaseInfo));
+
+ /* This database is included. Add to list */
+ if (opts.verbose)
+ fprintf(stderr, "%s: including database: \"%s\"\n", progname,
+ datname);
+
+ dat->datname = pstrdup(datname);
+ simple_ptr_list_append(databases, dat);
+ }
+ }
+ PQclear(res);
+
+ if (fatal)
+ {
+ disconnectDatabase(conn);
+ exit(1);
+ }
+}
+
+/*
+ * appendRelPatternRawCTE
+ *
+ * Appends to the buffer the body of a Common Table Expression (CTE) containing
+ * the patterns from the given list as seven columns:
+ *
+ * id: the unique pattern ID
+ * pat: the full user specified pattern from the command line
+ * dbrgx: the database regexp parsed from the pattern, or NULL if the
+ * pattern had no database part
+ * nsprgx: the namespace regexp parsed from the pattern, or NULL if the
+ * pattern had no namespace part
+ * relrgx: the relname regexp parsed from the pattern, or NULL if the
+ * pattern had no relname part
+ * tbl: true if the pattern applies only to tables (not indexes)
+ * idx: true if the pattern applies only to indexes (not tables)
+ *
+ * buf: the buffer to be appended
+ * patterns: the list of patterns to be inserted into the CTE
+ * conn: the database connection
+ */
+static void
+appendRelPatternRawCTE(PQExpBuffer buf, const SimplePtrList *patterns,
+ PGconn *conn)
+{
+ SimplePtrListCell *cell;
+ const char *comma;
+ bool have_values;
+
+ comma = "";
+ have_values = false;
+ for (cell = patterns->head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (!have_values)
+ appendPQExpBufferStr(buf, "\nVALUES");
+ have_values = true;
+ appendPQExpBuffer(buf, "%s\n(%d::INTEGER, ", comma, info->pattern_id);
+ appendStringLiteralConn(buf, info->pattern, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->dbrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->dbrgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->nsprgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->nsprgx, conn);
+ appendPQExpBufferStr(buf, "::TEXT, ");
+ if (info->relrgx == NULL)
+ appendPQExpBufferStr(buf, "NULL");
+ else
+ appendStringLiteralConn(buf, info->relrgx, conn);
+ if (info->tblonly)
+ appendPQExpBufferStr(buf, "::TEXT, true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, "::TEXT, false::BOOLEAN");
+ if (info->idxonly)
+ appendPQExpBufferStr(buf, ", true::BOOLEAN");
+ else
+ appendPQExpBufferStr(buf, ", false::BOOLEAN");
+ appendPQExpBufferStr(buf, ")");
+ comma = ",";
+ }
+
+ if (!have_values)
+ appendPQExpBufferStr(buf,
+ "\nSELECT NULL::INTEGER, NULL::TEXT, NULL::TEXT,"
+ "\nNULL::TEXT, NULL::TEXT, NULL::BOOLEAN,"
+ "\nNULL::BOOLEAN"
+ "\nWHERE false");
+}
+
+/*
+ * appendRelPatternFilteredCTE
+ *
+ * Appends to the buffer a Common Table Expression (CTE) which selects
+ * all patterns from the named raw CTE, filtered by database. All patterns
+ * which have no database portion or whose database portion matches our
+ * connection's database name are selected, with other patterns excluded.
+ *
+ * The basic idea here is that if we're connected to database "foo" and we have
+ * patterns "foo.bar.baz", "alpha.beta" and "one.two.three", we only want to
+ * use the first two while processing relations in this database, as the third
+ * one is not relevant.
+ *
+ * buf: the buffer to be appended
+ * raw: the name of the CTE to select from
+ * filtered: the name of the CTE to create
+ * conn: the database connection
+ */
+static void
+appendRelPatternFilteredCTE(PQExpBuffer buf, const char *raw,
+ const char *filtered, PGconn *conn)
+{
+ appendPQExpBuffer(buf,
+ "\n%s (id, pat, nsprgx, relrgx, tbl, idx) AS ("
+ "\nSELECT id, pat, nsprgx, relrgx, tbl, idx"
+ "\nFROM %s r"
+ "\nWHERE (r.dbrgx IS NULL"
+ "\nOR ",
+ filtered, raw);
+ appendStringLiteralConn(buf, PQdb(conn), conn);
+ appendPQExpBufferStr(buf, " ~ r.dbrgx)");
+ appendPQExpBufferStr(buf,
+ "\nAND (r.nsprgx IS NOT NULL"
+ "\nOR r.relrgx IS NOT NULL)"
+ "\n),");
+}
+
+/*
+ * compileRelationListOneDb
+ *
+ * Compiles a list of relations to check within the currently connected
+ * database based on the user supplied options, sorted by descending size,
+ * and appends them to the given relations list.
+ *
+ * The cells of the constructed list contain all information about the relation
+ * necessary to connect to the database and check the object, including which
+ * database to connect to, where contrib/amcheck is installed, and the Oid and
+ * type of object (table vs. index). Rather than duplicating the database
+ * details per relation, the relation structs use references to the same
+ * database object, provided by the caller.
+ *
+ * conn: connection to this next database, which should be the same as in 'dat'
+ * relations: list onto which the relations information should be appended
+ * dat: the database info struct for use by each relation
+ */
+static void
+compileRelationListOneDb(PGconn *conn, SimplePtrList *relations,
+ const DatabaseInfo *dat)
+{
+ PGresult *res;
+ PQExpBufferData sql;
+ int ntups;
+ int i;
+ const char *datname;
+
+ initPQExpBuffer(&sql);
+ appendPQExpBufferStr(&sql, "WITH");
+
+ /* Append CTEs for the relation inclusion patterns, if any */
+ if (!opts.allrel)
+ {
+ appendPQExpBufferStr(&sql,
+ "\ninclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.include, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "include_raw", "include_pat", conn);
+ }
+
+ /* Append CTEs for the relation exclusion patterns, if any */
+ if (opts.excludetbl || opts.excludeidx)
+ {
+ appendPQExpBufferStr(&sql,
+ "\nexclude_raw (id, pat, dbrgx, nsprgx, relrgx, tbl, idx) AS (");
+ appendRelPatternRawCTE(&sql, &opts.exclude, conn);
+ appendPQExpBufferStr(&sql, "\n),");
+ appendRelPatternFilteredCTE(&sql, "exclude_raw", "exclude_pat", conn);
+ }
+
+ /* Append the relation CTE. */
+ appendPQExpBufferStr(&sql,
+ "\nrelation (id, pat, oid, reltoastrelid, relpages, tbl, idx) AS ("
+ "\nSELECT DISTINCT ON (c.oid");
+ if (!opts.allrel)
+ appendPQExpBufferStr(&sql, ", ip.id) ip.id, ip.pat,");
+ else
+ appendPQExpBufferStr(&sql, ") NULL::INTEGER AS id, NULL::TEXT AS pat,");
+ appendPQExpBuffer(&sql,
+ "\nc.oid, c.reltoastrelid, c.relpages,"
+ "\nc.relam = %u AS tbl,"
+ "\nc.relam = %u AS idx"
+ "\nFROM pg_catalog.pg_class c"
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (!opts.allrel)
+ appendPQExpBuffer(&sql,
+ "\nINNER JOIN include_pat ip"
+ "\nON (n.nspname ~ ip.nsprgx OR ip.nsprgx IS NULL)"
+ "\nAND (c.relname ~ ip.relrgx OR ip.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT ip.tbl)"
+ "\nAND (c.relam = %u OR NOT ip.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBuffer(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND (c.relam = %u OR NOT e.tbl)"
+ "\nAND (c.relam = %u OR NOT e.idx)",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ if (opts.excludetbl || opts.excludeidx)
+ appendPQExpBufferStr(&sql, "\nWHERE e.pat IS NULL");
+ else
+ appendPQExpBufferStr(&sql, "\nWHERE true");
+
+ /*
+ * We need to be careful not to break the --no-toast-expansion and
+ * --no-index-expansion options. By default, the indexes, toast tables,
+ * and toast table indexes associated with primary tables are included,
+ * using their own CTEs below. We implement the --exclude-* options by not
+ * creating those CTEs, but that's no use if we've already selected the
+ * toast and indexes here. On the other hand, we want inclusion patterns
+ * that match indexes or toast tables to be honored. So, if inclusion
+ * patterns were given, we want to select all tables, toast tables, or
+ * indexes that match the patterns. But if no inclusion patterns were
+ * given, and we're simply matching all relations, then we only want to
+ * match the primary tables here.
+ */
+ if (opts.allrel)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind IN ('r', 'm', 't')"
+ "\nAND c.relnamespace != %u",
+ HEAP_TABLE_AM_OID, PG_TOAST_NAMESPACE);
+ else
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam IN (%u, %u)"
+ "\nAND c.relkind IN ('r', 'm', 't', 'i')"
+ "\nAND ((c.relam = %u AND c.relkind IN ('r', 'm', 't')) OR"
+ "\n(c.relam = %u AND c.relkind = 'i'))",
+ HEAP_TABLE_AM_OID, BTREE_AM_OID,
+ HEAP_TABLE_AM_OID, BTREE_AM_OID);
+
+ appendPQExpBufferStr(&sql,
+ "\nORDER BY c.oid"
+ "\n)");
+
+ if (!opts.no_toast)
+ {
+ /*
+ * Include a CTE for toast tables associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * toast table names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\ntoast (oid, relpages) AS ("
+ "\nSELECT t.oid, t.relpages"
+ "\nFROM pg_catalog.pg_class t"
+ "\nINNER JOIN relation r"
+ "\nON r.reltoastrelid = t.oid");
+ if (opts.excludetbl)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (t.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.tbl"
+ "\nWHERE e.id IS NULL");
+ appendPQExpBufferStr(&sql,
+ "\n)");
+ }
+ if (!opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with primary tables
+ * selected above, filtering by exclusion patterns (if any) that match
+ * btree index names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\nindex (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM relation r"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON r.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nINNER JOIN pg_catalog.pg_namespace n"
+ "\nON c.relnamespace = n.oid"
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON (n.nspname ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'",
+ BTREE_AM_OID);
+ if (opts.no_toast)
+ appendPQExpBuffer(&sql,
+ "\nAND c.relnamespace != %u",
+ PG_TOAST_NAMESPACE);
+ appendPQExpBufferStr(&sql, "\n)");
+ }
+
+ if (!opts.no_toast && !opts.no_indexes)
+ {
+ /*
+ * Include a CTE for btree indexes associated with toast tables of
+ * primary tables selected above, filtering by exclusion patterns (if
+ * any) that match the toast index names.
+ */
+ appendPQExpBufferStr(&sql,
+ ",\ntoast_index (oid, relpages) AS ("
+ "\nSELECT c.oid, c.relpages"
+ "\nFROM toast t"
+ "\nINNER JOIN pg_catalog.pg_index i"
+ "\nON t.oid = i.indrelid"
+ "\nINNER JOIN pg_catalog.pg_class c"
+ "\nON i.indexrelid = c.oid");
+ if (opts.excludeidx)
+ appendPQExpBufferStr(&sql,
+ "\nLEFT OUTER JOIN exclude_pat e"
+ "\nON ('pg_toast' ~ e.nsprgx OR e.nsprgx IS NULL)"
+ "\nAND (c.relname ~ e.relrgx OR e.relrgx IS NULL)"
+ "\nAND e.idx"
+ "\nWHERE e.id IS NULL");
+ else
+ appendPQExpBufferStr(&sql,
+ "\nWHERE true");
+ appendPQExpBuffer(&sql,
+ "\nAND c.relam = %u"
+ "\nAND c.relkind = 'i'"
+ "\n)",
+ BTREE_AM_OID);
+ }
+
+ /*
+ * Roll-up distinct rows from CTEs.
+ *
+ * Relations that match more than one pattern may occur more than once in
+ * the list, and indexes and toast for primary relations may also have
+ * matched in their own right, so we rely on UNION to deduplicate the
+ * list.
+ */
+ appendPQExpBufferStr(&sql,
+ "\nSELECT id, tbl, idx, oid"
+ "\nFROM (");
+ appendPQExpBufferStr(&sql,
+ /* Inclusion patterns that failed to match */
+ "\nSELECT id, tbl, idx,"
+ "\nNULL::OID AS oid,"
+ "\nNULL::INTEGER AS relpages"
+ "\nFROM relation"
+ "\nWHERE id IS NOT NULL"
+ "\nUNION"
+ /* Primary relations */
+ "\nSELECT NULL::INTEGER AS id,"
+ "\ntbl, idx,"
+ "\noid, relpages"
+ "\nFROM relation");
+ if (!opts.no_toast)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Toast tables for primary relations */
+ "\nSELECT NULL::INTEGER AS id, TRUE AS tbl,"
+ "\nFALSE AS idx, oid, relpages"
+ "\nFROM toast");
+ if (!opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for primary relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM index");
+ if (!opts.no_toast && !opts.no_indexes)
+ appendPQExpBufferStr(&sql,
+ "\nUNION"
+ /* Indexes for toast relations */
+ "\nSELECT NULL::INTEGER AS id, FALSE AS tbl,"
+ "\nTRUE AS idx, oid, relpages"
+ "\nFROM toast_index");
+ appendPQExpBufferStr(&sql,
+ "\n) AS combined_records"
+ "\nORDER BY relpages DESC NULLS FIRST, oid");
+
+ res = executeQuery(conn, sql.data, opts.echo);
+ if (PQresultStatus(res) != PGRES_TUPLES_OK)
+ {
+ pg_log_error("query failed: %s", PQerrorMessage(conn));
+ pg_log_error("query was: %s", sql.data);
+ disconnectDatabase(conn);
+ exit(1);
+ }
+ termPQExpBuffer(&sql);
+
+ /*
+ * Allocate a single copy of the database name to be shared by all nodes
+ * in the object list, constructed below.
+ */
+ datname = pstrdup(PQdb(conn));
+
+ ntups = PQntuples(res);
+ for (i = 0; i < ntups; i++)
+ {
+ int pattern_id = 0;
+ bool tbl = false;
+ bool idx = false;
+ Oid oid = InvalidOid;
+
+ if (!PQgetisnull(res, i, 0))
+ pattern_id = atoi(PQgetvalue(res, i, 0));
+ if (!PQgetisnull(res, i, 1))
+ tbl = (PQgetvalue(res, i, 1)[0] == 't');
+ if (!PQgetisnull(res, i, 2))
+ idx = (PQgetvalue(res, i, 2)[0] == 't');
+ if (!PQgetisnull(res, i, 3))
+ oid = atooid(PQgetvalue(res, i, 3));
+
+ if (pattern_id > 0)
+ {
+ /*
+ * Current record pertains to an inclusion pattern. Find the
+ * pattern in the list and record that it matched. If we expected
+ * a large number of command-line inclusion pattern arguments, the
+ * data structure here might need to be more efficient, but we
+ * expect the list to be short.
+ */
+
+ SimplePtrListCell *cell;
+ bool found;
+
+ for (found = false, cell = opts.include.head; cell; cell = cell->next)
+ {
+ PatternInfo *info = (PatternInfo *) cell->ptr;
+
+ if (info->pattern_id == pattern_id)
+ {
+ info->matched = true;
+ found = true;
+ break;
+ }
+ }
+ if (!found)
+ {
+ pg_log_error("internal error: received unexpected pattern_id %d",
+ pattern_id);
+ exit(1);
+ }
+ }
+ else
+ {
+ /* Current record pertains to a relation */
+
+ RelationInfo *rel = (RelationInfo *) palloc0(sizeof(RelationInfo));
+
+ Assert(OidIsValid(oid));
+ Assert(!(tbl && idx));
+
+ rel->datinfo = dat;
+ rel->reloid = oid;
+ rel->is_table = tbl;
+
+ simple_ptr_list_append(relations, rel);
+ }
+ }
+ PQclear(res);
+}
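The pattern-id bookkeeping above (scan the inclusion list, mark a pattern as matched, treat an unknown id as an internal error) can be sketched outside of C. The names below are illustrative only, not identifiers from the patch:

```python
# Hypothetical sketch of the inclusion-pattern bookkeeping done in the
# relation-list code above; names are illustrative, not from the patch.

class PatternInfo:
    def __init__(self, pattern_id, pattern):
        self.pattern_id = pattern_id
        self.pattern = pattern
        self.matched = False   # set once any relation matches this pattern

def mark_matched(patterns, pattern_id):
    # A linear scan suffices: the list of command-line patterns is short.
    for info in patterns:
        if info.pattern_id == pattern_id:
            info.matched = True
            return
    # Mirrors the patch's "received unexpected pattern_id" internal error.
    raise RuntimeError(
        f"internal error: received unexpected pattern_id {pattern_id}")

patterns = [PatternInfo(1, "s1.*"), PatternInfo(2, "s2.t1")]
mark_matched(patterns, 2)
unmatched = [p.pattern for p in patterns if not p.matched]
```

Patterns left with `matched == False` after the scan are exactly the inclusion arguments that resolved to nothing, which is what the strict-names error reporting keys off.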
diff --git a/contrib/pg_amcheck/t/001_basic.pl b/contrib/pg_amcheck/t/001_basic.pl
new file mode 100644
index 0000000000..dfa0ae9e06
--- /dev/null
+++ b/contrib/pg_amcheck/t/001_basic.pl
@@ -0,0 +1,9 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 8;
+
+program_help_ok('pg_amcheck');
+program_version_ok('pg_amcheck');
+program_options_handling_ok('pg_amcheck');
diff --git a/contrib/pg_amcheck/t/002_nonesuch.pl b/contrib/pg_amcheck/t/002_nonesuch.pl
new file mode 100644
index 0000000000..8c6e267ee9
--- /dev/null
+++ b/contrib/pg_amcheck/t/002_nonesuch.pl
@@ -0,0 +1,213 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 60;
+
+# Test set-up
+my ($node, $port);
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+#########################################
+# Test connecting to a non-existent database
+
+# Failing to connect to the initial database is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database');
+
+# Failing to resolve a secondary database name is also an error, though since
+# the string is treated as a pattern, the error message looks different.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'qqq' ],
+ qr/pg_amcheck: no checkable database: "qqq"/,
+ 'checking a non-existent database');
+
+# Failing to connect to the initial database is still an error when using
+# --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, 'qqq' ],
+ qr/database "qqq" does not exist/,
+ 'checking a non-existent database with --no-strict-names');
+
+# But failing to resolve secondary database names is not an error when using
+# --no-strict-names. We should still see the message, but as a non-fatal
+# warning.
+$node->command_checks_all(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-d', 'no_such_database', 'postgres', 'qqq' ],
+ 0,
+ [ ],
+ [ qr/no checkable database: "qqq"/ ],
+ 'checking a non-existent secondary database with --no-strict-names');
+
+# Check that a substring of an existent database name does not get interpreted
+# as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'post' ],
+ qr/database "post" does not exist/,
+ 'checking a non-existent primary database (substring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'post' ],
+ qr/pg_amcheck: no checkable database: "post"/,
+ 'checking a non-existent secondary database (substring of existent database)');
+
+# Likewise, check that a superstring of an existent database name does not get
+# interpreted as a matching pattern.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgresql' ],
+ qr/database "postgresql" does not exist/,
+ 'checking a non-existent primary database (superstring of existent database)');
+
+# And again, but testing the secondary database name rather than the primary
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, 'postgres', 'postgresql' ],
+ qr/pg_amcheck: no checkable database: "postgresql"/,
+ 'checking a non-existent secondary database (superstring of existent database)');
+
+#########################################
+# Test connecting with a non-existent user
+
+# Failing to connect to the initial database due to bad username is an error.
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user');
+
+# Failing to connect to the initial database due to bad username is still an
+# error when using --no-strict-names.
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-p', $port, '-U=no_such_user', 'postgres' ],
+ qr/role "=no_such_user" does not exist/,
+ 'checking with a non-existent user, --no-strict-names');
+
+#########################################
+# Test checking databases without amcheck installed
+
+# Attempting to check a database by name where amcheck is not installed should
+# raise a warning. If all databases are skipped, having no relations to check
+# raises an error.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'template1' ],
+ 1,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/,
+ qr/pg_amcheck: no relations to check/ ],
+ 'checking a database by name without amcheck installed');
+
+# Likewise, but by database pattern rather than by name, such that some
+# databases with amcheck installed are included, and so checking occurs and
+# only a warning is raised.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '-d', '*', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by dbname implication without amcheck installed');
+
+# And again, but by checking all databases.
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, '--all', 'postgres' ],
+ 0,
+ [],
+ [ qr/pg_amcheck: skipping database "template1": amcheck is not installed/ ],
+ 'checking a database by --all implication without amcheck installed');
+
+#########################################
+# Test unreasonable patterns
+
+# Check three-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '..' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.\."/ ],
+ 'checking table pattern ".."');
+
+# Again, but with non-trivial schema and relation parts
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.foo.bar' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no checkable database: "\.foo\.bar"/ ],
+ 'checking table pattern ".foo.bar"');
+
+# Check two-part unreasonable pattern that has zero-length names
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $port, 'postgres', '-t', '.' ],
+ 1,
+ [ qr/^$/ ],
+ [ qr/pg_amcheck: no tables to check for "\."/ ],
+ 'checking table pattern "."');
+
+#########################################
+# Test checking non-existent schemas, tables, and indexes
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no_such_schema' ],
+ qr/pg_amcheck: no relations to check in schemas for "no_such_schema"/,
+ 'checking a non-existent schema');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no_such_schema' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent schema with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no_such_table' ],
+ qr/pg_amcheck: no tables to check for "no_such_table"/,
+ 'checking a non-existent table');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no_such_table' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent table with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no_such_index' ],
+ qr/pg_amcheck: no btree indexes to check for "no_such_index"/,
+ 'checking a non-existent index');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no_such_index' ],
+ qr/pg_amcheck: no relations to check/,
+ 'checking a non-existent index with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-s', 'no*such*schema*' ],
+ qr/pg_amcheck: no relations to check in schemas for "no\*such\*schema\*"/,
+ 'no matching schemas');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-s', 'no*such*schema*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching schemas with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-t', 'no*such*table*' ],
+ qr/pg_amcheck: no tables to check for "no\*such\*table\*"/,
+ 'no matching tables');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-t', 'no*such*table*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching tables with --no-strict-names -v');
+
+command_fails_like(
+ [ 'pg_amcheck', '-p', $port, '-i', 'no*such*index*' ],
+ qr/pg_amcheck: no btree indexes to check for "no\*such\*index\*"/,
+ 'no matching indexes');
+
+command_fails_like(
+ [ 'pg_amcheck', '--no-strict-names', '-v', '-p', $port, '-i', 'no*such*index*' ],
+ qr/pg_amcheck: no relations to check/,
+ 'no matching indexes with --no-strict-names -v');
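The strict versus --no-strict-names behavior these tests exercise can be summarized with a small sketch. The function and messages below are an illustrative approximation, not pg_amcheck's actual pattern machinery; note in particular that "post" must not match "postgres", since names are matched whole rather than as substrings:

```python
import fnmatch

def resolve_databases(patterns, databases, strict=True):
    """Expand shell-style database patterns. Under strict name handling an
    unmatched pattern is fatal; under --no-strict-names it only warns.
    (Illustrative approximation of the behavior the tests exercise.)"""
    selected, warnings = [], []
    for pat in patterns:
        hits = fnmatch.filter(databases, pat)
        if hits:
            selected.extend(h for h in hits if h not in selected)
        elif strict:
            raise ValueError(f'no checkable database: "{pat}"')
        else:
            warnings.append(f'no checkable database: "{pat}"')
    return selected, warnings

# Wildcard-free patterns match whole names only.
dbs, warns = resolve_databases(["postgres", "qqq"],
                               ["postgres", "db1"], strict=False)
```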
diff --git a/contrib/pg_amcheck/t/003_check.pl b/contrib/pg_amcheck/t/003_check.pl
new file mode 100644
index 0000000000..f985273e83
--- /dev/null
+++ b/contrib/pg_amcheck/t/003_check.pl
@@ -0,0 +1,520 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 70;
+
+my ($node, $port);
+
+# Returns the filesystem path for the named relation.
+#
+# Assumes the test node is running
+sub relation_filepath($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $pgdata = $node->data_dir;
+ my $rel = $node->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$relname')));
+ die "path not found for relation $relname" unless defined $rel;
+ return "$pgdata/$rel";
+}
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless $rel;
+ return "pg_toast.$rel";
+}
+
+# Stops the test node, corrupts the first page of the named relation, and
+# restarts the node.
+#
+# Assumes the test node is running.
+sub corrupt_first_page($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop;
+ my $fh;
+ open($fh, '+<', $relpath)
+ or die "open $relpath failed: $!";
+ binmode $fh;
+ seek($fh, 32, 0);
+ syswrite($fh, "\x77" x 500)
+ close($fh);
+ $node->start;
+}
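corrupt_first_page depends on writing real 0x77 bytes; in Perl the filler must be double-quoted ("\x77"), since a single-quoted '\x77' is four literal characters. The same fixed-offset overwrite can be sketched in Python against a scratch file (paths and sizes here are illustrative):

```python
import os
import tempfile

def corrupt_first_page(path, offset=32, length=500, filler=0x77):
    # Overwrite `length` bytes at `offset` with a repeated filler byte,
    # analogous to the Perl helper above. bytes([0x77]) is one real byte.
    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(bytes([filler]) * length)

# Exercise it against a scratch "page" of zeros.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 8192)
os.close(fd)
corrupt_first_page(path)
with open(path, "rb") as fh:
    page = fh.read()
os.unlink(path)
```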
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# relation, and restarts the node.
+#
+# Assumes the test node is running
+sub remove_relation_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $relpath = relation_filepath($dbname, $relname);
+
+ $node->stop();
+ unlink($relpath);
+ $node->start;
+}
+
+# Stops the test node, unlinks the file from the filesystem that backs the
+# toast table (if any) corresponding to the given main table relation, and
+# restarts the node.
+#
+# Assumes the test node is running
+sub remove_toast_file($$)
+{
+ my ($dbname, $relname) = @_;
+ my $toastname = relation_toast($dbname, $relname);
+ remove_relation_file($dbname, $toastname) if ($toastname);
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+for my $dbname (qw(db1 db2 db3))
+{
+ # Create the database
+ $node->safe_psql('postgres', qq(CREATE DATABASE $dbname));
+
+ # Load the amcheck extension, upon which pg_amcheck depends. Put the
+ # extension in an unexpected location to test that pg_amcheck finds it
+ # correctly. Create tables with names that look like pg_catalog names to
+ # check that pg_amcheck does not get confused by them. Create functions in
+ # schema public that look like amcheck functions to check that pg_amcheck
+ # does not use them.
+ $node->safe_psql($dbname, q(
+ CREATE SCHEMA amcheck_schema;
+ CREATE EXTENSION amcheck WITH SCHEMA amcheck_schema;
+ CREATE TABLE amcheck_schema.pg_database (junk text);
+ CREATE TABLE amcheck_schema.pg_namespace (junk text);
+ CREATE TABLE amcheck_schema.pg_class (junk text);
+ CREATE TABLE amcheck_schema.pg_operator (junk text);
+ CREATE TABLE amcheck_schema.pg_proc (junk text);
+ CREATE TABLE amcheck_schema.pg_tablespace (junk text);
+
+ CREATE FUNCTION public.bt_index_check(index regclass,
+ heapallindexed boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.bt_index_parent_check(index regclass,
+ heapallindexed boolean default false,
+ rootdescend boolean default false)
+ RETURNS VOID AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong bt_index_parent_check!';
+ END;
+ $$ LANGUAGE plpgsql;
+
+ CREATE FUNCTION public.verify_heapam(relation regclass,
+ on_error_stop boolean default false,
+ check_toast boolean default false,
+ skip text default 'none',
+ startblock bigint default null,
+ endblock bigint default null,
+ blkno OUT bigint,
+ offnum OUT integer,
+ attnum OUT integer,
+ msg OUT text)
+ RETURNS SETOF record AS $$
+ BEGIN
+ RAISE EXCEPTION 'Invoked wrong verify_heapam!';
+ END;
+ $$ LANGUAGE plpgsql;
+ ));
+
+ # Create schemas, tables and indexes in five separate
+ # schemas. The schemas are all identical to start, but
+ # we will corrupt them differently later.
+ #
+ for my $schema (qw(s1 s2 s3 s4 s5))
+ {
+ $node->safe_psql($dbname, qq(
+ CREATE SCHEMA $schema;
+ CREATE SEQUENCE $schema.seq1;
+ CREATE SEQUENCE $schema.seq2;
+ CREATE TABLE $schema.t1 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE TABLE $schema.t2 (
+ i INTEGER,
+ b BOX,
+ ia int4[],
+ ir int4range,
+ t TEXT
+ );
+ CREATE VIEW $schema.t2_view AS (
+ SELECT i*2, t FROM $schema.t2
+ );
+ ALTER TABLE $schema.t2
+ ALTER COLUMN t
+ SET STORAGE EXTERNAL;
+
+ INSERT INTO $schema.t1 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ INSERT INTO $schema.t2 (i, b, ia, ir, t)
+ (SELECT gs::INTEGER AS i,
+ box(point(gs,gs+5),point(gs*2,gs*3)) AS b,
+ array[gs, gs + 1]::int4[] AS ia,
+ int4range(gs, gs+100) AS ir,
+ repeat('foo', gs) AS t
+ FROM generate_series(1,10000,3000) AS gs);
+
+ CREATE MATERIALIZED VIEW $schema.t1_mv AS SELECT * FROM $schema.t1;
+ CREATE MATERIALIZED VIEW $schema.t2_mv AS SELECT * FROM $schema.t2;
+
+ create table $schema.p1 (a int, b int) PARTITION BY list (a);
+ create table $schema.p2 (a int, b int) PARTITION BY list (a);
+
+ create table $schema.p1_1 partition of $schema.p1 for values in (1, 2, 3);
+ create table $schema.p1_2 partition of $schema.p1 for values in (4, 5, 6);
+ create table $schema.p2_1 partition of $schema.p2 for values in (1, 2, 3);
+ create table $schema.p2_2 partition of $schema.p2 for values in (4, 5, 6);
+
+ CREATE INDEX t1_btree ON $schema.t1 USING BTREE (i);
+ CREATE INDEX t2_btree ON $schema.t2 USING BTREE (i);
+
+ CREATE INDEX t1_hash ON $schema.t1 USING HASH (i);
+ CREATE INDEX t2_hash ON $schema.t2 USING HASH (i);
+
+ CREATE INDEX t1_brin ON $schema.t1 USING BRIN (i);
+ CREATE INDEX t2_brin ON $schema.t2 USING BRIN (i);
+
+ CREATE INDEX t1_gist ON $schema.t1 USING GIST (b);
+ CREATE INDEX t2_gist ON $schema.t2 USING GIST (b);
+
+ CREATE INDEX t1_gin ON $schema.t1 USING GIN (ia);
+ CREATE INDEX t2_gin ON $schema.t2 USING GIN (ia);
+
+ CREATE INDEX t1_spgist ON $schema.t1 USING SPGIST (ir);
+ CREATE INDEX t2_spgist ON $schema.t2 USING SPGIST (ir);
+ ));
+ }
+}
+
+# Database 'db1' corruptions
+#
+
+# Corrupt indexes in schema "s1"
+remove_relation_file('db1', 's1.t1_btree');
+corrupt_first_page('db1', 's1.t2_btree');
+
+# Corrupt tables in schema "s2"
+remove_relation_file('db1', 's2.t1');
+corrupt_first_page('db1', 's2.t2');
+
+# Corrupt tables, partitions, matviews, and btrees in schema "s3"
+remove_relation_file('db1', 's3.t1');
+corrupt_first_page('db1', 's3.t2');
+
+remove_relation_file('db1', 's3.t1_mv');
+remove_relation_file('db1', 's3.p1_1');
+
+corrupt_first_page('db1', 's3.t2_mv');
+corrupt_first_page('db1', 's3.p2_1');
+
+remove_relation_file('db1', 's3.t1_btree');
+corrupt_first_page('db1', 's3.t2_btree');
+
+# Corrupt toast table, partitions, and materialized views in schema "s4"
+remove_toast_file('db1', 's4.t2');
+
+# Corrupt all other object types in schema "s5". We don't have amcheck support
+# for these types, but we check that their corruption does not trigger any
+# errors in pg_amcheck
+remove_relation_file('db1', 's5.seq1');
+remove_relation_file('db1', 's5.t1_hash');
+remove_relation_file('db1', 's5.t1_gist');
+remove_relation_file('db1', 's5.t1_gin');
+remove_relation_file('db1', 's5.t1_brin');
+remove_relation_file('db1', 's5.t1_spgist');
+
+corrupt_first_page('db1', 's5.seq2');
+corrupt_first_page('db1', 's5.t2_hash');
+corrupt_first_page('db1', 's5.t2_gist');
+corrupt_first_page('db1', 's5.t2_gin');
+corrupt_first_page('db1', 's5.t2_brin');
+corrupt_first_page('db1', 's5.t2_spgist');
+
+
+# Database 'db2' corruptions
+#
+remove_relation_file('db2', 's1.t1');
+remove_relation_file('db2', 's1.t1_btree');
+
+
+# Leave 'db3' uncorrupted
+#
+
+
+# Standard first arguments to TestLib functions
+my @cmd = ('pg_amcheck', '--quiet', '-p', $port);
+
+# The pg_amcheck command itself should return exit status = 2, because tables
+# and indexes are corrupt. Exit status = 1 would mean the pg_amcheck command
+# itself failed, for example because a connection to the database could not be
+# established.
+#
+# For these checks, we're ignoring any corruption reported and focusing
+# exclusively on the exit code from pg_amcheck.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in database db1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', 'db2', 'db3' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in databases db1, db2 and db3');
+
+$node->command_checks_all(
+ [ @cmd, '--all' ],
+ 2, [], [],
+ 'pg_amcheck all schemas, tables and indexes in all databases');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1' ],
+ 2, [], [],
+ 'pg_amcheck all objects in schema s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-r', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables named t1 and their indexes');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-i', 's*.t*', '-i', 's*.*btree*' ],
+ 2, [], [],
+ 'pg_amcheck all indexes with qualified names matching /s*.t*/ or /s*.*btree*/');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--no-index-expansion', 'db1', '-r', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all relations with qualified names matching /s*.t1/');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--no-index-expansion', 'db1', '-t', 's*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables with qualified names matching /s*.t1/');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-T', 't1' ],
+ 2, [], [],
+ 'pg_amcheck everything except tables named t1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-S', 's1', '-R', 't1' ],
+ 2, [], [],
+ 'pg_amcheck everything not named t1 nor in schema s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all databases and schemas');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.*.t1' ],
+ 2, [], [],
+ 'pg_amcheck all tables named t1 across all databases and schemas');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', '*.s1.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all databases in schemas named s1');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 'db2.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all schemas in database db2');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 'db2.*.*', '-t', 'db3.*.*' ],
+ 2, [], [],
+ 'pg_amcheck all tables across all schemas in databases db2 and db3');
+
+# Scans of indexes in s1 should detect the specific corruption that we created
+# above. For missing relation forks, we know what the error message looks
+# like. For corrupted index pages, the error might vary depending on how the
+# page was formatted on disk, including variations due to alignment differences
+# between platforms, so we accept any non-empty error message.
+#
+
+$node->command_checks_all(
+ [ @cmd, '--all', '-s', 's1', '-i', 't1_btree' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/pg_amcheck: skipping database "postgres": amcheck is not installed/ ],
+ 'pg_amcheck index s1.t1_btree reports missing main relation fork');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't2_btree' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ qr/^$/ ],
+ 'pg_amcheck index s1.t2_btree reports index corruption');
+
+# Checking db1.s1 should show no corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 's1.*', '--no-index-expansion' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db1.s1 excluding indexes');
+
+# Checking db2.s1 should show table corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db2', '-t', 's1.*', '--no-index-expansion' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db2.s1 excluding indexes');
+
+# Checking db2.s1 and db3.s1 together should likewise show table corruptions if indexes are excluded
+$node->command_checks_all(
+ [ @cmd, 'db3', 'db2', '-t', 's1.*', '--no-index-expansion' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck of db2.s1 and db3.s1, excluding indexes');
+
+# In schema s3, the tables and indexes are both corrupt. We should see
+# corruption messages on stdout, nothing on stderr, and an exit
+# status of 2.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's3' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/,
+ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck schema s3 reports table and index errors');
+
+# In schema s2, only tables are corrupt. Check that table corruption is
+# reported as expected.
+#
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't1' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s2 reports table corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-t', 't2' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s2 reports table corruption');
+
+# In schema s4, only toast tables are corrupt. Check that under default
+# options the toast corruption is reported, but when excluding toast we get no
+# error reports.
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's4' ],
+ 2,
+ [ qr/could not open file/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s4 reports toast corruption');
+
+$node->command_checks_all(
+ [ @cmd, '--no-toast-expansion', '--exclude-toast-pointers', 'db1', '-s', 's4' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck in schema s4 excluding toast reports no corruption');
+
+# Check that no corruption is reported in schema s5
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's5' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s5 reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-I', 't1_btree', '-I', 't2_btree' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s1 with corrupt indexes excluded reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-t', 's1.*', '--no-index-expansion' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s1 with all indexes excluded reports no corruption');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's2', '-T', 't1', '-T', 't2' ],
+ 0,
+ [ qr/^$/ ], # Empty
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck over schema s2 with corrupt tables excluded reports no corruption');
+
+# Check errors about bad block range command line arguments. We use schema s5
+# to avoid getting messages about corrupt tables or indexes.
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', 'junk' ],
+ qr/relation starting block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage startblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--endblock', '1234junk' ],
+ qr/relation ending block argument contains garbage characters/,
+ 'pg_amcheck rejects garbage endblock');
+
+command_fails_like(
+ [ @cmd, 'db1', '-s', 's5', '--startblock', '5', '--endblock', '4' ],
+ qr/relation ending block argument precedes starting block argument/,
+ 'pg_amcheck rejects invalid block range');
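The validation these three checks exercise amounts to: reject non-numeric block arguments, and reject an ending block below the starting block. A hedged re-statement (the error strings paraphrase pg_amcheck's messages, and the real option parser may differ):

```python
def parse_block_range(startblock=None, endblock=None):
    # Illustrative re-statement of the validation exercised above;
    # not pg_amcheck's actual option-parsing code.
    def parse(label, value):
        if value is None:
            return None
        if not value.isdigit():   # "junk" and "1234junk" both fail here
            raise ValueError(
                f"relation {label} block argument contains garbage characters")
        return int(value)

    start = parse("starting", startblock)
    end = parse("ending", endblock)
    if start is not None and end is not None and end < start:
        raise ValueError(
            "relation ending block argument precedes starting block argument")
    return start, end
```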
+
+# Check the bt_index_parent_check option variants. We don't create any index
+# corruption that would behave differently under these modes, so just smoke
+# test that the arguments are handled sensibly.
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--parent-check' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck smoke test --parent-check');
+
+$node->command_checks_all(
+ [ @cmd, 'db1', '-s', 's1', '-i', 't1_btree', '--heapallindexed', '--rootdescend' ],
+ 2,
+ [ qr/index "t1_btree" lacks a main relation fork/ ],
+ [ qr/^$/ ], # Empty
+ 'pg_amcheck smoke test --heapallindexed --rootdescend');
diff --git a/contrib/pg_amcheck/t/004_verify_heapam.pl b/contrib/pg_amcheck/t/004_verify_heapam.pl
new file mode 100644
index 0000000000..d5537a5b37
--- /dev/null
+++ b/contrib/pg_amcheck/t/004_verify_heapam.pl
@@ -0,0 +1,487 @@
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+
+use Test::More tests => 20;
+
+# This regression test demonstrates that the pg_amcheck binary supplied by
+# this contrib module correctly identifies specific kinds of
+# corruption within pages. To test this, we need a mechanism to create corrupt
+# pages with predictable, repeatable corruption. The postgres backend cannot
+# be expected to help us with this, as its design is not consistent with the
+# goal of intentionally corrupting pages.
+#
+# Instead, we create a table to corrupt, and with careful consideration of how
+# postgresql lays out heap pages, we seek to offsets within the page and
+# overwrite deliberately chosen bytes with specific values calculated to
+# corrupt the page in expected ways. We then verify that pg_amcheck reports
+# the corruption, and that it runs without crashing. Note that the backend
+# cannot simply be started to run queries against the corrupt table, as the
+# backend will crash, at least for some of the corruption types we generate.
+#
+# Autovacuum potentially touching the table in the background makes the exact
+# behavior of this test harder to reason about. We turn it off to keep things
+# simpler. We use a "belt and suspenders" approach, turning it off for the
+# system generally in postgresql.conf, and turning it off specifically for the
+# test table.
+#
+# This test depends on the table being written to the heap file exactly as we
+# expect it to be, so we take care to arrange the columns of the table, and to
+# insert rows, in a way that gives predictable sizes and locations within the
+# table page.
+#
+# The HeapTupleHeaderData has 23 bytes of fixed size fields before the variable
+# length t_bits[] array. We have exactly 3 columns in the table, so natts = 3,
+# t_bits is 1 byte long, and t_hoff = MAXALIGN(23 + 1) = 24.
+#
+# We're not too fussy about which datatypes we use for the test, but we do care
+# about some specific properties. We'd like to test both fixed size and
+# varlena types. We'd like some varlena data inline and some toasted. And
+# we'd like the layout of the table such that the datums land at predictable
+# offsets within the tuple. We choose a structure without padding on all
+# supported architectures:
+#
+# a BIGINT
+# b TEXT
+# c TEXT
+#
+# We always insert a 7-character ASCII string into field 'b', which with a
+# 1-byte varlena header gives an 8 byte inline value. We always insert a long
+# text string in field 'c', long enough to force toast storage.
+#
+# We choose to read and write binary copies of our table's tuples, using perl's
+# pack() and unpack() functions. Perl uses a packing code system in which:
+#
+# L = unsigned 32-bit long,
+# S = unsigned 16-bit short,
+# C = unsigned 8-bit octet,
+# c = signed 8-bit octet,
+# q = signed 64-bit quadword
+#
+# Each tuple in our table has a layout as follows:
+#
+# xx xx xx xx t_xmin: xxxx offset = 0 L
+# xx xx xx xx t_xmax: xxxx offset = 4 L
+# xx xx xx xx t_field3: xxxx offset = 8 L
+# xx xx bi_hi: xx offset = 12 S
+# xx xx bi_lo: xx offset = 14 S
+# xx xx ip_posid: xx offset = 16 S
+# xx xx t_infomask2: xx offset = 18 S
+# xx xx t_infomask: xx offset = 20 S
+# xx t_hoff: x offset = 22 C
+# xx t_bits: x offset = 23 C
+# xx xx xx xx xx xx xx xx 'a': xxxxxxxx offset = 24 q
+# xx xx xx xx xx xx xx xx 'b': xxxxxxxx offset = 32 Cccccccc
+# xx xx xx xx xx xx xx xx 'c': xxxxxxxx offset = 40 SSSS
+# xx xx xx xx xx xx xx xx : xxxxxxxx ...continued SSSS
+# xx xx : xx ...continued S
+#
+# We could choose to read and write columns 'b' and 'c' in other ways, but
+# it is convenient enough to do it this way. We define packing code
+# constants here, where they can be compared easily against the layout.
+
+use constant HEAPTUPLE_PACK_CODE => 'LLLSSSSSCCqCcccccccSSSSSSSSS';
+use constant HEAPTUPLE_PACK_LENGTH => 58; # Total size
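As a cross-check on the 58-byte figure, the same layout can be mirrored with Python's struct module (Perl codes L→L, S→H, C→B, c→b, q→q; the '<' prefix selects standard sizes with no alignment padding, matching Perl's pack()). This is only an illustrative sketch, not part of the test:

```python
import struct

# Mirror of HEAPTUPLE_PACK_CODE using Python struct format characters.
HEAPTUPLE_FMT = "<LLLHHHHHBBqBbbbbbbbHHHHHHHHH"

# 3*4 (xids) + 5*2 (ctid/infomasks) + 2*1 (t_hoff/t_bits)
# + 8 ('a') + 1 + 7 ('b') + 9*2 ('c') = 58 bytes
assert struct.calcsize(HEAPTUPLE_FMT) == 58
```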
+
+# Read a tuple of our table from a heap page.
+#
+# Takes an open filehandle to the heap file, and the offset of the tuple.
+#
+# Rather than returning the binary data from the file, unpacks the data into a
+# perl hash with named fields. These fields exactly match the ones understood
+# by write_tuple(), below. Returns a reference to this hash.
+#
+sub read_tuple ($$)
+{
+ my ($fh, $offset) = @_;
+ my ($buffer, %tup);
+ seek($fh, $offset, 0);
+ sysread($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+
+ @_ = unpack(HEAPTUPLE_PACK_CODE, $buffer);
+ %tup = (t_xmin => shift,
+ t_xmax => shift,
+ t_field3 => shift,
+ bi_hi => shift,
+ bi_lo => shift,
+ ip_posid => shift,
+ t_infomask2 => shift,
+ t_infomask => shift,
+ t_hoff => shift,
+ t_bits => shift,
+ a => shift,
+ b_header => shift,
+ b_body1 => shift,
+ b_body2 => shift,
+ b_body3 => shift,
+ b_body4 => shift,
+ b_body5 => shift,
+ b_body6 => shift,
+ b_body7 => shift,
+ c1 => shift,
+ c2 => shift,
+ c3 => shift,
+ c4 => shift,
+ c5 => shift,
+ c6 => shift,
+ c7 => shift,
+ c8 => shift,
+ c9 => shift);
+ # Stitch together the text for column 'b'
+ $tup{b} = join('', map { chr($tup{"b_body$_"}) } (1..7));
+ return \%tup;
+}
+
+# Write a tuple of our table to a heap page.
+#
+# Takes an open filehandle to the heap file, the offset of the tuple, and a
+# reference to a hash with the tuple values, as returned by read_tuple().
+# Writes the tuple fields from the hash into the heap file.
+#
+# The purpose of this function is to write a tuple back to disk with some
+# subset of fields modified. The function does no error checking. Use
+# cautiously.
+#
+sub write_tuple($$$)
+{
+ my ($fh, $offset, $tup) = @_;
+ my $buffer = pack(HEAPTUPLE_PACK_CODE,
+ $tup->{t_xmin},
+ $tup->{t_xmax},
+ $tup->{t_field3},
+ $tup->{bi_hi},
+ $tup->{bi_lo},
+ $tup->{ip_posid},
+ $tup->{t_infomask2},
+ $tup->{t_infomask},
+ $tup->{t_hoff},
+ $tup->{t_bits},
+ $tup->{a},
+ $tup->{b_header},
+ $tup->{b_body1},
+ $tup->{b_body2},
+ $tup->{b_body3},
+ $tup->{b_body4},
+ $tup->{b_body5},
+ $tup->{b_body6},
+ $tup->{b_body7},
+ $tup->{c1},
+ $tup->{c2},
+ $tup->{c3},
+ $tup->{c4},
+ $tup->{c5},
+ $tup->{c6},
+ $tup->{c7},
+ $tup->{c8},
+ $tup->{c9});
+ seek($fh, $offset, 0);
+ syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH);
+ return;
+}
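A minimal illustration of the read/modify/write cycle these two helpers implement, sketched in Python's struct module against a zeroed tuple image (field order assumed to follow the pack code above):

```python
import struct

FMT = "<LLLHHHHHBBqBbbbbbbbHHHHHHHHH"

# Unpack a zeroed 58-byte tuple image, overwrite t_xmin (the first
# packed field), and pack it back -- the same cycle read_tuple() and
# write_tuple() perform against the real heap file.
fields = list(struct.unpack(FMT, bytes(struct.calcsize(FMT))))
fields[0] = 12345  # t_xmin
image = struct.pack(FMT, *fields)
assert len(image) == 58
assert struct.unpack(FMT, image)[0] == 12345
```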
+
+# Set umask so test directories and files are created with default permissions
+umask(0077);
+
+# Set up the node. Once we create and corrupt the table,
+# autovacuum workers visiting the table could crash the backend.
+# Disable autovacuum so that won't happen.
+my $node = get_new_node('test');
+$node->init;
+$node->append_conf('postgresql.conf', 'autovacuum=off');
+
+# Start the node and load the extensions. We depend on both
+# amcheck and pageinspect for this test.
+$node->start;
+my $port = $node->port;
+my $pgdata = $node->data_dir;
+$node->safe_psql('postgres', "CREATE EXTENSION amcheck");
+$node->safe_psql('postgres', "CREATE EXTENSION pageinspect");
+
+# Get a non-zero datfrozenxid
+$node->safe_psql('postgres', qq(VACUUM FREEZE));
+
+# Create the test table with precisely the schema that our corruption function
+# expects.
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.test (a BIGINT, b TEXT, c TEXT);
+ ALTER TABLE public.test SET (autovacuum_enabled=false);
+ ALTER TABLE public.test ALTER COLUMN c SET STORAGE EXTERNAL;
+ CREATE INDEX test_idx ON public.test(a, b);
+ ));
+
+# We want (0 < datfrozenxid < test.relfrozenxid). To achieve this, we freeze
+# an otherwise unused table, public.junk, prior to inserting data and freezing
+# public.test
+$node->safe_psql(
+ 'postgres', qq(
+ CREATE TABLE public.junk AS SELECT 'junk'::TEXT AS junk_column;
+ ALTER TABLE public.junk SET (autovacuum_enabled=false);
+ VACUUM FREEZE public.junk
+ ));
+
+my $rel = $node->safe_psql('postgres', qq(SELECT pg_relation_filepath('public.test')));
+my $relpath = "$pgdata/$rel";
+
+# Insert data and freeze public.test
+use constant ROWCOUNT => 16;
+$node->safe_psql('postgres', qq(
+ INSERT INTO public.test (a, b, c)
+ VALUES (
+ 12345678,
+ 'abcdefg',
+ repeat('w', 10000)
+ );
+ VACUUM FREEZE public.test
+ )) for (1..ROWCOUNT);
+
+my $relfrozenxid = $node->safe_psql('postgres',
+ q(select relfrozenxid from pg_class where relname = 'test'));
+my $datfrozenxid = $node->safe_psql('postgres',
+ q(select datfrozenxid from pg_database where datname = 'postgres'));
+
+# Find where each of the tuples is located on the page.
+my @lp_off;
+for my $tup (0..ROWCOUNT-1)
+{
+ push (@lp_off, $node->safe_psql('postgres', qq(
+select lp_off from heap_page_items(get_raw_page('test', 'main', 0))
+ offset $tup limit 1)));
+}
+
+# Check that pg_amcheck runs against the uncorrupted table without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table, prior to corruption');
+
+# Check that pg_amcheck runs against the uncorrupted table and index without error.
+$node->command_ok(['pg_amcheck', '-p', $port, 'postgres'],
+ 'pg_amcheck test table and index, prior to corruption');
+
+$node->stop;
+
+# Sanity check that our 'test' table has a relfrozenxid newer than the
+# datfrozenxid for the database, and that the datfrozenxid is greater than the
+# first normal xid. We rely on these invariants in some of our tests.
+if ($datfrozenxid <= 3 || $datfrozenxid >= $relfrozenxid)
+{
+ fail('Xid thresholds not as expected');
+ $node->clean_node;
+ exit;
+}
+
+# Some #define constants from access/htup_details.h for use while corrupting.
+use constant HEAP_HASNULL => 0x0001;
+use constant HEAP_XMAX_LOCK_ONLY => 0x0080;
+use constant HEAP_XMIN_COMMITTED => 0x0100;
+use constant HEAP_XMIN_INVALID => 0x0200;
+use constant HEAP_XMAX_COMMITTED => 0x0400;
+use constant HEAP_XMAX_INVALID => 0x0800;
+use constant HEAP_NATTS_MASK => 0x07FF;
+use constant HEAP_XMAX_IS_MULTI => 0x1000;
+use constant HEAP_KEYS_UPDATED => 0x2000;
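The corruption loop below manipulates these flags with plain bitwise operations; a hedged sketch of the pattern, with an assumed example infomask value:

```python
# Constants as defined above (from access/htup_details.h).
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMIN_INVALID   = 0x0200
HEAP_XMAX_IS_MULTI  = 0x1000

t_infomask = 0x0902  # assumed example value read from a tuple header

# Clear the xmin hint bits so the corrupt xmin is actually examined,
# then mark xmax as a multixact, as several branches below do.
t_infomask &= ~(HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)
t_infomask |= HEAP_XMAX_IS_MULTI
assert t_infomask == 0x1802
```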
+
+# Helper function to generate a regular expression matching the message header
+# we expect verify_heapam() to emit, given which location fields are defined.
+sub header
+{
+ my ($blkno, $offnum, $attnum) = @_;
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum, attribute $attnum\s+/ms
+ if (defined $attnum);
+ return qr/relation postgres\.public\.test, block $blkno, offset $offnum\s+/ms
+ if (defined $offnum);
+ return qr/relation postgres\.public\.test\s+/ms
+ if (defined $blkno);
+ return qr/relation postgres\.public\.test\s+/ms;
+}
+
+# Corrupt the tuples, one type of corruption per tuple. Some types of
+# corruption cause verify_heapam to skip to the next tuple without
+# performing any remaining checks, so we can't exercise the system properly if
+# we focus all our corruption on a single tuple.
+#
+my @expected;
+my $file;
+open($file, '+<', $relpath);
+binmode $file;
+
+for (my $tupidx = 0; $tupidx < ROWCOUNT; $tupidx++)
+{
+ my $offnum = $tupidx + 1; # offnum is 1-based, not zero-based
+ my $offset = $lp_off[$tupidx];
+ my $tup = read_tuple($file, $offset);
+
+ # Sanity-check that the data appears on the page where we expect.
+ if ($tup->{a} ne '12345678' || $tup->{b} ne 'abcdefg')
+ {
+ fail('Page layout differs from our expectations');
+ $node->clean_node;
+ exit;
+ }
+
+ my $header = header(0, $offnum, undef);
+ if ($offnum == 1)
+ {
+ # Corruptly set xmin < relfrozenxid
+ my $xmin = $relfrozenxid - 1;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ # Expected corruption report
+ push @expected,
+ qr/${header}xmin $xmin precedes relation freeze threshold 0:\d+/;
+ }
+ elsif ($offnum == 2)
+ {
+ # Corruptly set xmin < datfrozenxid
+ my $xmin = 3;
+ $tup->{t_xmin} = $xmin;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin $xmin precedes oldest valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 3)
+ {
+ # Corruptly set xmin < datfrozenxid, further back, noting circularity
+ # of xid comparison. For a new cluster with epoch = 0, the corrupt
+ # xmin will be interpreted as in the future
+ $tup->{t_xmin} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMIN_COMMITTED;
+ $tup->{t_infomask} &= ~HEAP_XMIN_INVALID;
+
+ push @expected,
+ qr/${header}xmin 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 4)
+ {
+ # Corruptly set xmax beyond the next valid transaction ID
+ $tup->{t_xmax} = 4026531839;
+ $tup->{t_infomask} &= ~HEAP_XMAX_INVALID;
+
+ push @expected,
+ qr/${header}xmax 4026531839 equals or exceeds next valid transaction ID 0:\d+/;
+ }
+ elsif ($offnum == 5)
+ {
+ # Corrupt the tuple t_hoff, but keep it aligned properly
+ $tup->{t_hoff} += 128;
+
+ push @expected,
+ qr/${header}data begins at offset 152 beyond the tuple length 58/,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 152 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 6)
+ {
+ # Corrupt the tuple t_hoff, wrong alignment
+ $tup->{t_hoff} += 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 27 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 7)
+ {
+ # Corrupt the tuple t_hoff, underflow but correct alignment
+ $tup->{t_hoff} -= 8;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 16 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 8)
+ {
+ # Corrupt the tuple t_hoff, underflow and wrong alignment
+ $tup->{t_hoff} -= 3;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 24, but actually begins at byte 21 \(3 attributes, no nulls\)/;
+ }
+ elsif ($offnum == 9)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, not just 3
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+
+ push @expected,
+ qr/${header}number of attributes 2047 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 10)
+ {
+ # Corrupt the tuple to look like it has lots of attributes, some of
+ # them null. This falsely creates the impression that the t_bits
+ # array is longer than just one byte, but t_hoff still says otherwise.
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= HEAP_NATTS_MASK;
+ $tup->{t_bits} = 0xAA;
+
+ push @expected,
+ qr/${header}tuple data should begin at byte 280, but actually begins at byte 24 \(2047 attributes, has nulls\)/;
+ }
+ elsif ($offnum == 11)
+ {
+ # Same as above, but this time t_hoff plays along
+ $tup->{t_infomask} |= HEAP_HASNULL;
+ $tup->{t_infomask2} |= (HEAP_NATTS_MASK & 0x40);
+ $tup->{t_bits} = 0xAA;
+ $tup->{t_hoff} = 32;
+
+ push @expected,
+ qr/${header}number of attributes 67 exceeds maximum expected for table 3/;
+ }
+ elsif ($offnum == 12)
+ {
+ # Corrupt the bits in column 'b' 1-byte varlena header
+ $tup->{b_header} = 0x80;
+
+ $header = header(0, $offnum, 1);
+ push @expected,
+ qr/${header}attribute 1 with length 4294967295 ends at offset 416848000 beyond total tuple length 58/;
+ }
+ elsif ($offnum == 13)
+ {
+ # Corrupt the bits in column 'c' toast pointer
+ $tup->{c6} = 41;
+ $tup->{c7} = 41;
+
+ $header = header(0, $offnum, 2);
+ push @expected,
+ qr/${header}final toast chunk number 0 differs from expected value 6/,
+ qr/${header}toasted value for attribute 2 missing from toast table/;
+ }
+ elsif ($offnum == 14)
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4;
+
+ push @expected,
+ qr/${header}multitransaction ID 4 equals or exceeds next valid multitransaction ID 1/;
+ }
+ elsif ($offnum == 15) # Remaining tuples, if any, are left uncorrupted
+ {
+ # Set both HEAP_XMAX_COMMITTED and HEAP_XMAX_IS_MULTI
+ $tup->{t_infomask} |= HEAP_XMAX_COMMITTED;
+ $tup->{t_infomask} |= HEAP_XMAX_IS_MULTI;
+ $tup->{t_xmax} = 4000000000;
+
+ push @expected,
+ qr/${header}multitransaction ID 4000000000 precedes relation minimum multitransaction ID threshold 1/;
+ }
+ write_tuple($file, $offset, $tup);
+}
+close($file);
+$node->start;
+
+# Run pg_amcheck against the corrupt table with epoch=0, comparing actual
+# corruption messages against the expected messages
+$node->command_checks_all(
+ ['pg_amcheck', '--no-index-expansion', '-p', $port, 'postgres'],
+ 2,
+ [ @expected ],
+ [ ],
+ 'Expected corruption message output');
+
+$node->teardown_node;
+$node->clean_node;
diff --git a/contrib/pg_amcheck/t/005_opclass_damage.pl b/contrib/pg_amcheck/t/005_opclass_damage.pl
new file mode 100644
index 0000000000..eba8ea9cae
--- /dev/null
+++ b/contrib/pg_amcheck/t/005_opclass_damage.pl
@@ -0,0 +1,54 @@
+# This regression test checks the behavior of the btree validation in the
+# presence of breaking sort order changes.
+#
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 5;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create a custom operator class and an index which uses it.
+$node->safe_psql('postgres', q(
+ CREATE EXTENSION amcheck;
+
+ CREATE FUNCTION int4_asc_cmp (a int4, b int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN 1 ELSE -1 END; $$;
+
+ CREATE OPERATOR CLASS int4_fickle_ops FOR TYPE int4 USING btree AS
+ OPERATOR 1 < (int4, int4), OPERATOR 2 <= (int4, int4),
+ OPERATOR 3 = (int4, int4), OPERATOR 4 >= (int4, int4),
+ OPERATOR 5 > (int4, int4), FUNCTION 1 int4_asc_cmp(int4, int4);
+
+ CREATE TABLE int4tbl (i int4);
+ INSERT INTO int4tbl (SELECT * FROM generate_series(1,1000) gs);
+ CREATE INDEX fickleidx ON int4tbl USING btree (i int4_fickle_ops);
+));
+
+# We have not yet broken the index, so we should get no corruption
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $node->port, 'postgres' ],
+ qr/^$/,
+ 'pg_amcheck all schemas, tables and indexes reports no corruption');
+
+# Change the operator class to use a function which sorts in a different
+# order to corrupt the btree index
+$node->safe_psql('postgres', q(
+ CREATE FUNCTION int4_desc_cmp (int4, int4) RETURNS int LANGUAGE sql AS $$
+ SELECT CASE WHEN $1 = $2 THEN 0 WHEN $1 > $2 THEN -1 ELSE 1 END; $$;
+ UPDATE pg_catalog.pg_amproc
+ SET amproc = 'int4_desc_cmp'::regproc
+ WHERE amproc = 'int4_asc_cmp'::regproc
+));
+
+# Index corruption should now be reported
+$node->command_checks_all(
+ [ 'pg_amcheck', '-p', $node->port, 'postgres' ],
+ 2,
+ [ qr/item order invariant violated for index "fickleidx"/ ],
+ [ ],
+ 'pg_amcheck all schemas, tables and indexes reports fickleidx corruption'
+);
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index d3ca4b6932..7e101f7c11 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -185,6 +185,7 @@ pages.
</para>
&oid2name;
+ &pgamcheck;
&vacuumlo;
</sect1>
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index db1d369743..5115cb03d0 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -133,6 +133,7 @@
<!ENTITY oldsnapshot SYSTEM "oldsnapshot.sgml">
<!ENTITY pageinspect SYSTEM "pageinspect.sgml">
<!ENTITY passwordcheck SYSTEM "passwordcheck.sgml">
+<!ENTITY pgamcheck SYSTEM "pgamcheck.sgml">
<!ENTITY pgbuffercache SYSTEM "pgbuffercache.sgml">
<!ENTITY pgcrypto SYSTEM "pgcrypto.sgml">
<!ENTITY pgfreespacemap SYSTEM "pgfreespacemap.sgml">
diff --git a/doc/src/sgml/pgamcheck.sgml b/doc/src/sgml/pgamcheck.sgml
new file mode 100644
index 0000000000..76e5a0e511
--- /dev/null
+++ b/doc/src/sgml/pgamcheck.sgml
@@ -0,0 +1,670 @@
+<!-- doc/src/sgml/pgamcheck.sgml -->
+
+<refentry id="pgamcheck">
+ <indexterm zone="pgamcheck">
+ <primary>pg_amcheck</primary>
+ </indexterm>
+
+ <refmeta>
+ <refentrytitle><application>pg_amcheck</application></refentrytitle>
+ <manvolnum>1</manvolnum>
+ <refmiscinfo>Application</refmiscinfo>
+ </refmeta>
+
+ <refnamediv>
+ <refname>pg_amcheck</refname>
+ <refpurpose>checks for corruption in one or more
+ <productname>PostgreSQL</productname> databases</refpurpose>
+ </refnamediv>
+
+ <refsynopsisdiv>
+ <cmdsynopsis>
+ <command>pg_amcheck</command>
+ <arg rep="repeat"><replaceable>option</replaceable></arg>
+ <arg rep="repeat"><replaceable>dbname</replaceable></arg>
+ </cmdsynopsis>
+ </refsynopsisdiv>
+
+ <refsect1>
+ <title>Description</title>
+
+ <para>
+ <application>pg_amcheck</application> supports running
+ <xref linkend="amcheck"/>'s corruption checking functions against one or
+ more databases, with options to select which schemas, tables and indexes
+ to check, which kinds of checking to perform, and whether to perform the
+ checks in parallel and, if so, how many parallel connections to establish
+ and use.
+ </para>
+
+ <para>
+ Only table relations and btree indexes are currently supported. Other
+ relation types are silently skipped.
+ </para>
+
+ </refsect1>
+
+ <refsect1>
+ <title>Options</title>
+
+ <para>
+ <application>pg_amcheck</application> accepts the following command-line arguments:
+
+ <variablelist>
+ <varlistentry>
+ <term><option>--all</option></term>
+ <listitem>
+ <para>
+ Perform checking in all databases.
+ </para>
+ <para>
+ In the absence of any other options, selects all objects across all
+ schemas and databases.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-dbname</option> takes
+ precedence over <option>--all</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-d</option></term>
+ <term><option>--dbname</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for checking. By default, all objects
+ in the matching database(s) will be checked.
+ </para>
+ <para>
+ If no <option>--maintenance-db</option> argument is given, and no
+ database name is given as a command line argument, the first argument
+ specified with <option>-d</option> <option>--dbname</option> will be
+ used for the initial connection. If that argument is not a literal
+ database name, the attempt to connect will fail.
+ </para>
+ <para>
+ If <option>--all</option> is also specified, <option>-d</option>
+ <option>--dbname</option> does not affect which databases are checked,
+ but may be used to specify the database for the initial connection.
+ </para>
+ <para>
+ Option <option>-D</option> <option>--exclude-dbname</option> takes
+ precedence over <option>-d</option> <option>--dbname</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--dbname=africa</literal></member>
+ <member><literal>--dbname="a*"</literal></member>
+ <member><literal>--dbname="africa|asia|europe"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-D</option></term>
+ <term><option>--exclude-dbname</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified database.
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ database (or database pattern) for exclusion.
+ </para>
+ <para>
+ If a database which is included using <option>--all</option> or
+ <option>-d</option> <option>--dbname</option> is also excluded using
+ <option>-D</option> <option>--exclude-dbname</option>, the database will
+ be excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--exclude-dbname=america</literal></member>
+ <member><literal>--exclude-dbname="*pacific*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-e</option></term>
+ <term><option>--echo</option></term>
+ <listitem>
+ <para>
+ Print to stdout all commands and queries being executed against the
+ server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--endblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) all pages after the given ending block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, but note that
+ unless <option>--exclude-toast-pointers</option> is given, toast
+ pointers found in the main table will be followed into the toast table
+ without regard for the location in the toast table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--exclude-toast-pointers</option></term>
+ <listitem>
+ <para>
+ When checking main relations, do not look up entries in toast tables
+ corresponding to toast pointers in the main relation.
+ </para>
+ <para>
+ The default behavior checks each toast pointer encountered in the main
+ table to verify, as much as possible, that the pointer points at
+ something in the toast table that is reasonable. Toast pointers which
+ point beyond the end of the toast table, or to the middle (rather than
+ the beginning) of a toast entry, are identified as corrupt.
+ </para>
+ <para>
+ The process by which <xref linkend="amcheck"/>'s
+ <function>verify_heapam</function> function checks each toast pointer is
+ slow and may be improved in a future release. Some users may wish to
+ disable this check to save time.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-H</option></term>
+ <term><option>--heapallindexed</option></term>
+ <listitem>
+ <para>
+ For each index checked, verify the presence of all heap tuples as index
+ tuples in the index using <application>amcheck</application>'s
+ <option>heapallindexed</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-?</option></term>
+ <term><option>--help</option></term>
+ <listitem>
+ <para>
+ Show help about <application>pg_amcheck</application> command line
+ arguments, and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-h</option></term>
+ <term><option>--host=HOSTNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the host name of the machine on which the server is running.
+ If the value begins with a slash, it is used as the directory for the
+ Unix domain socket.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-i</option></term>
+ <term><option>--index</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified index(es). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-I</option></term>
+ <term><option>--exclude-index</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified index(es). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to indexes, not tables.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-j</option></term>
+ <term><option>--jobs=NUM</option></term>
+ <listitem>
+ <para>
+ Use the specified number of concurrent connections to the server, or
+ one per object to be checked, whichever number is smaller.
+ </para>
+ <para>
+ The default is to use a single connection.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--maintenance-db=DBNAME</option></term>
+ <listitem>
+ <para>
+ Specifies the name of the database to connect to when querying the
+ list of all databases. If not specified, the
+ <literal>postgres</literal> database will be used; if that does not
+ exist <literal>template1</literal> will be used. This can be a
+ <link linkend="libpq-connstring">connection string</link>. If so,
+ connection string parameters will override any conflicting command
+ line options.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-index-expansion</option></term>
+ <listitem>
+ <para>
+ When including a table relation in the list of relations to check, do
+ not automatically include btree indexes associated with the table.
+ </para>
+ <para>
+ By default, all tables to be checked will also have checks performed on
+ their associated btree indexes, if any. If this option is given, only
+ those indexes which match a <option>--relation</option> or
+ <option>--index</option> pattern will be checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-strict-names</option></term>
+ <listitem>
+ <para>
+ When calculating the list of databases to check, and the objects within
+ those databases to be checked, do not raise an error for database,
+ schema, relation, table, or index inclusion patterns that match no
+ corresponding objects.
+ </para>
+ <para>
+ Exclusion patterns are never required to match any objects, but by
+ default an unmatched inclusion pattern raises an error. This includes
+ patterns that fail to match only because an exclusion pattern filtered
+ out the objects they would otherwise have matched, and patterns that
+ fail to match a database because it does not accept connections
+ (<literal>datallowconn</literal> is false).
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--no-toast-expansion</option></term>
+ <listitem>
+ <para>
+ When including a table relation in the list of relations to check, do
+ not automatically include toast tables associated with the table.
+ </para>
+ <para>
+ By default, all tables to be checked will also have checks performed on
+ their associated toast tables, if any. If this option is given, only
+ those toast tables which match a <option>--relation</option> or
+ <option>--table</option> pattern will be checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--on-error-stop</option></term>
+ <listitem>
+ <para>
+ After reporting all corruptions on the first page of a table where
+ corruptions are found, stop processing that table relation and move on
+ to the next table or index.
+ </para>
+ <para>
+ Note that index checking always stops after the first corrupt page.
+ This option therefore affects only table relations.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-P</option></term>
+ <term><option>--parent-check</option></term>
+ <listitem>
+ <para>
+ For each btree index checked, use <xref linkend="amcheck"/>'s
+ <function>bt_index_parent_check</function> function, which performs
+ additional checks of parent/child relationships during index checking.
+ </para>
+ <para>
+ The default is to use <application>amcheck</application>'s
+ <function>bt_index_check</function> function, but note that use of the
+ <option>--rootdescend</option> option implicitly selects
+ <function>bt_index_parent_check</function>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-p</option></term>
+ <term><option>--port=PORT</option></term>
+ <listitem>
+ <para>
+ Specifies the TCP port or local Unix domain socket file extension on
+ which the server is listening for connections.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--progress</option></term>
+ <listitem>
+ <para>
+ Show progress information about how many relations have been checked.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-q</option></term>
+ <term><option>--quiet</option></term>
+ <listitem>
+ <para>
+ Do not write additional messages beyond those about corruption.
+ </para>
+ <para>
+ This option does not suppress output requested with the
+ <option>-e</option> <option>--echo</option> option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-r</option></term>
+ <term><option>--relation</option></term>
+ <listitem>
+ <para>
+ Perform checking on the specified relation(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ relation (or relation pattern) for checking.
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--relation=accounts_table</literal></member>
+ <member><literal>--relation=accounting_department.accounts_table</literal></member>
+ <member><literal>--relation=corporate_database.accounting_department.*_table</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-R</option></term>
+ <term><option>--exclude-relation</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified relation(s).
+ </para>
+ <para>
+ Option <option>-R</option> <option>--exclude-relation</option> takes
+ precedence over <option>-r</option> <option>--relation</option>,
+ <option>-t</option> <option>--table</option> and <option>-i</option>
+ <option>--index</option>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--rootdescend</option></term>
+ <listitem>
+ <para>
+ For each index checked, re-find tuples on the leaf level by performing a
+ new search from the root page for each tuple using
+ <xref linkend="amcheck"/>'s <option>rootdescend</option> option.
+ </para>
+ <para>
+ Use of this option implicitly also selects the <option>-P</option>
+ <option>--parent-check</option> option.
+ </para>
+ <para>
+ This form of verification was originally written to help in the
+ development of btree index features. It may be of limited use or even
+ of no use in helping detect the kinds of corruption that occur in
+ practice. It may also cause corruption checking to take considerably
+ longer and consume considerably more resources on the server.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-s</option></term>
+ <term><option>--schema</option></term>
+ <listitem>
+ <para>
+ Perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for checking. By default, all objects in
+ the matching schema(s) will be checked.
+ </para>
+ <para>
+ Option <option>-S</option> <option>--exclude-schema</option> takes
+ precedence over <option>-s</option> <option>--schema</option>.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>--schema=corp</literal></member>
+ <member><literal>--schema="corp|llc|npo"</literal></member>
+ </simplelist>
+ </para>
+ <para>
+ Note that both tables and indexes are included using this option, which
+ might not be what you want if you are also using
+ <option>--no-index-expansion</option>. To specify all tables in a schema
+ without also specifying all indexes, <option>--table</option> can be
+ used with a pattern that specifies the schema. For example, to check
+ all tables in schema <literal>corp</literal>, the option
+ <literal>--table="corp.*"</literal> may be used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-S</option></term>
+ <term><option>--exclude-schema</option></term>
+ <listitem>
+ <para>
+ Do not perform checking in the specified schema(s).
+ </para>
+ <para>
+ This option may be specified multiple times to list more than one
+ schema (or schema pattern) for exclusion.
+ </para>
+ <para>
+ If a schema which is included using
+ <option>-s</option> <option>--schema</option> is also excluded using
+ <option>-S</option> <option>--exclude-schema</option>, the schema will
+ be excluded.
+ </para>
+ <para>
+ Examples:
+ <simplelist>
+ <member><literal>-S corp -S llc</literal></member>
+ <member><literal>--exclude-schema="*c*"</literal></member>
+ </simplelist>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--skip=OPTION</option></term>
+ <listitem>
+ <para>
+ If <literal>all-frozen</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all frozen.
+ </para>
+ <para>
+ If <literal>all-visible</literal> is given, table corruption checks
+ will skip over pages in all tables that are marked as all visible.
+ </para>
+ <para>
+ By default, no pages are skipped. This can be specified as
+ <literal>none</literal>, but since this is the default, it need not be
+ mentioned.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>--startblock=BLOCK</option></term>
+ <listitem>
+ <para>
+ Skip (do not check) pages prior to the given starting block.
+ </para>
+ <para>
+ By default, no pages are skipped. This option will be applied to all
+ table relations that are checked, including toast tables, but note
+ that unless <option>--exclude-toast-pointers</option> is given, toast
+ pointers found in the main table will be followed into the toast table
+ without regard for the location in the toast table.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-t</option></term>
+ <term><option>--table</option></term>
+ <listitem>
+ <para>
+ Perform checks on the specified table(s). This is an alias for the
+ <option>-r</option> <option>--relation</option> option, except that it
+ applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-T</option></term>
+ <term><option>--exclude-table</option></term>
+ <listitem>
+ <para>
+ Exclude checks on the specified table(s). This is an alias for the
+ <option>-R</option> <option>--exclude-relation</option> option, except
+ that it applies only to tables, not indexes.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-U</option></term>
+ <term><option>--username=USERNAME</option></term>
+ <listitem>
+ <para>
+ User name to connect as.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-v</option></term>
+ <term><option>--verbose</option></term>
+ <listitem>
+ <para>
+ Increases the log level verbosity. This option may be given more than
+ once.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-V</option></term>
+ <term><option>--version</option></term>
+ <listitem>
+ <para>
+ Print the <application>pg_amcheck</application> version and exit.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-w</option></term>
+ <term><option>--no-password</option></term>
+ <listitem>
+ <para>
+ Never issue a password prompt. If the server requires password
+ authentication and a password is not available by other means such as
+ a <filename>.pgpass</filename> file, the connection attempt will fail.
+ This option can be useful in batch jobs and scripts where no user is
+ present to enter a password.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><option>-W</option></term>
+ <term><option>--password</option></term>
+ <listitem>
+ <para>
+ Force <application>pg_amcheck</application> to prompt for a password
+ before connecting to a database.
+ </para>
+ <para>
+ This option is never essential, since
+ <application>pg_amcheck</application> will automatically prompt for a
+ password if the server demands password authentication. However,
+ <application>pg_amcheck</application> will waste a connection attempt
+ finding out that the server wants a password. In some cases it is
+ worth typing <option>-W</option> to avoid the extra connection attempt.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Notes</title>
+
+ <para>
+ <application>pg_amcheck</application> is designed to work with
+ <productname>PostgreSQL</productname> 14.0 and later.
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>Author</title>
+
+ <para>
+ Mark Dilger <email>mark.dilger@enterprisedb.com</email>
+ </para>
+ </refsect1>
+
+ <refsect1>
+ <title>See Also</title>
+
+ <simplelist type="inline">
+ <member><xref linkend="amcheck"/></member>
+ </simplelist>
+ </refsect1>
+</refentry>
diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index ea3af48777..49ad558b74 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -18,7 +18,7 @@ our (@ISA, @EXPORT_OK);
@EXPORT_OK = qw(Install);
my $insttype;
-my @client_contribs = ('oid2name', 'pgbench', 'vacuumlo');
+my @client_contribs = ('oid2name', 'pg_amcheck', 'pgbench', 'vacuumlo');
my @client_program_files = (
'clusterdb', 'createdb', 'createuser', 'dropdb',
'dropuser', 'ecpg', 'libecpg', 'libecpg_compat',
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 49614106dc..f680544e07 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -33,9 +33,9 @@ my @unlink_on_exit;
# Set of variables for modules in contrib/ and src/test/modules/
my $contrib_defines = { 'refint' => 'REFINT_VERBOSE' };
-my @contrib_uselibpq = ('dblink', 'oid2name', 'postgres_fdw', 'vacuumlo');
-my @contrib_uselibpgport = ('oid2name', 'vacuumlo');
-my @contrib_uselibpgcommon = ('oid2name', 'vacuumlo');
+my @contrib_uselibpq = ('dblink', 'oid2name', 'pg_amcheck', 'postgres_fdw', 'vacuumlo');
+my @contrib_uselibpgport = ('oid2name', 'pg_amcheck', 'vacuumlo');
+my @contrib_uselibpgcommon = ('oid2name', 'pg_amcheck', 'vacuumlo');
my $contrib_extralibs = undef;
my $contrib_extraincludes = { 'dblink' => ['src/backend'] };
my $contrib_extrasource = {
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 08776f41ca..7a967fcbaf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -499,6 +499,7 @@ DSA
DWORD
DataDumperPtr
DataPageDeleteStack
+DatabaseInfo
DateADT
Datum
DatumTupleFields
@@ -2084,6 +2085,7 @@ RelToCluster
RelabelType
Relation
RelationData
+RelationInfo
RelationPtr
RelationSyncEntry
RelcacheCallbackFunction
@@ -2848,6 +2850,7 @@ ambuildempty_function
ambuildphasename_function
ambulkdelete_function
amcanreturn_function
+amcheckOptions
amcostestimate_function
amendscan_function
amestimateparallelscan_function
--
2.21.1 (Apple Git-122.3)
v41-0003-Extending-PostgresNode-to-test-corruption.patchapplication/octet-stream; name=v41-0003-Extending-PostgresNode-to-test-corruption.patch; x-unix-mode=0644Download
From 91d408c672d8f49a57052b3f5280296c420928c3 Mon Sep 17 00:00:00 2001
From: Mark Dilger <mark.dilger@enterprisedb.com>
Date: Tue, 2 Feb 2021 12:37:58 -0800
Subject: [PATCH v41 3/3] Extending PostgresNode to test corruption.
PostgresNode now has functions for overwriting relation files
with full or partial prior versions of those files, creating
corruption beyond merely twiddling the bits of a heap relation
file.
Adding a regression test for pg_amcheck based on this new
functionality.
---
contrib/pg_amcheck/t/006_relfile_damage.pl | 145 ++++++++++
src/test/modules/Makefile | 1 +
src/test/modules/corruption/Makefile | 16 ++
.../modules/corruption/t/001_corruption.pl | 83 ++++++
src/test/perl/PostgresNode.pm | 261 ++++++++++++++++++
5 files changed, 506 insertions(+)
create mode 100644 contrib/pg_amcheck/t/006_relfile_damage.pl
create mode 100644 src/test/modules/corruption/Makefile
create mode 100644 src/test/modules/corruption/t/001_corruption.pl
diff --git a/contrib/pg_amcheck/t/006_relfile_damage.pl b/contrib/pg_amcheck/t/006_relfile_damage.pl
new file mode 100644
index 0000000000..45ad223531
--- /dev/null
+++ b/contrib/pg_amcheck/t/006_relfile_damage.pl
@@ -0,0 +1,145 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 22;
+use PostgresNode;
+
+my ($node, $port);
+
+# Returns the name of the toast relation associated with the named relation.
+#
+# Assumes the test node is running
+sub relation_toast($$)
+{
+ my ($dbname, $relname) = @_;
+
+ my $rel = $node->safe_psql($dbname, qq(
+ SELECT ct.relname
+ FROM pg_catalog.pg_class cr, pg_catalog.pg_class ct
+ WHERE cr.oid = '$relname'::regclass
+ AND cr.reltoastrelid = ct.oid
+ ));
+ return undef unless defined $rel && $rel ne '';
+ return "pg_toast.$rel";
+}
+
+# Test set-up
+$node = get_new_node('test');
+$node->init;
+$node->start;
+$port = $node->port;
+
+# Load the amcheck extension, upon which pg_amcheck depends
+$node->safe_psql('postgres', q(CREATE EXTENSION amcheck));
+
+# Create a table with a btree index. Use a fillfactor for the table and index
+# that will allow some fraction of updates to be on the original pages and some
+# on new pages.
+#
+$node->safe_psql('postgres', qq(
+create schema t;
+create table t.t1 (id integer, t text) with (fillfactor=75);
+alter table t.t1 alter column t set storage external;
+insert into t.t1 select gs, repeat('x',gs) from generate_series(9990,10000) gs;
+create index t1_idx on t.t1 (id) with (fillfactor=75);
+));
+
+my $toastrel = relation_toast('postgres', 't.t1');
+
+# Flush relation files to disk and take snapshots of the toast and index
+#
+$node->restart;
+$node->take_relfile_snapshot_minimal('postgres', 'idx', 't.t1_idx');
+$node->take_relfile_snapshot_minimal('postgres', 'toast', $toastrel);
+
+# Insert new data into the table and index
+#
+$node->safe_psql('postgres', qq(
+insert into t.t1 select gs, repeat('y',gs) from generate_series(10001,10100) gs;
+));
+
+# Revert index. The reverted snapshot file is not corrupt, but it also
+# does not match the current contents of the table.
+#
+$node->stop;
+$node->revert_to_snapshot('idx');
+
+# Restart the node and check table and index with varying options.
+#
+$node->start;
+
+# Checks which do not reconcile the index and table via --heapallindexed will
+# not notice any problems
+#
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ qr/^$/,
+ 'pg_amcheck reverted index at default checking level');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--parent-check' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --parent-check');
+
+$node->command_like(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--rootdescend' ],
+ qr/^$/,
+ 'pg_amcheck reverted index with --rootdescend');
+
+# Checks which do reconcile the index and table via --heapallindexed will
+# notice the mismatch in their contents
+#
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed' ],
+ 2,
+ [ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/ ],
+ [ ],
+ 'pg_amcheck reverted index with --heapallindexed');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--heapallindexed', '--rootdescend' ],
+ 2,
+ [ qr/heap tuple .* from table "t1" lacks matching index tuple within index "t1_idx"/ ],
+ [ ],
+ 'pg_amcheck reverted index with --heapallindexed --rootdescend');
+
+# Revert the toast. The reverted toast table is not corrupt, but it does not
+# have entries for all toast pointers in the main table
+#
+$node->stop;
+$node->revert_to_snapshot('toast');
+
+# Restart the node and check table and toast with varying options. When
+# checking the toast pointers, we may get errors produced by verify_heapam, but
+# we may also get errors from failure to read toast blocks that are beyond the
+# end of the toast table, of the form /ERROR: could not read block/. To avoid
+# having a brittle test, we accept any error message.
+#
+$node->start;
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', $toastrel ],
+ 0,
+ [ qr/^$/ ],
+ [ ],
+ 'pg_amcheck reverted toast table');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*', '--exclude-toast-pointers' ],
+ 0,
+ [ qr/^$/ ],
+ [ ],
+ 'pg_amcheck with reverted toast using --exclude-toast-pointers');
+
+$node->command_checks_all(
+ [ 'pg_amcheck', '--quiet', '-p', $port, '-r', 'postgres.t.*' ],
+ 2,
+ [ qr/.+/ ], # Any non-empty error message is acceptable
+ [ ],
+ 'pg_amcheck with reverted toast and default checking');
diff --git a/src/test/modules/Makefile b/src/test/modules/Makefile
index 5391f461a2..c92d1702b4 100644
--- a/src/test/modules/Makefile
+++ b/src/test/modules/Makefile
@@ -7,6 +7,7 @@ include $(top_builddir)/src/Makefile.global
SUBDIRS = \
brin \
commit_ts \
+ corruption \
delay_execution \
dummy_index_am \
dummy_seclabel \
diff --git a/src/test/modules/corruption/Makefile b/src/test/modules/corruption/Makefile
new file mode 100644
index 0000000000..ba461c645d
--- /dev/null
+++ b/src/test/modules/corruption/Makefile
@@ -0,0 +1,16 @@
+# src/test/modules/corruption/Makefile
+
+# EXTRA_INSTALL = contrib/pg_amcheck
+
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/corruption
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/corruption/t/001_corruption.pl b/src/test/modules/corruption/t/001_corruption.pl
new file mode 100644
index 0000000000..ae4a262e06
--- /dev/null
+++ b/src/test/modules/corruption/t/001_corruption.pl
@@ -0,0 +1,83 @@
+use strict;
+use warnings;
+
+use TestLib;
+use Test::More tests => 10;
+use PostgresNode;
+
+my $node = get_new_node('test');
+$node->init;
+$node->start;
+
+# Create something non-trivial for the first snapshot
+$node->safe_psql('postgres', qq(
+create table t1 (id integer, short_text text, long_text text);
+insert into t1 (id, short_text, long_text)
+ (select gs, 'foo', repeat('x', gs)
+ from generate_series(1,10000) gs);
+create unique index idx1 on t1 (id, short_text);
+vacuum freeze;
+));
+
+# Flush relation files to disk and take snapshot of them
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap1', 'public.t1');
+
+# Update data in the table, toast table, and index
+$node->safe_psql('postgres', qq(
+update t1 set
+ short_text = 'bar',
+ long_text = repeat('y', id);
+));
+
+# Flush relation files to disk and take second snapshot
+$node->restart;
+$node->take_relfile_snapshot('postgres', 'snap2', 'public.t1');
+
+# Revert the first page of t1 using a torn snapshot. This should be a partial
+# and corrupt reverting of the update.
+$node->stop;
+$node->revert_to_torn_relfile_snapshot('snap1', 8192);
+
+# Restart the node and count the number of rows in t1 with the original
+# (pre-update) values. It should not be zero, but nor will it be the full
+# 10000.
+$node->start;
+my ($old, $new, $oldtoast, $newtoast) = counts();
+ok($old > 0 && $old < 10000, "Torn snapshot reverts some of the main updates");
+ok($new > 0 && $new <= 10000, "Torn snapshot retains some of the main updates");
+
+# Revert t1 fully to the first snapshot. This should fully restore the
+# original (pre-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap1');
+
+# Restart the node and verify only old values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 10000, "Full snapshot restores all the old main values");
+is($oldtoast, 10000, "Full snapshot restores all the old toast values");
+is($new, 0, "Full snapshot reverts all the new main values");
+is($newtoast, 0, "Full snapshot reverts all the new toast values");
+
+# Restore t1 fully to the second snapshot. This should fully restore the
+# new (post-update) values.
+$node->stop;
+$node->revert_to_snapshot('snap2');
+
+# Restart the node and verify only new values remain
+$node->start;
+($old, $new, $oldtoast, $newtoast) = counts();
+is($old, 0, "Full snapshot reverts all the old main values");
+is($oldtoast, 0, "Full snapshot reverts all the old toast values");
+is($new, 10000, "Full snapshot restores all the new main values");
+is($newtoast, 10000, "Full snapshot restores all the new toast values");
+
+sub counts {
+ return map {
+ $node->safe_psql('postgres', qq(select count(*) from t1 where $_))
+ } ("short_text = 'foo'",
+ "short_text = 'bar'",
+ "long_text ~ 'x'",
+ "long_text ~ 'y'");
+}
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 9667f7667e..d470af93c5 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -2225,6 +2225,267 @@ sub pg_recvlogical_upto
=back
+=head1 DATABASE CORRUPTION METHODS
+
+=over
+
+=item $node->relfile_snapshot_repository()
+
+The path to the parent directory of all directories storing snapshots of
+relation backing files.
+
+=cut
+
+sub relfile_snapshot_repository
+{
+ my ($self) = @_;
+ my $snaprepo = join('/', $self->basedir, 'snapshot');
+ unless (-d $snaprepo)
+ {
+ mkdir $snaprepo
+ or $!{EEXIST}
+ or BAIL_OUT("could not create snapshot repository directory \"$snaprepo\": $!");
+ }
+ return $snaprepo;
+}
+
+=pod
+
+=item $node->relfile_snapshot_directory(snapname)
+
+The path to the directory for storing the named snapshot.
+
+=cut
+
+sub relfile_snapshot_directory
+{
+ my ($self, $snapname) = @_;
+
+ join("/", $self->relfile_snapshot_repository(), $snapname);
+}
+
+=pod
+
+=item $node->take_relfile_snapshot($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>, the associated
+toast relations (if any), and all associated indexes (if any). No attempt is
+made to flush these files to disk, meaning the snapshot taken could be stale
+unless the caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+=pod
+
+=item $node->take_relfile_snapshot_minimal($self, $dbname, $snapname, @relnames)
+
+Makes a copy of the files backing the relations B<@relnames>. No attempt is made
+to flush these files to disk, meaning the snapshot taken could be stale unless the
+caller ensures these files have been flushed prior to calling.
+
+Dies on failure to invoke psql.
+
+Dies on missing relations.
+
+Dies if the given B<$snapname> is already in use.
+
+=cut
+
+sub take_relfile_snapshot
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 1, @relnames);
+}
+
+sub take_relfile_snapshot_minimal
+{
+ my ($self, $dbname, $snapname, @relnames) = @_;
+ $self->take_relfile_snapshot_helper($dbname, $snapname, 0, @relnames);
+}
+
+sub take_relfile_snapshot_helper
+{
+ my ($self, $dbname, $snapname, $extended, @relnames) = @_;
+
+ croak "dbname must be specified" unless defined $dbname;
+ croak "relnames must be defined" unless scalar(grep { defined $_ } @relnames);
+ croak "snapname must be specified" unless defined $snapname;
+ croak "snapname must be unique" if exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snapdir = $self->relfile_snapshot_directory($snapname);
+ croak "snapname directory name already in use: $snapdir" if (-e $snapdir);
+ mkdir $snapdir
+ or BAIL_OUT("could not create snapshot directory \"$snapdir\": $!");
+
+ my @relpaths = map {
+ $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath('$_')));
+ } @relnames;
+
+ my (@toastpaths, @idxpaths);
+ if ($extended)
+ {
+ for my $relname (@relnames)
+ {
+ push (@toastpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(c.reltoastrelid)
+ FROM pg_catalog.pg_class c
+ WHERE c.oid = '$relname'::regclass
+ AND c.reltoastrelid != 0::oid))));
+ push (@idxpaths, grep /\w/, split(/(?:\s*\r?\n\s*)+/, $self->safe_psql($dbname,
+ qq(SELECT pg_relation_filepath(i.indexrelid)
+ FROM pg_catalog.pg_index i
+ WHERE i.indrelid = '$relname'::regclass))));
+ }
+ }
+
+ $self->{snapshot}->{$snapname} = {};
+ for my $path (@relpaths, grep { defined($_) } @toastpaths, @idxpaths)
+ {
+ croak "file backing relation is missing: $pgdata/$path" unless -f "$pgdata/$path";
+ copy_file($snapdir, $pgdata, 0, $path);
+ $self->{snapshot}->{$snapname}->{$path} = 1;
+ }
+}
+
+=pod
+
+=item $node->revert_to_snapshot($self, $snapname)
+
+Overwrites the database's relation files with files previously saved in
+B<$snapname>.
+
+Dies if the given B<$snapname> does not exist.
+
+=cut
+
+=pod
+
+=item $node->revert_to_torn_relfile_snapshot($self, $snapname, $bytes)
+
+Partially overwrites the database's relation files using prefixes of the given
+number of bytes from the files saved in B<$snapname>. If B<$bytes> is
+negative, uses suffixes of the given byte length rather than prefixes.
+
+If B<$bytes> is undef, replaces the database's relation files wholesale with
+the files saved in B<$snapname>; unlike a partial overwrite, this means a file
+may become shorter if the saved file is shorter than the current file.
+
+=cut
+
+sub revert_to_snapshot
+{
+ my ($self, $snapname) = @_;
+ $self->revert_to_torn_relfile_snapshot($snapname, undef);
+}
+
+sub revert_to_torn_relfile_snapshot
+{
+ my ($self, $snapname, $bytes) = @_;
+
+ croak "no such snapshot" unless exists $self->{snapshot}->{$snapname};
+
+ my $pgdata = $self->data_dir;
+ my $snaprepo = join('/', $self->relfile_snapshot_repository, $snapname);
+ croak "snapname directory missing: $snaprepo" unless (-d $snaprepo);
+
+ if (defined $bytes)
+ {
+ tear_file($pgdata, $snaprepo, $bytes, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+ else
+ {
+ copy_file($pgdata, $snaprepo, 1, $_)
+ for (keys %{$self->{snapshot}->{$snapname}});
+ }
+}
+
+sub copy_file
+{
+ my ($dstdir, $srcdir, $overwrite, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ foreach my $part (split(m{/}, $path))
+ {
+ my $srcpart = "$srcdir/$part";
+ my $dstpart = "$dstdir/$part";
+
+ if (-d $srcpart)
+ {
+ $srcdir = $srcpart;
+ $dstdir = $dstpart;
+ die "$dstdir is in the way" if (-e $dstdir && ! -d $dstdir);
+ unless (-d $dstdir)
+ {
+ mkdir $dstdir
+ or BAIL_OUT("could not create directory \"$dstdir\": $!");
+ }
+ }
+ elsif (-f $srcpart)
+ {
+ die "$dstdir/$part is in the way" if (!$overwrite && -e "$dstdir/$part");
+
+ File::Copy::copy($srcpart, "$dstdir/$part");
+ }
+ }
+}
+
+sub tear_file
+{
+ my ($dstdir, $srcdir, $bytes, $path) = @_;
+
+ croak "No such directory: $dstdir" unless -d $dstdir;
+ croak "No such directory: $srcdir" unless -d $srcdir;
+
+ my $srcfile = "$srcdir/$path";
+ my $dstfile = "$dstdir/$path";
+
+ croak "No such file: $srcfile" unless -f $srcfile;
+ croak "No such file: $dstfile" unless -f $dstfile;
+
+ my ($srcfh, $dstfh);
+ open($srcfh, '<', $srcfile) or die "Cannot read $srcfile: $!";
+ open($dstfh, '+<', $dstfile) or die "Cannot modify $dstfile: $!";
+ binmode($srcfh);
+ binmode($dstfh);
+
+ my $buffer;
+ if ($bytes < 0)
+ {
+ $bytes *= -1; # Easier to use positive value
+ my $srcsize = (stat($srcfh))[7];
+ my $offset = $srcsize - $bytes;
+ seek($srcfh, $offset, 0);
+ seek($dstfh, $offset, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+ else
+ {
+ seek($srcfh, 0, 0);
+ seek($dstfh, 0, 0);
+ sysread($srcfh, $buffer, $bytes);
+ syswrite($dstfh, $buffer, $bytes);
+ }
+
+ close($srcfh);
+ close($dstfh);
+}
+
+=pod
+
+=back
+
=cut
1;
--
2.21.1 (Apple Git-122.3)
On Tue, Mar 2, 2021 at 12:10 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On further reflection, I decided to implement these changes and not worry about the behavioral change.
Thanks.
I skipped this part. The initcmd argument is only handed to ParallelSlotsGetIdle(). Doing as you suggest would not really be simpler, it would just move that argument to ParallelSlotsSetup(). But I don't feel strongly about it, so I can move this, too, if you like.
I didn't do this either, and for the same reason. It's just a parameter to ParallelSlotsGetIdle(), so nothing is really gained by moving it to ParallelSlotsSetup().
OK. I thought it was more natural to pass a bunch of arguments at
setup time rather than passing a bunch of arguments at get-idle time,
but I don't feel strongly enough about it to insist, and somebody else
can always change it later if they decide I had the right idea.
Rather than having the slots user tweak the slot's ConnParams, ParallelSlotsGetIdle() takes a dbname argument, and uses it as ConnParams->override_dbname.
OK, but you forgot to update the comments. ParallelSlotsGetIdle()
still talks about a cparams argument that it no longer has.
The usual idiom for sizing a memory allocation involving
FLEXIBLE_ARRAY_MEMBER is something like offsetof(ParallelSlotArray,
slots) + numslots * sizeof(ParallelSlot). Your version uses sizeof();
don't.
Other than that 0001 looks to me to be in pretty good shape now.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 2, 2021 at 1:24 PM Robert Haas <robertmhaas@gmail.com> wrote:
Other than that 0001 looks to me to be in pretty good shape now.
Incidentally, we might want to move this to a new thread with a better
subject line, since the current subject line really doesn't describe
the uncommitted portion of the work. And create a new CF entry, too.
Moving onto 0002:
The index checking options should really be called btree index
checking options. I think I'd put the table options first, and the
btree options second. Other kinds of indexes could follow some day. I
would personally omit the short forms of --heapallindexed and
--parent-check; I think we'll run out of option names too quickly if
people add more kinds of checks.
Perhaps VerifyBtreeSlotHandler should emit a warning of some kind if
PQntuples(res) != 0.
+ /*
+ * Test that this function works, but for now we're not using the list
+ * 'relations' that it builds.
+ */
+ conn = connectDatabase(&cparams, progname, opts.echo, false, true);
This comment appears to have nothing to do with the code, since
connectDatabase() does not build a list of 'relations'.
amcheck_sql seems to include paranoia, but do we need that if we're
using a secure search path? Similarly for other SQL queries, e.g. in
prepare_table_command.
It might not be strictly necessary for pg_amcheck.c to use_three
completelyDifferent NamingConventions for its static functions.
should_processing_continue() is one semicolon over budget.
The initializer for opts puts a comma even after the last member
initializer. Is that going to be portable to all compilers?
+ for (failed = false, cell = opts.include.head; cell; cell = cell->next)
I think failed has to be false here, because it gets initialized at
the top of the function. If we need to reinitialize it for some
reason, I would prefer you do that on the previous line, separate from
the for loop stuff.
+ char *dbrgx; /* Database regexp parsed from pattern, or
+ * NULL */
+ char *nsprgx; /* Schema regexp parsed from pattern, or NULL */
+ char *relrgx; /* Relation regexp parsed from pattern, or
+ * NULL */
+ bool tblonly; /* true if relrgx should only match tables */
+ bool idxonly; /* true if relrgx should only match indexes */
Maybe: db_regex, nsp_regex, rel_regex, table_only, index_only?
Just because it seems theoretically possible that someone will see
nsprgx and not immediately understand what it's supposed to mean, even
if they know that nsp is a common abbreviation for namespace in
PostgreSQL code, and even if they also know what a regular expression
is.
Your four messages about there being nothing to check seem like they
could be consolidated down to one: "nothing to check for pattern
\"%s\"".
I would favor changing things so that once argument parsing is
complete, we switch to reporting all errors that way. So in other
words here, and everything that follows:
+ fprintf(stderr, "%s: no databases to check\n", progname);
+ * ParallelSlots based event loop follows.
"Main event loop."
To me it would read slightly better to change each reference to
"relations list" to "list of relations", but perhaps that is too
nitpicky.
I think the two instances of goto finish could be avoided without
much work. At most a few things need to happen only if !failed, and
maybe not even that, if you just said "break;" instead.
+ * Note: Heap relation corruption is returned by verify_heapam() without the
+ * use of raising errors, but running verify_heapam() on a corrupted table may
How about "Heap relation corruption is reported by verify_heapam()
via the result set, rather than by raising an ERROR, ..."
It seems mighty inefficient to have a whole bunch of consecutive calls
to remove_relation_file() or corrupt_first_page() when every such call
stops and restarts the database. I would guess these tests will run
noticeably faster if you don't do that. Either the functions need to
take a list of arguments, or the stop/start needs to be pulled up and
done in the caller.
corrupt_first_page() could use a comment explaining what exactly we're
overwriting, and in particular noting that we don't want to just
clobber the LSN, but rather something where we can detect a wrong
value.
There's a long list of calls to command_checks_all() in 003_check.pl
that don't actually check anything but that the command failed, but
run it with a bunch of different options. I don't understand the value
of that, and suggest reducing the number of cases tested. If you want,
you can have tests elsewhere that focus -- perhaps by using verbose
mode -- on checking that the right tables are being checked.
This is not yet a full review of everything in this patch -- I haven't
sorted through all of the tests yet, or all of the new query
construction logic -- but to me this looks pretty close to
committable.
--
Robert Haas
EDB: http://www.enterprisedb.com