PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
Hi,
while investigating some checksum-related issues, I needed to perform
some forensics on a copy of a btree page (taken from another instance
using 'dd').
But I've ran into two pageinspect limitations, hopefully addressed by
this patch:
1) bt_page_items(bytea) not defined
We have heap_page_items(bytea) but not bt_page_items(bytea). I suspect
this is partially historical API inconsistence, and partially due to the
need to handle the btree metapage explicitly.
The original function simply threw an error with blkno=0, the new
function simply checks for BTP_META page.
I believe this is sufficient, assuming the instance without data
corruption (which pageinspect assumes anyway). With data corruption all
bets are off anyway - for example the metapage might be written to a
different block (essentially what I saw in the investigated issue).
Actually, the flag check is better in this case - it detects the
metapage, while the blkno=0 check fails to do that (leading to crash).
2) page_checksum()
When everything is fine, you can do page_header() which also includes
the checksum. When the checksum gets broken, you may still dump the page
using 'dd+pg_read_binary_file' to see the header, but clearly that
checksum is wrong - and it's interesting to see the correct one and
compare it to the checksum in the header.
This function makes it possible - it accepts the bytea image of the
page, and blkno (so it's possible to see how would the block look if it
was written somewhere else, for example).
BTW I've noticed the pageinspect version is 1.6, but we only have
pageinspect--1.5.sql (and upgrade script to 1.6). Not sure that's
entirely intentional?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-pageinspect-page_checksum-and-bt_page_items-bytea.patchtext/x-diff; name=0001-pageinspect-page_checksum-and-bt_page_items-bytea.patchDownload
From 499a7bd9aea3032f03d787833c0501d9fa703271 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@2ndquadrant.com>
Date: Mon, 13 Feb 2017 21:20:12 +0100
Subject: [PATCH] pageinspect - page_checksum and bt_page_items(bytea)
Adds two functions to the pageinspect extension:
1) page_checksum(bytea,int4) allows computing checksum for
arbitrary page, even if data checksums are not enabled
2) bt_page_items(bytea) is similar to heap_page_items(bytea)
---
contrib/pageinspect/btreefuncs.c | 209 +++++++++++++++++++++++---
contrib/pageinspect/expected/btree.out | 13 ++
contrib/pageinspect/pageinspect--1.5--1.6.sql | 22 +++
contrib/pageinspect/rawpage.c | 37 +++++
contrib/pageinspect/sql/btree.sql | 4 +
doc/src/sgml/pageinspect.sgml | 58 +++++++
src/include/access/nbtree.h | 1 +
7 files changed, 320 insertions(+), 24 deletions(-)
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index d50ec3a..93da844 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -39,8 +39,14 @@
#include "utils/varlena.h"
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+
PG_FUNCTION_INFO_V1(bt_metap);
PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
PG_FUNCTION_INFO_V1(bt_page_stats);
#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -215,17 +221,31 @@ bt_page_stats(PG_FUNCTION_ARGS)
elog(ERROR, "return type must be a row type");
j = 0;
- values[j++] = psprintf("%d", stat.blkno);
- values[j++] = psprintf("%c", stat.type);
- values[j++] = psprintf("%d", stat.live_items);
- values[j++] = psprintf("%d", stat.dead_items);
- values[j++] = psprintf("%d", stat.avg_item_size);
- values[j++] = psprintf("%d", stat.page_size);
- values[j++] = psprintf("%d", stat.free_size);
- values[j++] = psprintf("%d", stat.btpo_prev);
- values[j++] = psprintf("%d", stat.btpo_next);
- values[j++] = psprintf("%d", (stat.type == 'd') ? stat.btpo.xact : stat.btpo.level);
- values[j++] = psprintf("%d", stat.btpo_flags);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.blkno);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", stat.type);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.live_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.dead_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.avg_item_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.page_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.free_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_prev);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_next);
+ values[j] = palloc(32);
+ if (stat.type == 'd')
+ snprintf(values[j++], 32, "%d", stat.btpo.xact);
+ else
+ snprintf(values[j++], 32, "%d", stat.btpo.level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_flags);
tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
values);
@@ -361,13 +381,18 @@ bt_page_items(PG_FUNCTION_ARGS)
itup = (IndexTuple) PageGetItem(uargs->page, id);
j = 0;
- values[j++] = psprintf("%d", uargs->offset);
- values[j++] = psprintf("(%u,%u)",
- BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
- itup->t_tid.ip_posid);
- values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
- values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
- values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", uargs->offset);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "(%u,%u)",
+ BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", (int) IndexTupleSize(itup));
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
@@ -396,6 +421,136 @@ bt_page_items(PG_FUNCTION_ARGS)
}
}
+/*-------------------------------------------------------
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *-------------------------------------------------------
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ Datum result;
+ char *values[6];
+ HeapTuple tuple;
+ FuncCallContext *fctx;
+ struct user_args *uargs;
+ int raw_page_size;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions"))));
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ BTPageOpaque opaque;
+ MemoryContext mctx;
+ TupleDesc tupleDesc;
+
+ if (raw_page_size < SizeOfPageHeaderData)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->page = VARDATA(raw_page);
+
+ uargs->offset = FirstOffsetNumber;
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "page is deleted");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ ItemId id;
+ IndexTuple itup;
+ int j;
+ int off;
+ int dlen;
+ char *dump;
+ char *ptr;
+
+ id = PageGetItemId(uargs->page, uargs->offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "invalid ItemId");
+
+ itup = (IndexTuple) PageGetItem(uargs->page, id);
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", uargs->offset);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "(%u,%u)",
+ BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", (int) IndexTupleSize(itup));
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ dump = palloc0(dlen * 3 + 1);
+
+ values[j] = dump;
+ for (off = 0; off < dlen; off++)
+ {
+ if (off > 0)
+ *dump++ = ' ';
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ dump += 2;
+ }
+
+ tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+ result = HeapTupleGetDatum(tuple);
+
+ uargs->offset = uargs->offset + 1;
+
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ pfree(uargs);
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
/* ------------------------------------------------
* bt_metap()
@@ -453,12 +608,18 @@ bt_metap(PG_FUNCTION_ARGS)
elog(ERROR, "return type must be a row type");
j = 0;
- values[j++] = psprintf("%d", metad->btm_magic);
- values[j++] = psprintf("%d", metad->btm_version);
- values[j++] = psprintf("%d", metad->btm_root);
- values[j++] = psprintf("%d", metad->btm_level);
- values[j++] = psprintf("%d", metad->btm_fastroot);
- values[j++] = psprintf("%d", metad->btm_fastlevel);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_magic);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_version);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_root);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
values);
diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 82a49e3..67b103a 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -42,4 +42,17 @@ data | 01 00 00 00 00 00 00 01
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+ERROR: block is a meta page
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+-[ RECORD 1 ]-----------------------
+itemoffset | 1
+ctid | (0,1)
+itemlen | 16
+nulls | f
+vars | f
+data | 01 00 00 00 00 00 00 01
+
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+ERROR: block number 2 is out of range for relation "test1_a_idx"
DROP TABLE test1;
diff --git a/contrib/pageinspect/pageinspect--1.5--1.6.sql b/contrib/pageinspect/pageinspect--1.5--1.6.sql
index ac39568..df17fe6 100644
--- a/contrib/pageinspect/pageinspect--1.5--1.6.sql
+++ b/contrib/pageinspect/pageinspect--1.5--1.6.sql
@@ -75,3 +75,25 @@ CREATE FUNCTION hash_metapage_info(IN page bytea,
OUT mapp int8[])
AS 'MODULE_PATHNAME', 'hash_metapage_info'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- page_checksum()
+--
+CREATE FUNCTION page_checksum(IN page bytea, IN blkno int4)
+RETURNS smallint
+AS 'MODULE_PATHNAME', 'page_checksum'
+LANGUAGE C STRICT;
+
+--
+-- bt_page_items_bytea()
+--
+CREATE FUNCTION bt_page_items(IN page bytea,
+ OUT itemoffset smallint,
+ OUT ctid tid,
+ OUT itemlen smallint,
+ OUT nulls bool,
+ OUT vars bool,
+ OUT data text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
+LANGUAGE C STRICT;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 102f360..0605989 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -24,6 +24,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/checksum.h"
#include "utils/builtins.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
@@ -275,3 +276,39 @@ page_header(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+
+/*
+ * page_header
+ *
+ * Allows inspection of page header fields of a raw page
+ */
+
+PG_FUNCTION_INFO_V1(page_checksum);
+
+Datum
+page_checksum(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ uint32 blkno = PG_GETARG_INT32(1);
+ int raw_page_size;
+ PageHeader page;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use raw page functions"))));
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ /*
+ * Check that the supplied page is of the right size.
+ */
+ if (raw_page_size != BLCKSZ)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("incorrect size of input page (%d bytes)", raw_page_size)));
+
+ page = (PageHeader) VARDATA(raw_page);
+
+ PG_RETURN_INT16(pg_checksum_page((char *)page, blkno));
+}
diff --git a/contrib/pageinspect/sql/btree.sql b/contrib/pageinspect/sql/btree.sql
index 1eafc32..8eac64c 100644
--- a/contrib/pageinspect/sql/btree.sql
+++ b/contrib/pageinspect/sql/btree.sql
@@ -14,4 +14,8 @@ SELECT * FROM bt_page_items('test1_a_idx', 0);
SELECT * FROM bt_page_items('test1_a_idx', 1);
SELECT * FROM bt_page_items('test1_a_idx', 2);
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+
DROP TABLE test1;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 5e6712f..da5dd37 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -84,6 +84,33 @@ test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
<varlistentry>
<term>
+ <function>page_checksum(page bytea, blkno int4) returns smallint</function>
+ <indexterm>
+ <primary>page_checksum</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>page_checksum</function> computes a checksum for the page, as if
+ it was located at the given block.
+ </para>
+
+ <para>
+ A page image obtained with <function>get_raw_page</function> should be
+ passed as argument. For example:
+<screen>
+test=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
+ page_checksum
+---------------
+ 13443
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
<function>heap_page_items(page bytea) returns setof record</function>
<indexterm>
<primary>heap_page_items</primary>
@@ -290,6 +317,37 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(page bytea) returns setof record</function>
+ <indexterm>
+ <primary>bt_page_items</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Similarly to <function>heap_page_items</function>, it is also possible to
+ pass the page to <function>bt_page_items</function> as a <type>bytea</>
+ value. So the last example may also be rewritten like this:
+<screen>
+test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+</screen>
+ All the other details are the same as explained in the previous section.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 6289ffa..f202715 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -177,6 +177,7 @@ typedef struct BTMetaPageData
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
+#define P_ISMETA(opaque) ((opaque)->btpo_flags & BTP_META)
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
--
2.5.5
On Mon, Feb 20, 2017 at 9:43 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
BTW I've noticed the pageinspect version is 1.6, but we only have
pageinspect--1.5.sql (and upgrade script to 1.6). Not sure that's entirely
intentional?
Actually, that's the New Way. See 40b449ae84dcf71177d7749a7b0c582b64dc15f0.
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
Not needed. PG_FUNCTION_INFO_V1 now does it.
- values[j++] = psprintf("%d", stat.blkno);
- values[j++] = psprintf("%c", stat.type);
- values[j++] = psprintf("%d", stat.live_items);
- values[j++] = psprintf("%d", stat.dead_items);
- values[j++] = psprintf("%d", stat.avg_item_size);
- values[j++] = psprintf("%d", stat.page_size);
- values[j++] = psprintf("%d", stat.free_size);
- values[j++] = psprintf("%d", stat.btpo_prev);
- values[j++] = psprintf("%d", stat.btpo_next);
- values[j++] = psprintf("%d", (stat.type == 'd') ? stat.btpo.xact : stat.btp
o.level);
- values[j++] = psprintf("%d", stat.btpo_flags);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.blkno);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", stat.type);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.live_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.dead_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.avg_item_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.page_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.free_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_prev);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_next);
+ values[j] = palloc(32);
+ if (stat.type == 'd')
+ snprintf(values[j++], 32, "%d", stat.btpo.xact);
+ else
+ snprintf(values[j++], 32, "%d", stat.btpo.level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_flags);
This does not seem like a good idea in any way, and the patch has
several instances of it.
I don't object to the concept, though.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/03/2017 05:09 AM, Robert Haas wrote:
On Mon, Feb 20, 2017 at 9:43 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:BTW I've noticed the pageinspect version is 1.6, but we only have
pageinspect--1.5.sql (and upgrade script to 1.6). Not sure that's entirely
intentional?Actually, that's the New Way. See 40b449ae84dcf71177d7749a7b0c582b64dc15f0.
Ah, great! Didn't notice that change.
+extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_items_bytea(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS);Not needed. PG_FUNCTION_INFO_V1 now does it.
OK.
- ...
This does not seem like a good idea in any way, and the patch has
several instances of it.
Damn. In my defense, the patch was originally created for an older
PostgreSQL version (to investigate issue on a production system), which
used that approach to building values. Should have notice it, though.
Attached is v2, fixing both issues.
regard
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-pageinspect-page_checksum-and-bt_page_items-bytea-v2.patchbinary/octet-stream; name=0001-pageinspect-page_checksum-and-bt_page_items-bytea-v2.patchDownload
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index d50ec3a..15667db 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
PG_FUNCTION_INFO_V1(bt_metap);
PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
PG_FUNCTION_INFO_V1(bt_page_stats);
#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -396,6 +397,131 @@ bt_page_items(PG_FUNCTION_ARGS)
}
}
+/*-------------------------------------------------------
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *-------------------------------------------------------
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ Datum result;
+ char *values[6];
+ HeapTuple tuple;
+ FuncCallContext *fctx;
+ struct user_args *uargs;
+ int raw_page_size;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions"))));
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ BTPageOpaque opaque;
+ MemoryContext mctx;
+ TupleDesc tupleDesc;
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ if (raw_page_size < SizeOfPageHeaderData)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->page = VARDATA(raw_page);
+
+ uargs->offset = FirstOffsetNumber;
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "page is deleted");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ ItemId id;
+ IndexTuple itup;
+ int j;
+ int off;
+ int dlen;
+ char *dump;
+ char *ptr;
+
+ id = PageGetItemId(uargs->page, uargs->offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "invalid ItemId");
+
+ itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+ j = 0;
+ values[j++] = psprintf("%d", uargs->offset);
+ values[j++] = psprintf("(%u,%u)",
+ BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid);
+ values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+ values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ dump = palloc0(dlen * 3 + 1);
+ values[j] = dump;
+ for (off = 0; off < dlen; off++)
+ {
+ if (off > 0)
+ *dump++ = ' ';
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ dump += 2;
+ }
+
+ tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+ result = HeapTupleGetDatum(tuple);
+
+ uargs->offset = uargs->offset + 1;
+
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ pfree(uargs);
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
/* ------------------------------------------------
* bt_metap()
diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 82a49e3..67b103a 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -42,4 +42,17 @@ data | 01 00 00 00 00 00 00 01
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+ERROR: block is a meta page
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+-[ RECORD 1 ]-----------------------
+itemoffset | 1
+ctid | (0,1)
+itemlen | 16
+nulls | f
+vars | f
+data | 01 00 00 00 00 00 00 01
+
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+ERROR: block number 2 is out of range for relation "test1_a_idx"
DROP TABLE test1;
diff --git a/contrib/pageinspect/pageinspect--1.5--1.6.sql b/contrib/pageinspect/pageinspect--1.5--1.6.sql
index ac39568..df17fe6 100644
--- a/contrib/pageinspect/pageinspect--1.5--1.6.sql
+++ b/contrib/pageinspect/pageinspect--1.5--1.6.sql
@@ -75,3 +75,25 @@ CREATE FUNCTION hash_metapage_info(IN page bytea,
OUT mapp int8[])
AS 'MODULE_PATHNAME', 'hash_metapage_info'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- page_checksum()
+--
+CREATE FUNCTION page_checksum(IN page bytea, IN blkno int4)
+RETURNS smallint
+AS 'MODULE_PATHNAME', 'page_checksum'
+LANGUAGE C STRICT;
+
+--
+-- bt_page_items_bytea()
+--
+CREATE FUNCTION bt_page_items(IN page bytea,
+ OUT itemoffset smallint,
+ OUT ctid tid,
+ OUT itemlen smallint,
+ OUT nulls bool,
+ OUT vars bool,
+ OUT data text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
+LANGUAGE C STRICT;
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 102f360..0605989 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -24,6 +24,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/checksum.h"
#include "utils/builtins.h"
#include "utils/pg_lsn.h"
#include "utils/rel.h"
@@ -275,3 +276,39 @@ page_header(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
+
+/*
+ * page_header
+ *
+ * Allows inspection of page header fields of a raw page
+ */
+
+PG_FUNCTION_INFO_V1(page_checksum);
+
+Datum
+page_checksum(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ uint32 blkno = PG_GETARG_INT32(1);
+ int raw_page_size;
+ PageHeader page;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use raw page functions"))));
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ /*
+ * Check that the supplied page is of the right size.
+ */
+ if (raw_page_size != BLCKSZ)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("incorrect size of input page (%d bytes)", raw_page_size)));
+
+ page = (PageHeader) VARDATA(raw_page);
+
+ PG_RETURN_INT16(pg_checksum_page((char *)page, blkno));
+}
diff --git a/contrib/pageinspect/sql/btree.sql b/contrib/pageinspect/sql/btree.sql
index 1eafc32..8eac64c 100644
--- a/contrib/pageinspect/sql/btree.sql
+++ b/contrib/pageinspect/sql/btree.sql
@@ -14,4 +14,8 @@ SELECT * FROM bt_page_items('test1_a_idx', 0);
SELECT * FROM bt_page_items('test1_a_idx', 1);
SELECT * FROM bt_page_items('test1_a_idx', 2);
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+
DROP TABLE test1;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 5e6712f..da5dd37 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -84,6 +84,33 @@ test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
<varlistentry>
<term>
+ <function>page_checksum(page bytea, blkno int4) returns smallint</function>
+ <indexterm>
+ <primary>page_checksum</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ <function>page_checksum</function> computes a checksum for the page, as if
+ it was located at the given block.
+ </para>
+
+ <para>
+ A page image obtained with <function>get_raw_page</function> should be
+ passed as argument. For example:
+<screen>
+test=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
+ page_checksum
+---------------
+ 13443
+</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
<function>heap_page_items(page bytea) returns setof record</function>
<indexterm>
<primary>heap_page_items</primary>
@@ -290,6 +317,37 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(page bytea) returns setof record</function>
+ <indexterm>
+ <primary>bt_page_items</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Similarly to <function>heap_page_items</function>, it is also possible to
+ pass the page to <function>bt_page_items</function> as a <type>bytea</>
+ value. So the last example may also be rewritten like this:
+<screen>
+test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+</screen>
+ All the other details are the same as explained in the previous section.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 6289ffa..f202715 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -177,6 +177,7 @@ typedef struct BTMetaPageData
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
+#define P_ISMETA(opaque) ((opaque)->btpo_flags & BTP_META)
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
On 3/3/17 09:03, Tomas Vondra wrote:
Damn. In my defense, the patch was originally created for an older
PostgreSQL version (to investigate issue on a production system), which
used that approach to building values. Should have notice it, though.Attached is v2, fixing both issues.
Can we have a test case for page_checksum(), or is that too difficult to
get running deterministicly?
Also, perhaps page_checksum() doesn't need to be superuser-only, but I
can see arguments for keeping it that way for consistency.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/04/2017 02:08 PM, Peter Eisentraut wrote:
On 3/3/17 09:03, Tomas Vondra wrote:
Damn. In my defense, the patch was originally created for an older
PostgreSQL version (to investigate issue on a production system), which
used that approach to building values. Should have notice it, though.Attached is v2, fixing both issues.
Can we have a test case for page_checksum(), or is that too difficult to
get running deterministicly?
I'm not sure it can be made deterministic. Certainly not on heap pages,
because then it'd be susceptible to xmin/xmax changes, but we have other
bits that change even on index pages (say pd_lsn).
So I'm afraid that's not going to fly.
Also, perhaps page_checksum() doesn't need to be superuser-only, but
I can see arguments for keeping it that way for consistency.
Yes, I'll change that.
It won't have much impact in practice, because all functions providing
the page data (get_raw_page etc.) do require superuser. But you can
still input the page as a bytea directly, and it's good practice not to
add unnecessary superuser checks.
regard
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/4/17 12:39, Tomas Vondra wrote:
Can we have a test case for page_checksum(), or is that too difficult to
get running deterministicly?I'm not sure it can be made deterministic. Certainly not on heap pages,
because then it'd be susceptible to xmin/xmax changes, but we have other
bits that change even on index pages (say pd_lsn).So I'm afraid that's not going to fly.
Then just run it and throw away the result. See sql/page.sql for some
examples.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/3/17 09:03, Tomas Vondra wrote:
Attached is v2, fixing both issues.
I wonder if
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ uargs->page = VARDATA(raw_page);
is expected to work reliably, without copying the argument to a
different memory context.
I think it would be better not to maintain so much duplicate code
between bt_page_items(text, int) and bt_page_items(bytea). How about
just redefining bt_page_items(text, int) as an SQL-language function
calling bt_page_items(get_raw_page($1, $2))?
For page_checksum(), the checksums are internally unsigned, so maybe
return type int on the SQL level, so that there is no confusion about
the sign?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/06/2017 10:13 PM, Peter Eisentraut wrote:
On 3/3/17 09:03, Tomas Vondra wrote:
Attached is v2, fixing both issues.
I wonder if
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ uargs->page = VARDATA(raw_page);
is expected to work reliably, without copying the argument to a
different memory context.I think it would be better not to maintain so much duplicate code
between bt_page_items(text, int) and bt_page_items(bytea). How about
just redefining bt_page_items(text, int) as an SQL-language function
calling bt_page_items(get_raw_page($1, $2))?
Maybe. We can also probably share the code at the C level, so I'll look
into that.
For page_checksum(), the checksums are internally unsigned, so maybe
return type int on the SQL level, so that there is no confusion about
the sign?
That ship already sailed, I'm afraid. We already have checksum in the
page_header() output, and it's defined as smallint there. So using int
here would be inconsistent.
A similar issue is that get_raw_page() uses int for block number, while
internally it's uint32. That often requires a fair amount of gymnastics
on large tables.
I'm not against changing either of those bits - using int for checksums
and bigint for block numbers, but I think it should be a separate patch.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/6/17 16:33, Tomas Vondra wrote:
I think it would be better not to maintain so much duplicate code
between bt_page_items(text, int) and bt_page_items(bytea). How about
just redefining bt_page_items(text, int) as an SQL-language function
calling bt_page_items(get_raw_page($1, $2))?Maybe. We can also probably share the code at the C level, so I'll look
into that.
I think SQL would be easier, but either way some reduction in
duplication would be good.
For page_checksum(), the checksums are internally unsigned, so maybe
return type int on the SQL level, so that there is no confusion about
the sign?That ship already sailed, I'm afraid. We already have checksum in the
page_header() output, and it's defined as smallint there. So using int
here would be inconsistent.
OK, no worries then.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
I had a look into this patch and would like to share some of my review
comments that requires author's attention.
1) The comment for page_checksum() needs to be corrected. It seems
like it has been copied from page_header and not edited it further.
/*
* page_header
*
* Allows inspection of page header fields of a raw page
*/
PG_FUNCTION_INFO_V1(page_checksum);
Datum
page_checksum(PG_FUNCTION_ARGS)
2) It seems like you have choosen wrong datatype for page_checksum. I
am getting negative checksum value when trying to run below query. I
think the return type for the SQL function page_checksum should be
'integer' instead of 'smallint'.
postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
page_checksum
---------------
-19532
(1 row)
Above problem also exist in case of page_header. I am facing similar
problem with page_header as well.
postgres=# SELECT page_header(get_raw_page('pg_class', 0));
page_header
---------------------------------------------
(0/2891538,-28949,1,220,7216,8192,8192,4,0)
(1 row)
3) Any reason for not marking bt_page_items or page_checksum as PARALLEL_SAFE.
4) Error messages in new bt_page_items are not consistent with old
bt_page_items. For eg. if i pass meta page blocknumber as input to
bt_page_items the error message thrown by new and old bt_page_items
are different.
postgres=# SELECT * FROM bt_page_items('btree_index', 0) LIMIT 1;
ERROR: block 0 is a meta page
postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index', 0)) LIMIT 1;
ERROR: block is a meta page
postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index',
1024)) LIMIT 1;
ERROR: block number 1024 is out of range for relation "btree_index"
postgres=# SELECT * FROM bt_page_items('btree_index', 1024) LIMIT 1;
ERROR: block number out of range
5) Code duplication in bt_page_items() and bt_page_items_bytea() needs
to be handled.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
On Tue, Mar 7, 2017 at 9:39 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 3/6/17 16:33, Tomas Vondra wrote:
I think it would be better not to maintain so much duplicate code
between bt_page_items(text, int) and bt_page_items(bytea). How about
just redefining bt_page_items(text, int) as an SQL-language function
calling bt_page_items(get_raw_page($1, $2))?Maybe. We can also probably share the code at the C level, so I'll look
into that.I think SQL would be easier, but either way some reduction in
duplication would be good.For page_checksum(), the checksums are internally unsigned, so maybe
return type int on the SQL level, so that there is no confusion about
the sign?That ship already sailed, I'm afraid. We already have checksum in the
page_header() output, and it's defined as smallint there. So using int
here would be inconsistent.OK, no worries then.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/13/2017 06:49 AM, Ashutosh Sharma wrote:
Hi,
I had a look into this patch and would like to share some of my review
comments that requires author's attention.1) The comment for page_checksum() needs to be corrected. It seems
like it has been copied from page_header and not edited it further./*
* page_header
*
* Allows inspection of page header fields of a raw page
*/PG_FUNCTION_INFO_V1(page_checksum);
Datum
page_checksum(PG_FUNCTION_ARGS)
True, will fix.
2) It seems like you have choosen wrong datatype for page_checksum. I
am getting negative checksum value when trying to run below query. I
think the return type for the SQL function page_checksum should be
'integer' instead of 'smallint'.postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
page_checksum
---------------
-19532
(1 row)Above problem also exist in case of page_header. I am facing similar
problem with page_header as well.postgres=# SELECT page_header(get_raw_page('pg_class', 0));
page_header
---------------------------------------------
(0/2891538,-28949,1,220,7216,8192,8192,4,0)
(1 row)
No. This is consistent with page_header() which is also using smallint
for the checksum value.
3) Any reason for not marking bt_page_items or page_checksum as PARALLEL_SAFE.
No, not really. It's an oversight.
4) Error messages in new bt_page_items are not consistent with old
bt_page_items. For eg. if i pass meta page blocknumber as input to
bt_page_items the error message thrown by new and old bt_page_items
are different.postgres=# SELECT * FROM bt_page_items('btree_index', 0) LIMIT 1;
ERROR: block 0 is a meta pagepostgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index', 0)) LIMIT 1;
ERROR: block is a meta page
Well, the new function does not actually know the block number, so how
could it include it in the error message? You only get the page itself,
and it might be read from anywhere. Granted, the meta page should only
be stored in block 0, but when the storage mixes up the pages somehow,
that's not reliable.
postgres=# SELECT * FROM bt_page_items(get_raw_page('btree_index',
1024)) LIMIT 1;
ERROR: block number 1024 is out of range for relation "btree_index"postgres=# SELECT * FROM bt_page_items('btree_index', 1024) LIMIT 1;
ERROR: block number out of range
Well, that error message does not come from the new function, it comes
from get_raw_page(), so I'm not sure what am I supposed to do with that?
Similarly to the previous case, the new function does not actually know
the block number at all.
5) Code duplication in bt_page_items() and bt_page_items_bytea() needs
to be handled.
Yes. If we adopt the approach proposed by Peter Eisentraut (redirecting
the old bt_page_items using a SQL function calling the new one), it will
also make the error messages consistent.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
2) It seems like you have choosen wrong datatype for page_checksum. I
am getting negative checksum value when trying to run below query. I
think the return type for the SQL function page_checksum should be
'integer' instead of 'smallint'.postgres=# SELECT page_checksum(get_raw_page('pg_class', 0), 100);
page_checksum
---------------
-19532
(1 row)Above problem also exist in case of page_header. I am facing similar
problem with page_header as well.postgres=# SELECT page_header(get_raw_page('pg_class', 0));
page_header
---------------------------------------------
(0/2891538,-28949,1,220,7216,8192,8192,4,0)
(1 row)No. This is consistent with page_header() which is also using smallint for
the checksum value.
Yes. But, as i said earlier I am getting negative checksum value for
page_header as well. Isn't that wrong. For eg. When I debug the
following query, i could pd_checksum value as '40074' in gdb where
page_header shows it as '-25462'.
SELECT page_header(get_raw_page('pg_class', 0));
(gdb) p page->pd_checksum
$2 = 40074
postgres=# SELECT page_header(get_raw_page('pg_class', 0));
page_header
---------------------------------------------
(0/304EDE0,-25462,1,220,7432,8192,8192,4,0)
(1 row)
I think pd_checksum in PageHeaderData is defined as uint16 (0 to
65,535) whereas in SQL function for page_header it is defined as
smallint (-32768 to +32767).
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Ashutosh Sharma wrote:
Yes. But, as i said earlier I am getting negative checksum value for
page_header as well. Isn't that wrong. For eg. When I debug the
following query, i could pd_checksum value as '40074' in gdb where
page_header shows it as '-25462'.
Yes; the point is that this is a pre-existing problem, not one being
introduced with this patch. In SQL we don't have unsigned types, and we
have historically chosen a type with the same width instead of one with
enough width to represent all possible unsigned values. IMO this patch
is doing the right thing; if in the future we want to change that
decision, that's fine but it's a separate patch.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mar 14, 2017 5:37 PM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
Ashutosh Sharma wrote:
Yes. But, as i said earlier I am getting negative checksum value for
page_header as well. Isn't that wrong. For eg. When I debug the
following query, i could pd_checksum value as '40074' in gdb where
page_header shows it as '-25462'.
Yes; the point is that this is a pre-existing problem, not one being
introduced with this patch. In SQL we don't have unsigned types, and we
have historically chosen a type with the same width instead of one with
enough width to represent all possible unsigned values. IMO this patch
is doing the right thing; if in the future we want to change that
decision, that's fine but it's a separate patch
Okay, understood . Thank you.
With Regards,
Ashutosh Sharma
Import Notes
Reply to msg id not found: CAE9k0PkyMcAYhwVeRX1B2kiCWWRGt2mah5Y5geceebEmcsY7AA@mail.gmail.com
I have committed the page_checksum function, will work on the bt stuff next.
I left in the superuser check, because I was not confident how well
pg_checksum_page() would handle messed up data.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).
If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.
If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.
I'm half tempted to just rip out the (text, int4) variants.
In any case, I think we should add bytea variants to all the btree
functions, not just the bt_page_items one.
Attached is the remaining patch after the previous commits.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v3-0001-pageinspect-Add-bt_page_items-function-with-bytea.patchapplication/x-patch; name=v3-0001-pageinspect-Add-bt_page_items-function-with-bytea.patchDownload
From 203fa8aeb22d56e2d24ba4a5cf0e48ab5253a212 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter_e@gmx.net>
Date: Fri, 17 Mar 2017 12:17:29 -0400
Subject: [PATCH v3] pageinspect: Add bt_page_items function with bytea
argument
XXX WIP
Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
---
contrib/pageinspect/btreefuncs.c | 126 ++++++++++++++++++++++++++
contrib/pageinspect/expected/btree.out | 13 +++
contrib/pageinspect/pageinspect--1.5--1.6.sql | 14 +++
contrib/pageinspect/sql/btree.sql | 4 +
doc/src/sgml/pageinspect.sgml | 31 +++++++
src/include/access/nbtree.h | 1 +
6 files changed, 189 insertions(+)
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 6f35e288fd..36185db19a 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
PG_FUNCTION_INFO_V1(bt_metap);
PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
PG_FUNCTION_INFO_V1(bt_page_stats);
#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -396,6 +397,131 @@ bt_page_items(PG_FUNCTION_ARGS)
}
}
+/*-------------------------------------------------------
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *-------------------------------------------------------
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ Datum result;
+ char *values[6];
+ HeapTuple tuple;
+ FuncCallContext *fctx;
+ struct user_args *uargs;
+ int raw_page_size;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions"))));
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ BTPageOpaque opaque;
+ MemoryContext mctx;
+ TupleDesc tupleDesc;
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ if (raw_page_size < SizeOfPageHeaderData)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->page = VARDATA(raw_page);
+
+ uargs->offset = FirstOffsetNumber;
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "page is deleted");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ ItemId id;
+ IndexTuple itup;
+ int j;
+ int off;
+ int dlen;
+ char *dump;
+ char *ptr;
+
+ id = PageGetItemId(uargs->page, uargs->offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "invalid ItemId");
+
+ itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+ j = 0;
+ values[j++] = psprintf("%d", uargs->offset);
+ values[j++] = psprintf("(%u,%u)",
+ BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)),
+ itup->t_tid.ip_posid);
+ values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+ values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ dump = palloc0(dlen * 3 + 1);
+ values[j] = dump;
+ for (off = 0; off < dlen; off++)
+ {
+ if (off > 0)
+ *dump++ = ' ';
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ dump += 2;
+ }
+
+ tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+ result = HeapTupleGetDatum(tuple);
+
+ uargs->offset = uargs->offset + 1;
+
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ pfree(uargs);
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
/* ------------------------------------------------
* bt_metap()
diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 82a49e3d6c..67b103add3 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -42,4 +42,17 @@ data | 01 00 00 00 00 00 00 01
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+ERROR: block is a meta page
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+-[ RECORD 1 ]-----------------------
+itemoffset | 1
+ctid | (0,1)
+itemlen | 16
+nulls | f
+vars | f
+data | 01 00 00 00 00 00 00 01
+
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+ERROR: block number 2 is out of range for relation "test1_a_idx"
DROP TABLE test1;
diff --git a/contrib/pageinspect/pageinspect--1.5--1.6.sql b/contrib/pageinspect/pageinspect--1.5--1.6.sql
index 6ac2a8011d..c4356288db 100644
--- a/contrib/pageinspect/pageinspect--1.5--1.6.sql
+++ b/contrib/pageinspect/pageinspect--1.5--1.6.sql
@@ -83,3 +83,17 @@ CREATE FUNCTION page_checksum(IN page bytea, IN blkno int4)
RETURNS smallint
AS 'MODULE_PATHNAME', 'page_checksum'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- bt_page_items_bytea()
+--
+CREATE FUNCTION bt_page_items(IN page bytea,
+ OUT itemoffset smallint,
+ OUT ctid tid,
+ OUT itemlen smallint,
+ OUT nulls bool,
+ OUT vars bool,
+ OUT data text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
+LANGUAGE C STRICT PARALLEL SAFE;
diff --git a/contrib/pageinspect/sql/btree.sql b/contrib/pageinspect/sql/btree.sql
index 1eafc3249c..8eac64c7b3 100644
--- a/contrib/pageinspect/sql/btree.sql
+++ b/contrib/pageinspect/sql/btree.sql
@@ -14,4 +14,8 @@ CREATE INDEX test1_a_idx ON test1 USING btree (a);
SELECT * FROM bt_page_items('test1_a_idx', 1);
SELECT * FROM bt_page_items('test1_a_idx', 2);
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+
DROP TABLE test1;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 9f41bb21eb..1e225724a7 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -333,6 +333,37 @@ <title>B-tree Functions</title>
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(page bytea) returns setof record</function>
+ <indexterm>
+ <primary>bt_page_items</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Similarly to <function>heap_page_items</function>, it is also possible to
+ pass the page to <function>bt_page_items</function> as a <type>bytea</>
+ value. So the last example may also be rewritten like this:
+<screen>
+test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+</screen>
+ All the other details are the same as explained in the previous section.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 6289ffa9bd..f202715482 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -177,6 +177,7 @@ typedef struct BTMetaPageData
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
+#define P_ISMETA(opaque) ((opaque)->btpo_flags & BTP_META)
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
--
2.12.0
Hi,
On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.
Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.
I'm half tempted to just rip out the (text, int4) variants.
Perhaps. I see pageinspect as a tool for ad-hoc investigations, and I
can't really imagine it being hard-wired into something.
In any case, I think we should add bytea variants to all the btree
functions, not just the bt_page_items one.
I agree, but I think we need to find a way to share the code between the
text/bytea variants. Unless we rip the text ones out, obviously.
Thanks for the work on the patch, BTW.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/17/17 18:35, Tomas Vondra wrote:
On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.
Perhaps what was added in
<http://git.postgresql.org/pg/commitdiff/29bf5016835a2c2c23786f7cda347716c083d95f>
would actually work here.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 3/23/17 11:27 PM, Peter Eisentraut wrote:
On 3/17/17 18:35, Tomas Vondra wrote:
On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.Perhaps what was added in
<http://git.postgresql.org/pg/commitdiff/29bf5016835a2c2c23786f7cda347716c083d95f>
would actually work here.
This thread has been idle for five days. Please respond with a new
patch by 2017-03-30 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/24/2017 04:27 AM, Peter Eisentraut wrote:
On 3/17/17 18:35, Tomas Vondra wrote:
On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.Perhaps what was added in
<http://git.postgresql.org/pg/commitdiff/29bf5016835a2c2c23786f7cda347716c083d95f>
would actually work here.
I've tried to refactor the code using this, but the result was rather
ugly because (a) it really is quite tricky to pass around the contexts
and (b) the sanity checks are quite different for the two input types,
so mixing those parts (essentially the SRF_IS_FIRSTCALL bits) does not
make much sense IMHO.
The attached patch is the best I came up with - it essentially shares
just the tuple-forming part, which is exactly the same in both cases.
It also adds the P_ISMETA(opaque) check to the original function, which
seems like a useful defense against page written to a different place
(which is essentially the issue I was originally investigating).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
pageinspect-btree-bytea-v4.patchtext/x-patch; name=pageinspect-btree-bytea-v4.patchDownload
On 3/29/17 11:08 AM, Tomas Vondra wrote:
The attached patch is the best I came up with - it essentially shares
just the tuple-forming part, which is exactly the same in both cases.
I have marked this submission "Needs Review".
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On Wed, Mar 29, 2017 at 8:38 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
On 03/24/2017 04:27 AM, Peter Eisentraut wrote:
On 3/17/17 18:35, Tomas Vondra wrote:
On 03/17/2017 05:23 PM, Peter Eisentraut wrote:
I'm struggling to find a good way to share code between
bt_page_items(text, int4) and bt_page_items(bytea).If we do it via the SQL route, as I had suggested, it makes the
extension non-relocatable, and it will also create a bit of a mess
during upgrades.If doing it in C, it will be a bit tricky to pass the SRF context
around. There is no "DirectFunctionCall within SRF context", AFAICT.Not sure what it has to do with DirectFunctionCall? You want to call the
bytea variant from the existing one? Wouldn't it be easier to simply
define a static function with the shared parts, and pass around the
fctx/fcinfo? Not quite pretty, but should work.Perhaps what was added in
<http://git.postgresql.org/pg/commitdiff/29bf5016835a2c2c23786f7cda347716c083d95f>
would actually work here.I've tried to refactor the code using this, but the result was rather ugly
because (a) it really is quite tricky to pass around the contexts and (b)
the sanity checks are quite different for the two input types, so mixing
those parts (essentially the SRF_IS_FIRSTCALL bits) does not make much sense
IMHO.The attached patch is the best I came up with - it essentially shares just
the tuple-forming part, which is exactly the same in both cases.It also adds the P_ISMETA(opaque) check to the original function, which
seems like a useful defense against page written to a different place (which
is essentially the issue I was originally investigating).
It seems like the latest patch(v4) shared by Tomas upthread is an
empty patch. If I am not wrong, please share the correct patch.
Thanks.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/31/2017 06:01 PM, Ashutosh Sharma wrote:
It seems like the latest patch(v4) shared by Tomas upthread is an
empty patch. If I am not wrong, please share the correct patch.
Thanks.
OMG, this is the second time I managed to generate an empty patch. I
really need to learn not to do that ..
Attached is v5 of the patch, hopefully correct this time ;-)
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
pageinspect-btree-bytea-v5.patchtext/x-patch; name=pageinspect-btree-bytea-v5.patchDownload
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 3d69aa9..bc2d27c 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
PG_FUNCTION_INFO_V1(bt_metap);
PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
PG_FUNCTION_INFO_V1(bt_page_stats);
#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -253,14 +254,62 @@ struct user_args
OffsetNumber offset;
};
+/*
+ * bt_page_print_tuples
+ * form a tuple describing index tuple at a given offset
+ */
+static Datum
+bt_page_print_tuples(FuncCallContext *fctx, Page page, OffsetNumber offset)
+{
+ char *values[6];
+ HeapTuple tuple;
+ ItemId id;
+ IndexTuple itup;
+ int j;
+ int off;
+ int dlen;
+ char *dump;
+ char *ptr;
+
+ id = PageGetItemId(page, offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "invalid ItemId");
+
+ itup = (IndexTuple) PageGetItem(page, id);
+
+ j = 0;
+ values[j++] = psprintf("%d", offset);
+ values[j++] = psprintf("(%u,%u)",
+ ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
+ ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
+ values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+ values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ dump = palloc0(dlen * 3 + 1);
+ values[j] = dump;
+ for (off = 0; off < dlen; off++)
+ {
+ if (off > 0)
+ *dump++ = ' ';
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ dump += 2;
+ }
+
+ tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+
+ return HeapTupleGetDatum(tuple);
+}
+
Datum
bt_page_items(PG_FUNCTION_ARGS)
{
text *relname = PG_GETARG_TEXT_PP(0);
uint32 blkno = PG_GETARG_UINT32(1);
Datum result;
- char *values[6];
- HeapTuple tuple;
FuncCallContext *fctx;
MemoryContext mctx;
struct user_args *uargs;
@@ -300,6 +349,11 @@ bt_page_items(PG_FUNCTION_ARGS)
if (blkno == 0)
elog(ERROR, "block 0 is a meta page");
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
CHECK_RELATION_BLOCK_RANGE(rel, blkno);
buffer = ReadBuffer(rel, blkno);
@@ -345,47 +399,8 @@ bt_page_items(PG_FUNCTION_ARGS)
if (fctx->call_cntr < fctx->max_calls)
{
- ItemId id;
- IndexTuple itup;
- int j;
- int off;
- int dlen;
- char *dump;
- char *ptr;
-
- id = PageGetItemId(uargs->page, uargs->offset);
-
- if (!ItemIdIsValid(id))
- elog(ERROR, "invalid ItemId");
-
- itup = (IndexTuple) PageGetItem(uargs->page, id);
-
- j = 0;
- values[j++] = psprintf("%d", uargs->offset);
- values[j++] = psprintf("(%u,%u)",
- ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
- ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
- values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
- values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
- values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
-
- ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
- dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
- dump = palloc0(dlen * 3 + 1);
- values[j] = dump;
- for (off = 0; off < dlen; off++)
- {
- if (off > 0)
- *dump++ = ' ';
- sprintf(dump, "%02x", *(ptr + off) & 0xff);
- dump += 2;
- }
-
- tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
- result = HeapTupleGetDatum(tuple);
-
- uargs->offset = uargs->offset + 1;
-
+ result = bt_page_print_tuples(fctx, uargs->page, uargs->offset);
+ uargs->offset++;
SRF_RETURN_NEXT(fctx, result);
}
else
@@ -396,6 +411,90 @@ bt_page_items(PG_FUNCTION_ARGS)
}
}
+/*-------------------------------------------------------
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *-------------------------------------------------------
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ Datum result;
+ FuncCallContext *fctx;
+ struct user_args *uargs;
+ int raw_page_size;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions"))));
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ BTPageOpaque opaque;
+ MemoryContext mctx;
+ TupleDesc tupleDesc;
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ if (raw_page_size < SizeOfPageHeaderData)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->page = VARDATA(raw_page);
+
+ uargs->offset = FirstOffsetNumber;
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "page is deleted");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ result = bt_page_print_tuples(fctx, uargs->page, uargs->offset);
+ uargs->offset++;
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ pfree(uargs);
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
/* ------------------------------------------------
* bt_metap()
diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 82a49e3..67b103a 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -42,4 +42,17 @@ data | 01 00 00 00 00 00 00 01
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+ERROR: block is a meta page
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+-[ RECORD 1 ]-----------------------
+itemoffset | 1
+ctid | (0,1)
+itemlen | 16
+nulls | f
+vars | f
+data | 01 00 00 00 00 00 00 01
+
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+ERROR: block number 2 is out of range for relation "test1_a_idx"
DROP TABLE test1;
diff --git a/contrib/pageinspect/pageinspect--1.5--1.6.sql b/contrib/pageinspect/pageinspect--1.5--1.6.sql
index 6ac2a80..c435628 100644
--- a/contrib/pageinspect/pageinspect--1.5--1.6.sql
+++ b/contrib/pageinspect/pageinspect--1.5--1.6.sql
@@ -83,3 +83,17 @@ CREATE FUNCTION page_checksum(IN page bytea, IN blkno int4)
RETURNS smallint
AS 'MODULE_PATHNAME', 'page_checksum'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- bt_page_items_bytea()
+--
+CREATE FUNCTION bt_page_items(IN page bytea,
+ OUT itemoffset smallint,
+ OUT ctid tid,
+ OUT itemlen smallint,
+ OUT nulls bool,
+ OUT vars bool,
+ OUT data text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
+LANGUAGE C STRICT PARALLEL SAFE;
diff --git a/contrib/pageinspect/sql/btree.sql b/contrib/pageinspect/sql/btree.sql
index 1eafc32..8eac64c 100644
--- a/contrib/pageinspect/sql/btree.sql
+++ b/contrib/pageinspect/sql/btree.sql
@@ -14,4 +14,8 @@ SELECT * FROM bt_page_items('test1_a_idx', 0);
SELECT * FROM bt_page_items('test1_a_idx', 1);
SELECT * FROM bt_page_items('test1_a_idx', 2);
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+
DROP TABLE test1;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index 9f41bb2..1e22572 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -333,6 +333,37 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(page bytea) returns setof record</function>
+ <indexterm>
+ <primary>bt_page_items</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Similarly to <function>heap_page_items</function>, it is also possible to
+ pass the page to <function>bt_page_items</function> as a <type>bytea</>
+ value. So the last example may also be rewritten like this:
+<screen>
+test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+</screen>
+ All the other details are the same as explained in the previous section.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index f9304db..15771ce 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -176,6 +176,7 @@ typedef struct BTMetaPageData
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
+#define P_ISMETA(opaque) ((opaque)->btpo_flags & BTP_META)
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
Hi Tomas,
On Fri, Mar 31, 2017 at 11:05 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 03/31/2017 06:01 PM, Ashutosh Sharma wrote:
It seems like the latest patch(v4) shared by Tomas upthread is an
empty patch. If I am not wrong, please share the correct patch.
Thanks.OMG, this is the second time I managed to generate an empty patch. I
really
need to learn not to do that ..
Attached is v5 of the patch, hopefully correct this time ;-)
Yes, it was correct. I have spent some time on reviewing your patch today
and the review comments are as follows.
1. It would be good to have a blank line in between following set of lines
describing bt_page_print_tuples.
+/*
+ * bt_page_print_tuples
+ * form a tuple describing index tuple at a given offset
+ */
Something like this,
+/* ---------------------------------------------------------------------
+ * bt_page_print_tuples()
+ *
+ * Form a tuple describing index tuple at a given offset
+ * ---------------------------------------------------------------------
+ */
Please note that, if you do not agree with my suggestion, you may ignore it
:)
2. I think the following comments for bt_page_items needs to be moved to
the right place in the code.
/*-------------------------------------------------------------------------
* bt_page_items()
*
* Get IndexTupleData set in a btree page
*
* Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
*-------------------------------------------------------------------------
*/
/*
* cross-call data structure for SRF
*/
*struct user_args{ Page page; OffsetNumber offset;};*
With your patch, above structure definition is followed by the definition
for bt_page_print_tuples() not bt_page_items(), hence you may need to put
the comments for bt_page_items just above it's definition.
3. I am seeing a server crash when running the sql statement 'SELECT * FROM
bt_page_items('bt_i4_index', 1);'. Here is the backtrace.
I think this crash is happening because in bt_page_items, without reading
opaque region of a page, you have added this if statement,
*if (P_ISMETA(opaque)) ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("block
is a meta page")));*
Please pull back above if statement below the following line to get rid of
this crash.
opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
OR, the best thing to do would be to retain the earlier if statement for
checking meta page because that way you can avoid reading a
buffer unnecessarily.
#0 0x00007f592975030c in bt_page_items (fcinfo=0x7fffe5fdb860) at
btreefuncs.c:352
352 if (P_ISMETA(opaque))
Missing separate debuginfos, use: debuginfo-install glibc-2.17-78.el7.x86_64
(gdb) bt
#0 0x00007f592975030c in bt_page_items (fcinfo=0x7fffe5fdb860) at
btreefuncs.c:352
#1 0x00000000006797e0 in ExecMakeTableFunctionResult (setexpr=0x21269f8,
econtext=0x2126738, argContext=0x2136c98, expectedDesc=0x2155178,
randomAccess=0 '\000') at execSRF.c:230
#2 0x0000000000689dda in FunctionNext (node=0x2126628) at
nodeFunctionscan.c:94
#3 0x0000000000678f0e in ExecScanFetch (node=0x2126628, accessMtd=0x689d23
<FunctionNext>, recheckMtd=0x68a108 <FunctionRecheck>) at execScan.c:95
#4 0x0000000000678f7d in ExecScan (node=0x2126628, accessMtd=0x689d23
<FunctionNext>, recheckMtd=0x68a108 <FunctionRecheck>) at execScan.c:143
4. If you agree with the reason mentioned by me in comment #3, please
remove the following if statement from bt_page_items(),
*if (P_ISMETA(opaque)) ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("block
is a meta page")));*
Apart from above comments, your patch looks good to me. I have also marked
this patch as 'Waiting for Author' in the commitfest. Thanks.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
Hi Tomas,
On 4/1/17 5:40 AM, Ashutosh Sharma wrote:
Apart from above comments, your patch looks good to me. I have also
marked this patch as 'Waiting for Author' in the commitfest. Thanks.
The CF has been extended until April 7 but time is still growing short
and this thread has been idle for three days. Please respond with a new
patch by 2017-04-05 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".
Thank,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4 April 2017 at 09:05, David Steele <david@pgmasters.net> wrote:
Hi Tomas,
On 4/1/17 5:40 AM, Ashutosh Sharma wrote:
Apart from above comments, your patch looks good to me. I have also
marked this patch as 'Waiting for Author' in the commitfest. Thanks.The CF has been extended until April 7 but time is still growing short
and this thread has been idle for three days. Please respond with a new
patch by 2017-04-05 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".
We know time is growing short. Is it necessary to say that on every
single thread? What benefit do we get from that?
We voted to extend the time until the deadline. In some cases we may
well take it all the way to the deadline, so fighting you or others
trying to reject things *before* the deadline *because* of the
deadline seems weird.
How about we just leave everything until the deadline, then apply the
sword swiftly to anything that remains?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/4/17 9:11 AM, Simon Riggs wrote:
On 4 April 2017 at 09:05, David Steele <david@pgmasters.net> wrote:
Hi Tomas,
On 4/1/17 5:40 AM, Ashutosh Sharma wrote:
Apart from above comments, your patch looks good to me. I have also
marked this patch as 'Waiting for Author' in the commitfest. Thanks.The CF has been extended until April 7 but time is still growing short
and this thread has been idle for three days. Please respond with a new
patch by 2017-04-05 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".We know time is growing short. Is it necessary to say that on every
single thread? What benefit do we get from that?
Hackers has become such a fire hose that I don't take it for granted
that all messages are read by everyone. This may be news to some
authors, especially those that have not responded on a thread since
March 31.
We voted to extend the time until the deadline. In some cases we may
well take it all the way to the deadline, so fighting you or others
trying to reject things *before* the deadline *because* of the
deadline seems weird.
My goal is to help people focus on patches that have a chance. At this
point I think that includes poking authors who are not being responsive
using the limited means at my disposal.
How about we just leave everything until the deadline, then apply the
sword swiftly to anything that remains?
I didn't take the extension to mean that I should stop CF management.
However, If you think I have exceeded my mandate over the course of the
CF, then I'm happy to discuss that.
Regards,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 4, 2017 at 9:32 AM, David Steele <david@pgmasters.net> wrote:
My goal is to help people focus on patches that have a chance. At this
point I think that includes poking authors who are not being responsive
using the limited means at my disposal.
+1. Pings on specific threads can help clear things up when, for
example, the author and reviewer are each waiting for the other. And,
as you say, they also help avoid the situation where a patch just
falls off the radar and misses the release for no especially good
reason, which naturally causes people frustration.
I think your pings have been quite helpful, although I think it would
have been better in some cases if you'd done them sooner. Pinging
after a week, with a 3 day deadline, when there are only a few days
left in the CommitFest isn't really leaving a lot of room to maneuver.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/4/17 9:43 AM, Robert Haas wrote:
On Tue, Apr 4, 2017 at 9:32 AM, David Steele <david@pgmasters.net> wrote:
My goal is to help people focus on patches that have a chance. At this
point I think that includes poking authors who are not being responsive
using the limited means at my disposal.+1. Pings on specific threads can help clear things up when, for
example, the author and reviewer are each waiting for the other. And,
as you say, they also help avoid the situation where a patch just
falls off the radar and misses the release for no especially good
reason, which naturally causes people frustration.I think your pings have been quite helpful, although I think it would
have been better in some cases if you'd done them sooner. Pinging
after a week, with a 3 day deadline, when there are only a few days
left in the CommitFest isn't really leaving a lot of room to maneuver.
Thanks for the feedback! My thinking is that I don't want to bug people
too soon, but there's a maximum number of days a thread should be idle.
Over the course of the CF that has gone from 10 days, to 7, 5, and 3.
I don't look at all patches every day so it can be a bit uneven, i.e.,
all patches are allowed certain amount of idle time, but some may get a
bit more depending on when I check up on them.
Thanks,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
As I am not seeing any response from Tomas for last 2-3 days and since
the commit-fest is coming towards end, I have planned to work on the
review comments that I had given few days back and submit the updated
patch. PFA new version of patch that takes care of all the review
comments given by me. I have also ran pgindent on btreefuncs.c file
and have run some basic test cases. All looks fine to me now!
Please note that this patch still belongs to Tomas not me. I still
remain the reviewer of this patch. The same thing has been very
clearly mentioned in the attached patch. The only intention behind
Ashutosh (reviewer) working on this patch is to ensure that we do not
miss the things that can easily get committed in this commit-fest.
Thanks.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
Show quoted text
On Tue, Apr 4, 2017 at 7:23 PM, David Steele <david@pgmasters.net> wrote:
On 4/4/17 9:43 AM, Robert Haas wrote:
On Tue, Apr 4, 2017 at 9:32 AM, David Steele <david@pgmasters.net> wrote:
My goal is to help people focus on patches that have a chance. At this
point I think that includes poking authors who are not being responsive
using the limited means at my disposal.+1. Pings on specific threads can help clear things up when, for
example, the author and reviewer are each waiting for the other. And,
as you say, they also help avoid the situation where a patch just
falls off the radar and misses the release for no especially good
reason, which naturally causes people frustration.I think your pings have been quite helpful, although I think it would
have been better in some cases if you'd done them sooner. Pinging
after a week, with a 3 day deadline, when there are only a few days
left in the CommitFest isn't really leaving a lot of room to maneuver.Thanks for the feedback! My thinking is that I don't want to bug people
too soon, but there's a maximum number of days a thread should be idle.
Over the course of the CF that has gone from 10 days, to 7, 5, and 3.I don't look at all patches every day so it can be a bit uneven, i.e.,
all patches are allowed certain amount of idle time, but some may get a
bit more depending on when I check up on them.Thanks,
--
-David
david@pgmasters.net
Attachments:
0001-pageinspect-Add-bt_page_items-function-with-bytea-v6.patchtext/x-patch; charset=US-ASCII; name=0001-pageinspect-Add-bt_page_items-function-with-bytea-v6.patchDownload
From 6d9814182eb03521f093d687eb8024c9df1c11d5 Mon Sep 17 00:00:00 2001
From: ashu <ashu@localhost.localdomain>
Date: Tue, 4 Apr 2017 21:38:42 +0530
Subject: [PATCH] pageinspect: Add bt_page_items function with bytea v6
Author: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com>
---
contrib/pageinspect/btreefuncs.c | 198 +++++++++++++++++++-------
contrib/pageinspect/expected/btree.out | 13 ++
contrib/pageinspect/pageinspect--1.5--1.6.sql | 14 ++
contrib/pageinspect/sql/btree.sql | 4 +
doc/src/sgml/pageinspect.sgml | 31 ++++
src/include/access/nbtree.h | 1 +
6 files changed, 210 insertions(+), 51 deletions(-)
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 3d69aa9..02440ec 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
PG_FUNCTION_INFO_V1(bt_metap);
PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
PG_FUNCTION_INFO_V1(bt_page_stats);
#define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -235,14 +236,6 @@ bt_page_stats(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(result);
}
-/*-------------------------------------------------------
- * bt_page_items()
- *
- * Get IndexTupleData set in a btree page
- *
- * Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
- *-------------------------------------------------------
- */
/*
* cross-call data structure for SRF
@@ -253,14 +246,72 @@ struct user_args
OffsetNumber offset;
};
+/*-------------------------------------------------------
+ * bt_page_print_tuples()
+ *
+ * Form a tuple describing index tuple at a given offset
+ * ------------------------------------------------------
+ */
+static Datum
+bt_page_print_tuples(FuncCallContext *fctx, Page page, OffsetNumber offset)
+{
+ char *values[6];
+ HeapTuple tuple;
+ ItemId id;
+ IndexTuple itup;
+ int j;
+ int off;
+ int dlen;
+ char *dump;
+ char *ptr;
+
+ id = PageGetItemId(page, offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "invalid ItemId");
+
+ itup = (IndexTuple) PageGetItem(page, id);
+
+ j = 0;
+ values[j++] = psprintf("%d", offset);
+ values[j++] = psprintf("(%u,%u)",
+ ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
+ ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
+ values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+ values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+ dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ dump = palloc0(dlen * 3 + 1);
+ values[j] = dump;
+ for (off = 0; off < dlen; off++)
+ {
+ if (off > 0)
+ *dump++ = ' ';
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ dump += 2;
+ }
+
+ tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+
+ return HeapTupleGetDatum(tuple);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
+ *-------------------------------------------------------
+ */
Datum
bt_page_items(PG_FUNCTION_ARGS)
{
text *relname = PG_GETARG_TEXT_PP(0);
uint32 blkno = PG_GETARG_UINT32(1);
Datum result;
- char *values[6];
- HeapTuple tuple;
FuncCallContext *fctx;
MemoryContext mctx;
struct user_args *uargs;
@@ -345,47 +396,8 @@ bt_page_items(PG_FUNCTION_ARGS)
if (fctx->call_cntr < fctx->max_calls)
{
- ItemId id;
- IndexTuple itup;
- int j;
- int off;
- int dlen;
- char *dump;
- char *ptr;
-
- id = PageGetItemId(uargs->page, uargs->offset);
-
- if (!ItemIdIsValid(id))
- elog(ERROR, "invalid ItemId");
-
- itup = (IndexTuple) PageGetItem(uargs->page, id);
-
- j = 0;
- values[j++] = psprintf("%d", uargs->offset);
- values[j++] = psprintf("(%u,%u)",
- ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
- ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
- values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
- values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
- values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
-
- ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
- dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
- dump = palloc0(dlen * 3 + 1);
- values[j] = dump;
- for (off = 0; off < dlen; off++)
- {
- if (off > 0)
- *dump++ = ' ';
- sprintf(dump, "%02x", *(ptr + off) & 0xff);
- dump += 2;
- }
-
- tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
- result = HeapTupleGetDatum(tuple);
-
- uargs->offset = uargs->offset + 1;
-
+ result = bt_page_print_tuples(fctx, uargs->page, uargs->offset);
+ uargs->offset++;
SRF_RETURN_NEXT(fctx, result);
}
else
@@ -396,6 +408,90 @@ bt_page_items(PG_FUNCTION_ARGS)
}
}
+/*-------------------------------------------------------
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *-------------------------------------------------------
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+ bytea *raw_page = PG_GETARG_BYTEA_P(0);
+ Datum result;
+ FuncCallContext *fctx;
+ struct user_args *uargs;
+ int raw_page_size;
+
+ if (!superuser())
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ (errmsg("must be superuser to use pageinspect functions"))));
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ BTPageOpaque opaque;
+ MemoryContext mctx;
+ TupleDesc tupleDesc;
+
+ raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
+
+ if (raw_page_size < SizeOfPageHeaderData)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("input page too small (%d bytes)", raw_page_size)));
+
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->page = VARDATA(raw_page);
+
+ uargs->offset = FirstOffsetNumber;
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISMETA(opaque))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("block is a meta page")));
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "page is deleted");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+
+ /* Build a tuple descriptor for our result type */
+ if (get_call_result_type(fcinfo, NULL, &tupleDesc) != TYPEFUNC_COMPOSITE)
+ elog(ERROR, "return type must be a row type");
+
+ fctx->attinmeta = TupleDescGetAttInMetadata(tupleDesc);
+
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ result = bt_page_print_tuples(fctx, uargs->page, uargs->offset);
+ uargs->offset++;
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ pfree(uargs);
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
/* ------------------------------------------------
* bt_metap()
diff --git a/contrib/pageinspect/expected/btree.out b/contrib/pageinspect/expected/btree.out
index 82a49e3..67b103a 100644
--- a/contrib/pageinspect/expected/btree.out
+++ b/contrib/pageinspect/expected/btree.out
@@ -42,4 +42,17 @@ data | 01 00 00 00 00 00 00 01
SELECT * FROM bt_page_items('test1_a_idx', 2);
ERROR: block number out of range
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+ERROR: block is a meta page
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+-[ RECORD 1 ]-----------------------
+itemoffset | 1
+ctid | (0,1)
+itemlen | 16
+nulls | f
+vars | f
+data | 01 00 00 00 00 00 00 01
+
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+ERROR: block number 2 is out of range for relation "test1_a_idx"
DROP TABLE test1;
diff --git a/contrib/pageinspect/pageinspect--1.5--1.6.sql b/contrib/pageinspect/pageinspect--1.5--1.6.sql
index 6ac2a80..c435628 100644
--- a/contrib/pageinspect/pageinspect--1.5--1.6.sql
+++ b/contrib/pageinspect/pageinspect--1.5--1.6.sql
@@ -83,3 +83,17 @@ CREATE FUNCTION page_checksum(IN page bytea, IN blkno int4)
RETURNS smallint
AS 'MODULE_PATHNAME', 'page_checksum'
LANGUAGE C STRICT PARALLEL SAFE;
+
+--
+-- bt_page_items_bytea()
+--
+CREATE FUNCTION bt_page_items(IN page bytea,
+ OUT itemoffset smallint,
+ OUT ctid tid,
+ OUT itemlen smallint,
+ OUT nulls bool,
+ OUT vars bool,
+ OUT data text)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'bt_page_items_bytea'
+LANGUAGE C STRICT PARALLEL SAFE;
diff --git a/contrib/pageinspect/sql/btree.sql b/contrib/pageinspect/sql/btree.sql
index 1eafc32..8eac64c 100644
--- a/contrib/pageinspect/sql/btree.sql
+++ b/contrib/pageinspect/sql/btree.sql
@@ -14,4 +14,8 @@ SELECT * FROM bt_page_items('test1_a_idx', 0);
SELECT * FROM bt_page_items('test1_a_idx', 1);
SELECT * FROM bt_page_items('test1_a_idx', 2);
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 0));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 1));
+SELECT * FROM bt_page_items(get_raw_page('test1_a_idx', 2));
+
DROP TABLE test1;
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index c87b160..499d3e9 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -333,6 +333,37 @@ test=# SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
</para>
</listitem>
</varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>bt_page_items(page bytea) returns setof record</function>
+ <indexterm>
+ <primary>bt_page_items</primary>
+ </indexterm>
+ </term>
+
+ <listitem>
+ <para>
+ Similarly to <function>heap_page_items</function>, it is also possible to
+ pass the page to <function>bt_page_items</function> as a <type>bytea</>
+ value. So the last example may also be rewritten like this:
+<screen>
+test=# SELECT * FROM bt_page_items(get_raw_page('pg_cast_oid_index', 1));
+ itemoffset | ctid | itemlen | nulls | vars | data
+------------+---------+---------+-------+------+-------------
+ 1 | (0,1) | 12 | f | f | 23 27 00 00
+ 2 | (0,2) | 12 | f | f | 24 27 00 00
+ 3 | (0,3) | 12 | f | f | 25 27 00 00
+ 4 | (0,4) | 12 | f | f | 26 27 00 00
+ 5 | (0,5) | 12 | f | f | 27 27 00 00
+ 6 | (0,6) | 12 | f | f | 28 27 00 00
+ 7 | (0,7) | 12 | f | f | 29 27 00 00
+ 8 | (0,8) | 12 | f | f | 2a 27 00 00
+</screen>
+ All the other details are the same as explained in the previous section.
+ </para>
+ </listitem>
+ </varlistentry>
</variablelist>
</sect2>
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index f9304db..15771ce 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -176,6 +176,7 @@ typedef struct BTMetaPageData
#define P_ISLEAF(opaque) ((opaque)->btpo_flags & BTP_LEAF)
#define P_ISROOT(opaque) ((opaque)->btpo_flags & BTP_ROOT)
#define P_ISDELETED(opaque) ((opaque)->btpo_flags & BTP_DELETED)
+#define P_ISMETA(opaque) ((opaque)->btpo_flags & BTP_META)
#define P_ISHALFDEAD(opaque) ((opaque)->btpo_flags & BTP_HALF_DEAD)
#define P_IGNORE(opaque) ((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
--
1.8.3.1
Thanks. I planned to look into this today, but you've been faster ;-)
regards
Tomas
On 04/04/2017 06:55 PM, Ashutosh Sharma wrote:
Hi,
As I am not seeing any response from Tomas for last 2-3 days and since
the commit-fest is coming towards end, I have planned to work on the
review comments that I had given few days back and submit the updated
patch. PFA new version of patch that takes care of all the review
comments given by me. I have also ran pgindent on btreefuncs.c file
and have run some basic test cases. All looks fine to me now!Please note that this patch still belongs to Tomas not me. I still
remain the reviewer of this patch. The same thing has been very
clearly mentioned in the attached patch. The only intention behind
Ashutosh (reviewer) working on this patch is to ensure that we do not
miss the things that can easily get committed in this commit-fest.
Thanks.
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/4/17 12:55 PM, Ashutosh Sharma wrote:
As I am not seeing any response from Tomas for last 2-3 days and since
the commit-fest is coming towards end, I have planned to work on the
review comments that I had given few days back and submit the updated
patch. PFA new version of patch that takes care of all the review
comments given by me. I have also ran pgindent on btreefuncs.c file
and have run some basic test cases. All looks fine to me now!
Awesome, Ashutosh!
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 4/4/17 12:55, Ashutosh Sharma wrote:
As I am not seeing any response from Tomas for last 2-3 days and since
the commit-fest is coming towards end, I have planned to work on the
review comments that I had given few days back and submit the updated
patch. PFA new version of patch that takes care of all the review
comments given by me. I have also ran pgindent on btreefuncs.c file
and have run some basic test cases. All looks fine to me now!
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers