BRIN page type identifier
The "special" area in a BRIN page looks like this:
/* special space on all BRIN pages stores a "type" identifier */
#define BRIN_PAGETYPE_META 0xF091
#define BRIN_PAGETYPE_REVMAP 0xF092
#define BRIN_PAGETYPE_REGULAR 0xF093
...
typedef struct BrinSpecialSpace
{
uint16 flags;
uint16 type;
} BrinSpecialSpace;
I believe this is supposed to follow the usual convention that the last
two bytes of a page can be used to identify the page type. SP-GiST uses
0xFF82, while GIN uses values 0x00XX.
However, because the special size is MAXALIGNed, the 'type' field are
not the last 2 bytes on the page, as intended. I'd suggest just adding
"char padding[6]" in BrinSpecialSpace, before 'flags'. That'll waste 4
bytes on 32-bit systems, but that seems acceptable.
Also, the comment in GinPageOpaqueData needs updating:
/*
* Page opaque data in an inverted index page.
*
* Note: GIN does not include a page ID word as do the other index types.
* This is OK because the opaque data is only 8 bytes and so can be reliably
* distinguished by size. Revisit this if the size ever increases.
* Further note: as of 9.2, SP-GiST also uses 8-byte special space. This is
* still OK, as long as GIN isn't using all of the high-order bits in its
* flags word, because that way the flags word cannot match the page ID used
* by SP-GiST.
*/
BRIN now also uses 8-byte special space. While you're at it, might want
to move that comment somewhere else, as it's really about a convention
among all page types, not just GIN.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas wrote:
The "special" area in a BRIN page looks like this:
/* special space on all BRIN pages stores a "type" identifier */
#define BRIN_PAGETYPE_META 0xF091
#define BRIN_PAGETYPE_REVMAP 0xF092
#define BRIN_PAGETYPE_REGULAR 0xF093
...
typedef struct BrinSpecialSpace
{
uint16 flags;
uint16 type;
} BrinSpecialSpace;I believe this is supposed to follow the usual convention that the last two
bytes of a page can be used to identify the page type. SP-GiST uses 0xFF82,
while GIN uses values 0x00XX.However, because the special size is MAXALIGNed, the 'type' field are not
the last 2 bytes on the page, as intended. I'd suggest just adding "char
padding[6]" in BrinSpecialSpace, before 'flags'. That'll waste 4 bytes on
32-bit systems, but that seems acceptable.
Ouch. You're right. I don't understand why you suggest to use 6 bytes,
though -- that would make the struct size be 10 bytes, which maxaligns
to 16, and so we're back where we started. Using 4 bytes does the
trick.
I wonder if this is permissible and whether it will do the right thing
on 32-bit systems:
/*
* Special area of BRIN pages.
*
* We add some padding bytes to ensure that 'type' ends up in the last two
* bytes of the page, for consumption by pg_filedump and similar utilities.
* (Special space is MAXALIGN'ed).
*/
typedef struct BrinSpecialSpace
{
char padding[MAXALIGN(1) - 2 * sizeof(uint16)];
uint16 flags;
uint16 type;
} BrinSpecialSpace;
It's a bit ugly, but it seems to work for me on x86-64 ...
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I wonder if this is permissible and whether it will do the right thing
on 32-bit systems:
/*
* Special area of BRIN pages.
*
* We add some padding bytes to ensure that 'type' ends up in the last two
* bytes of the page, for consumption by pg_filedump and similar utilities.
* (Special space is MAXALIGN'ed).
*/
typedef struct BrinSpecialSpace
{
char padding[MAXALIGN(1) - 2 * sizeof(uint16)];
uint16 flags;
uint16 type;
} BrinSpecialSpace;
I should expect that to fail altogether on 32-bit machines, because
the declared array size will be zero.
You could try something like
typedef struct BrinSpecialSpace
{
uint16 vector[MAXALIGN(1) / sizeof(uint16)];
} BrinSpecialSpace;
and then some access macros to use the last and next-to-last
elements of that array.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
typedef struct BrinSpecialSpace
{
char padding[MAXALIGN(1) - 2 * sizeof(uint16)];
uint16 flags;
uint16 type;
} BrinSpecialSpace;I should expect that to fail altogether on 32-bit machines, because
the declared array size will be zero.
Hah, of course.
You could try something like
typedef struct BrinSpecialSpace
{
uint16 vector[MAXALIGN(1) / sizeof(uint16)];
} BrinSpecialSpace;and then some access macros to use the last and next-to-last
elements of that array.
Ah, thanks, that works fine on x86-64. Here's a patch I intend to push
tomorrow.
Heikki suggested that the comment above GinPageOpaqueData be moved to
some better place, but I couldn't find any such. If there are other
ideas, I'm all ears.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
brin-pagetype.patchtext/x-diff; charset=us-asciiDownload
commit 8532bfeffdaeffa1d49c390ba6b7b0b46a20afb7
Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Mon Mar 9 23:18:16 2015 -0300
Move BRIN page type to page's last two bytes
... which is the usual convention, so that pg_filedump and similar
utilities can tell apart pages of different AMs.
Per note from Heikki at
http://www.postgresql.org/message-id/546A16EF.9070005@vmware.com
diff --git a/contrib/pageinspect/brinfuncs.c b/contrib/pageinspect/brinfuncs.c
index 630dda4..1b15a7b 100644
--- a/contrib/pageinspect/brinfuncs.c
+++ b/contrib/pageinspect/brinfuncs.c
@@ -58,12 +58,9 @@ brin_page_type(PG_FUNCTION_ARGS)
{
bytea *raw_page = PG_GETARG_BYTEA_P(0);
Page page = VARDATA(raw_page);
- BrinSpecialSpace *special;
char *type;
- special = (BrinSpecialSpace *) PageGetSpecialPointer(page);
-
- switch (special->type)
+ switch (BrinPageType(page))
{
case BRIN_PAGETYPE_META:
type = "meta";
@@ -75,7 +72,7 @@ brin_page_type(PG_FUNCTION_ARGS)
type = "regular";
break;
default:
- type = psprintf("unknown (%02x)", special->type);
+ type = psprintf("unknown (%02x)", BrinPageType(page));
break;
}
@@ -91,7 +88,6 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
{
Page page;
int raw_page_size;
- BrinSpecialSpace *special;
raw_page_size = VARSIZE(raw_page) - VARHDRSZ;
@@ -104,13 +100,12 @@ verify_brin_page(bytea *raw_page, uint16 type, const char *strtype)
page = VARDATA(raw_page);
/* verify the special space says this page is what we want */
- special = (BrinSpecialSpace *) PageGetSpecialPointer(page);
- if (special->type != type)
+ if (BrinPageType(page) != type)
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("page is not a BRIN page of type \"%s\"", strtype),
errdetail("Expected special type %08x, got %08x.",
- type, special->type)));
+ type, BrinPageType(page))));
return page;
}
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 4e9392b..acb64bd 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -53,7 +53,6 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
BrinTuple *oldtup;
Size oldsz;
Buffer newbuf;
- BrinSpecialSpace *special;
bool extended = false;
newsz = MAXALIGN(newsz);
@@ -113,8 +112,6 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
return false;
}
- special = (BrinSpecialSpace *) PageGetSpecialPointer(oldpage);
-
/*
* Great, the old tuple is intact. We can proceed with the update.
*
@@ -124,7 +121,7 @@ brin_doupdate(Relation idxrel, BlockNumber pagesPerRange,
* caller told us there isn't, if a concurrent update moved another tuple
* elsewhere or replaced a tuple with a smaller one.
*/
- if (((special->flags & BRIN_EVACUATE_PAGE) == 0) &&
+ if (((BrinPageFlags(oldpage) & BRIN_EVACUATE_PAGE) == 0) &&
brin_can_do_samepage_update(oldbuf, origsz, newsz))
{
if (BufferIsValid(newbuf))
@@ -374,12 +371,9 @@ brin_doinsert(Relation idxrel, BlockNumber pagesPerRange,
void
brin_page_init(Page page, uint16 type)
{
- BrinSpecialSpace *special;
-
PageInit(page, BLCKSZ, sizeof(BrinSpecialSpace));
- special = (BrinSpecialSpace *) PageGetSpecialPointer(page);
- special->type = type;
+ BrinPageType(page) = type;
}
/*
@@ -420,7 +414,6 @@ brin_start_evacuating_page(Relation idxRel, Buffer buf)
{
OffsetNumber off;
OffsetNumber maxoff;
- BrinSpecialSpace *special;
Page page;
page = BufferGetPage(buf);
@@ -428,8 +421,6 @@ brin_start_evacuating_page(Relation idxRel, Buffer buf)
if (PageIsNew(page))
return false;
- special = (BrinSpecialSpace *) PageGetSpecialPointer(page);
-
maxoff = PageGetMaxOffsetNumber(page);
for (off = FirstOffsetNumber; off <= maxoff; off++)
{
@@ -439,7 +430,7 @@ brin_start_evacuating_page(Relation idxRel, Buffer buf)
if (ItemIdIsUsed(lp))
{
/* prevent other backends from adding more stuff to this page */
- special->flags |= BRIN_EVACUATE_PAGE;
+ BrinPageFlags(page) |= BRIN_EVACUATE_PAGE;
MarkBufferDirtyHint(buf, true);
return true;
@@ -463,8 +454,7 @@ brin_evacuate_page(Relation idxRel, BlockNumber pagesPerRange,
page = BufferGetPage(buf);
- Assert(((BrinSpecialSpace *)
- PageGetSpecialPointer(page))->flags & BRIN_EVACUATE_PAGE);
+ Assert(BrinPageFlags(page) & BRIN_EVACUATE_PAGE);
maxoff = PageGetMaxOffsetNumber(page);
for (off = FirstOffsetNumber; off <= maxoff; off++)
@@ -677,11 +667,8 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
static Size
br_page_get_freespace(Page page)
{
- BrinSpecialSpace *special;
-
- special = (BrinSpecialSpace *) PageGetSpecialPointer(page);
if (!BRIN_IS_REGULAR_PAGE(page) ||
- (special->flags & BRIN_EVACUATE_PAGE) != 0)
+ (BrinPageFlags(page) & BRIN_EVACUATE_PAGE) != 0)
return 0;
else
return PageGetFreeSpace(page);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 5e4499d..80795ec 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -446,7 +446,7 @@ revmap_physical_extend(BrinRevmap *revmap)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("unexpected page type 0x%04X in BRIN index \"%s\" block %u",
- BRIN_PAGE_TYPE(page),
+ BrinPageType(page),
RelationGetRelationName(irel),
BufferGetBlockNumber(buf))));
diff --git a/src/include/access/brin_page.h b/src/include/access/brin_page.h
index 44ce5f6..6c645b3 100644
--- a/src/include/access/brin_page.h
+++ b/src/include/access/brin_page.h
@@ -20,24 +20,44 @@
#include "storage/block.h"
#include "storage/itemptr.h"
+/*
+ * Special area of BRIN pages.
+ *
+ * We define it in this odd way so that it always occupies the last
+ * MAXALIGN-sized element of each page.
+ */
+typedef struct BrinSpecialSpace
+{
+ uint16 vector[MAXALIGN(1) / sizeof(uint16)];
+} BrinSpecialSpace;
+
+/*
+ * Make the page type be the last half-word in the page, for consumption by
+ * pg_filedump and similar utilities. We don't really care much about the
+ * position of the "flags" half-word, but it's simpler to apply a consistent
+ * rule to both.
+ *
+ * See comments above GinPageOpaqueData.
+ */
+#define BrinPageType(page) \
+ (((BrinSpecialSpace *) \
+ PageGetSpecialPointer(page))->vector[MAXALIGN(1) / sizeof(uint16) - 1])
+
+#define BrinPageFlags(page) \
+ (((BrinSpecialSpace *) \
+ PageGetSpecialPointer(page))->vector[MAXALIGN(1) / sizeof(uint16) - 2])
+
/* special space on all BRIN pages stores a "type" identifier */
#define BRIN_PAGETYPE_META 0xF091
#define BRIN_PAGETYPE_REVMAP 0xF092
#define BRIN_PAGETYPE_REGULAR 0xF093
-#define BRIN_PAGE_TYPE(page) \
- (((BrinSpecialSpace *) PageGetSpecialPointer(page))->type)
-#define BRIN_IS_REVMAP_PAGE(page) (BRIN_PAGE_TYPE(page) == BRIN_PAGETYPE_REVMAP)
-#define BRIN_IS_REGULAR_PAGE(page) (BRIN_PAGE_TYPE(page) == BRIN_PAGETYPE_REGULAR)
+#define BRIN_IS_REVMAP_PAGE(page) (BrinPageType(page) == BRIN_PAGETYPE_REVMAP)
+#define BRIN_IS_REGULAR_PAGE(page) (BrinPageType(page) == BRIN_PAGETYPE_REGULAR)
/* flags for BrinSpecialSpace */
#define BRIN_EVACUATE_PAGE (1 << 0)
-typedef struct BrinSpecialSpace
-{
- uint16 flags;
- uint16 type;
-} BrinSpecialSpace;
/* Metapage definitions */
typedef struct BrinMetaPageData
diff --git a/src/include/access/gin_private.h b/src/include/access/gin_private.h
index c1a2049..c9f2026 100644
--- a/src/include/access/gin_private.h
+++ b/src/include/access/gin_private.h
@@ -24,10 +24,10 @@
* Note: GIN does not include a page ID word as do the other index types.
* This is OK because the opaque data is only 8 bytes and so can be reliably
* distinguished by size. Revisit this if the size ever increases.
- * Further note: as of 9.2, SP-GiST also uses 8-byte special space. This is
- * still OK, as long as GIN isn't using all of the high-order bits in its
- * flags word, because that way the flags word cannot match the page ID used
- * by SP-GiST.
+ * Further note: as of 9.2, SP-GiST also uses 8-byte special space, as does
+ * BRIN as of 9.5. This is still OK, as long as GIN isn't using all of the
+ * high-order bits in its flags word, because that way the flags word cannot
+ * match the page IDs used by SP-GiST and BRIN.
*/
typedef struct GinPageOpaqueData
{
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 0492ef6..413f71e 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -64,6 +64,8 @@ typedef SpGistPageOpaqueData *SpGistPageOpaque;
* which otherwise would have a hard time telling pages of different index
* types apart. It should be the last 2 bytes on the page. This is more or
* less "free" due to alignment considerations.
+ *
+ * See comments above GinPageOpaqueData.
*/
#define SPGIST_PAGE_ID 0xFF82
I wrote:
You could try something like
typedef struct BrinSpecialSpace
{
uint16 vector[MAXALIGN(1) / sizeof(uint16)];
} BrinSpecialSpace;
and then some access macros to use the last and next-to-last
elements of that array.
On second thought, consider
typedef struct BrinSpecialSpace
{
uint16 flags;
uint16 vector[MAXALIGN(1) / sizeof(uint16) - 1];
} BrinSpecialSpace;
This way, accesses to "flags" require no source code changes.
You still need a macro to map "type" onto the last element of
the vector, but there's probably about one reference to "type"
in the source code so it shouldn't be too painful.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[ paths crossed in mail ]
I wrote:
This way, accesses to "flags" require no source code changes.
You still need a macro to map "type" onto the last element of
the vector, but there's probably about one reference to "type"
in the source code so it shouldn't be too painful.
Ah, nevermind, I see you already did the work to do it the other
way.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers