Refactoring log_newpage
At present log_newpage() produces log records called XLOG_HEAP_NEWPAGE.
That routine is used by HEAP, BTREE, GIN, SPGIST rmgrs, as well as
various forks.
WAL contains no information as to which rmgr the data refers to,
making debugging much harder and skewing efforts to optimise WAL
traffic and is a pretty gross modularity violation of the whole rmgr
concept.
This refactoring adds an RmgrId field onto each new page record and
makes clearer that certain "heap" routines are actually generic. The
WAL records are still marked as HEAP rmgr and have XLOG_NEWPAGE record
type, but at least we can tell them apart. (We already had forknum,
just not rmgrid).
Another refactoring option would be to have specific record types for
each rmgr, and would normally be my preferred option but that seems
likely to use up too many record type numbers in the index rmgrs.
For immediate commit.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
refactor_newpage.v1.patchtext/x-diff; charset=US-ASCII; name=refactor_newpage.v1.patchDownload
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index fe06bdc..055ce0b 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -528,10 +528,10 @@ ginbuildempty(PG_FUNCTION_ARGS)
MarkBufferDirty(RootBuffer);
/* XLOG the new pages */
- log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ log_newpage(RM_GIN_ID, &index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
BufferGetBlockNumber(MetaBuffer),
BufferGetPage(MetaBuffer));
- log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ log_newpage(RM_GIN_ID, &index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
BufferGetBlockNumber(RootBuffer),
BufferGetPage(RootBuffer));
END_CRIT_SECTION();
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 99a431a..feabe28 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4502,7 +4502,7 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
}
/*
- * Perform XLogInsert of a HEAP_NEWPAGE record to WAL. Caller is responsible
+ * Perform XLogInsert of an XLOG_NEWPAGE record to WAL. Caller is responsible
* for writing the page to disk after calling this routine.
*
* Note: all current callers build pages in private memory and write them
@@ -4514,10 +4514,10 @@ log_heap_update(Relation reln, Buffer oldbuf, ItemPointerData from,
* not do anything that assumes we are touching a heap.
*/
XLogRecPtr
-log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
- Page page)
+log_newpage(RmgrId rmid, RelFileNode *rnode, ForkNumber forkNum,
+ BlockNumber blkno, Page page)
{
- xl_heap_newpage xlrec;
+ xl_newpage xlrec;
XLogRecPtr recptr;
XLogRecData rdata[2];
@@ -4527,9 +4527,10 @@ log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
xlrec.node = *rnode;
xlrec.forknum = forkNum;
xlrec.blkno = blkno;
+ xlrec.rmid = rmid;
rdata[0].data = (char *) &xlrec;
- rdata[0].len = SizeOfHeapNewpage;
+ rdata[0].len = SizeOfXLogNewpage;
rdata[0].buffer = InvalidBuffer;
rdata[0].next = &(rdata[1]);
@@ -4538,7 +4539,7 @@ log_newpage(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
rdata[1].buffer = InvalidBuffer;
rdata[1].next = NULL;
- recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_NEWPAGE, rdata);
+ recptr = XLogInsert(rmid, XLOG_NEWPAGE, rdata);
/*
* The page may be uninitialized. If so, we can't set the LSN and TLI
@@ -4800,25 +4801,25 @@ heap_xlog_visible(XLogRecPtr lsn, XLogRecord *record)
}
}
-static void
-heap_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
+/*
+ * Generic helper function used by many Rmgrs to restore new pages, so do
+ * not do anything that assumes we are touching a heap or a specific fork.
+ */
+void
+redo_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
{
- xl_heap_newpage *xlrec = (xl_heap_newpage *) XLogRecGetData(record);
+ xl_newpage *xlrec = (xl_newpage *) XLogRecGetData(record);
Buffer buffer;
Page page;
- /*
- * Note: the NEWPAGE log record is used for both heaps and indexes, so do
- * not do anything that assumes we are touching a heap.
- */
buffer = XLogReadBufferExtended(xlrec->node, xlrec->forknum, xlrec->blkno,
RBM_ZERO);
Assert(BufferIsValid(buffer));
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
- Assert(record->xl_len == SizeOfHeapNewpage + BLCKSZ);
- memcpy(page, (char *) xlrec + SizeOfHeapNewpage, BLCKSZ);
+ Assert(record->xl_len == SizeOfXLogNewpage + BLCKSZ);
+ memcpy(page, (char *) xlrec + SizeOfXLogNewpage, BLCKSZ);
/*
* The page may be uninitialized. If so, we can't set the LSN and TLI
@@ -5515,8 +5516,8 @@ heap_redo(XLogRecPtr lsn, XLogRecord *record)
case XLOG_HEAP_HOT_UPDATE:
heap_xlog_update(lsn, record, true);
break;
- case XLOG_HEAP_NEWPAGE:
- heap_xlog_newpage(lsn, record);
+ case XLOG_NEWPAGE:
+ redo_xlog_newpage(lsn, record);
break;
case XLOG_HEAP_LOCK:
heap_xlog_lock(lsn, record);
@@ -5619,11 +5620,12 @@ heap_desc(StringInfo buf, uint8 xl_info, char *rec)
ItemPointerGetBlockNumber(&(xlrec->newtid)),
ItemPointerGetOffsetNumber(&(xlrec->newtid)));
}
- else if (info == XLOG_HEAP_NEWPAGE)
+ else if (info == XLOG_NEWPAGE)
{
- xl_heap_newpage *xlrec = (xl_heap_newpage *) rec;
+ xl_newpage *xlrec = (xl_newpage *) rec;
- appendStringInfo(buf, "newpage: rel %u/%u/%u; fork %u, blk %u",
+ appendStringInfo(buf, "newpage: rm %u rel %u/%u/%u; fork %u, blk %u",
+ xlrec->rmid,
xlrec->node.spcNode, xlrec->node.dbNode,
xlrec->node.relNode, xlrec->forknum,
xlrec->blkno);
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index 31b2b67..bfed76e 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -267,7 +267,8 @@ end_heap_rewrite(RewriteState state)
if (state->rs_buffer_valid)
{
if (state->rs_use_wal)
- log_newpage(&state->rs_new_rel->rd_node,
+ log_newpage(RM_HEAP_ID,
+ &state->rs_new_rel->rd_node,
MAIN_FORKNUM,
state->rs_blockno,
state->rs_buffer);
@@ -613,7 +614,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
/* XLOG stuff */
if (state->rs_use_wal)
- log_newpage(&state->rs_new_rel->rd_node,
+ log_newpage(RM_HEAP_ID,
+ &state->rs_new_rel->rd_node,
MAIN_FORKNUM,
state->rs_blockno,
page);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 7627738..b402b4d 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -219,7 +219,7 @@ btbuildempty(PG_FUNCTION_ARGS)
smgrwrite(index->rd_smgr, INIT_FORKNUM, BTREE_METAPAGE,
(char *) metapage, true);
if (XLogIsNeeded())
- log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ log_newpage(RM_BTREE_ID, &index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
BTREE_METAPAGE, metapage);
/*
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9aa3a13..f60f6ff 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -269,7 +269,7 @@ _bt_blwritepage(BTWriteState *wstate, Page page, BlockNumber blkno)
if (wstate->btws_use_wal)
{
/* We use the heap NEWPAGE record type for this */
- log_newpage(&wstate->index->rd_node, MAIN_FORKNUM, blkno, page);
+ log_newpage(RM_BTREE_ID, &wstate->index->rd_node, MAIN_FORKNUM, blkno, page);
}
else
{
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index cbcf655..81f0212 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -153,7 +153,7 @@ spgbuildempty(PG_FUNCTION_ARGS)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_METAPAGE_BLKNO,
(char *) page, true);
if (XLogIsNeeded())
- log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ log_newpage(RM_SPGIST_ID, &index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
SPGIST_METAPAGE_BLKNO, page);
/* Likewise for the root page. */
@@ -162,7 +162,7 @@ spgbuildempty(PG_FUNCTION_ARGS)
smgrwrite(index->rd_smgr, INIT_FORKNUM, SPGIST_HEAD_BLKNO,
(char *) page, true);
if (XLogIsNeeded())
- log_newpage(&index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
+ log_newpage(RM_SPGIST_ID, &index->rd_smgr->smgr_rnode.node, INIT_FORKNUM,
SPGIST_HEAD_BLKNO, page);
/*
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07dc326..57e3ffd 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8539,7 +8539,7 @@ copy_relation_data(SMgrRelation src, SMgrRelation dst,
/* XLOG stuff */
if (use_wal)
- log_newpage(&dst->smgr_rnode.node, forkNum, blkno, page);
+ log_newpage(RM_HEAP_ID, &dst->smgr_rnode.node, forkNum, blkno, page);
/*
* Now write the page. We say isTemp = true even if it's not a temp
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803..4529ba6 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -14,6 +14,7 @@
#ifndef HEAPAM_H
#define HEAPAM_H
+#include "access/rmgr.h"
#include "access/sdir.h"
#include "access/skey.h"
#include "access/xlog.h"
@@ -143,8 +144,9 @@ extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
OffsetNumber *offsets, int offcnt);
extern XLogRecPtr log_heap_visible(RelFileNode rnode, BlockNumber block,
Buffer vm_buffer);
-extern XLogRecPtr log_newpage(RelFileNode *rnode, ForkNumber forkNum,
+extern XLogRecPtr log_newpage(RmgrId rmid, RelFileNode *rnode, ForkNumber forkNum,
BlockNumber blk, Page page);
+extern void redo_xlog_newpage(XLogRecPtr lsn, XLogRecord *record);
/* in heap/pruneheap.c */
extern void heap_page_prune_opt(Relation relation, Buffer buffer,
diff --git a/src/include/access/htup.h b/src/include/access/htup.h
index 6a3778d..9e5193d 100644
--- a/src/include/access/htup.h
+++ b/src/include/access/htup.h
@@ -16,6 +16,7 @@
#include "access/tupdesc.h"
#include "access/tupmacs.h"
+#include "access/rmgr.h"
#include "storage/bufpage.h"
#include "storage/itemptr.h"
#include "storage/relfilenode.h"
@@ -587,7 +588,7 @@ typedef HeapTupleData *HeapTuple;
#define XLOG_HEAP_UPDATE 0x20
/* 0x030 is free, was XLOG_HEAP_MOVE */
#define XLOG_HEAP_HOT_UPDATE 0x40
-#define XLOG_HEAP_NEWPAGE 0x50
+#define XLOG_NEWPAGE 0x50
#define XLOG_HEAP_LOCK 0x60
#define XLOG_HEAP_INPLACE 0x70
@@ -742,15 +743,16 @@ typedef struct xl_heap_cleanup_info
/* This is for replacing a page's contents in toto */
/* NB: this is used for indexes as well as heaps */
-typedef struct xl_heap_newpage
+typedef struct xl_newpage
{
RelFileNode node;
ForkNumber forknum;
BlockNumber blkno; /* location of new page */
+ RmgrId rmid; /* RmgrId */
/* entire page contents follow at end of record */
-} xl_heap_newpage;
+} xl_newpage;
-#define SizeOfHeapNewpage (offsetof(xl_heap_newpage, blkno) + sizeof(BlockNumber))
+#define SizeOfXLogNewpage (offsetof(xl_newpage, rmid) + sizeof(RmgrId))
/* This is what we need to know about lock */
typedef struct xl_heap_lock
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index b81c156..3962d2f 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -71,7 +71,7 @@ typedef struct XLogContRecord
/*
* Each page of XLOG file has a header like this:
*/
-#define XLOG_PAGE_MAGIC 0xD070 /* can be used as WAL version indicator */
+#define XLOG_PAGE_MAGIC 0xD071 /* can be used as WAL version indicator */
typedef struct XLogPageHeaderData
{
On Feb 1, 2012, at 4:25 AM, Simon Riggs wrote:
At present log_newpage() produces log records called XLOG_HEAP_NEWPAGE.
That routine is used by HEAP, BTREE, GIN, SPGIST rmgrs, as well as
various forks.WAL contains no information as to which rmgr the data refers to,
making debugging much harder and skewing efforts to optimise WAL
traffic and is a pretty gross modularity violation of the whole rmgr
concept.This refactoring adds an RmgrId field onto each new page record and
makes clearer that certain "heap" routines are actually generic. The
WAL records are still marked as HEAP rmgr and have XLOG_NEWPAGE record
type, but at least we can tell them apart. (We already had forknum,
just not rmgrid).
But we already had RelFileNode; wouldn't that be enough to tell what rmgr was responsible for the new page? Can 2 different rmgrs write to the same file node?
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Wed, Feb 1, 2012 at 10:42 PM, Jim Nasby <jim@nasby.net> wrote:
On Feb 1, 2012, at 4:25 AM, Simon Riggs wrote:
At present log_newpage() produces log records called XLOG_HEAP_NEWPAGE.
That routine is used by HEAP, BTREE, GIN, SPGIST rmgrs, as well as
various forks.WAL contains no information as to which rmgr the data refers to,
making debugging much harder and skewing efforts to optimise WAL
traffic and is a pretty gross modularity violation of the whole rmgr
concept.This refactoring adds an RmgrId field onto each new page record and
makes clearer that certain "heap" routines are actually generic. The
WAL records are still marked as HEAP rmgr and have XLOG_NEWPAGE record
type, but at least we can tell them apart. (We already had forknum,
just not rmgrid).But we already had RelFileNode; wouldn't that be enough to tell what rmgr was responsible for the new page? Can 2 different rmgrs write to the same file node?
No, but which one? No way to tell unless you have full list of
relfilenodes and check each one. So btree changes look like heap
changes etc..
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02.02.2012 08:19, Simon Riggs wrote:
On Wed, Feb 1, 2012 at 10:42 PM, Jim Nasby<jim@nasby.net> wrote:
On Feb 1, 2012, at 4:25 AM, Simon Riggs wrote:
At present log_newpage() produces log records called XLOG_HEAP_NEWPAGE.
That routine is used by HEAP, BTREE, GIN, SPGIST rmgrs, as well as
various forks.WAL contains no information as to which rmgr the data refers to,
making debugging much harder and skewing efforts to optimise WAL
traffic and is a pretty gross modularity violation of the whole rmgr
concept.This refactoring adds an RmgrId field onto each new page record and
makes clearer that certain "heap" routines are actually generic. The
WAL records are still marked as HEAP rmgr and have XLOG_NEWPAGE record
type, but at least we can tell them apart. (We already had forknum,
just not rmgrid).But we already had RelFileNode; wouldn't that be enough to tell what rmgr was responsible for the new page? Can 2 different rmgrs write to the same file node?
No, but which one? No way to tell unless you have full list of
relfilenodes and check each one.
Well, you can obviously check the catalogs for that, but you must be
assuming that you don't have access to the catalogs or this would be a
non-issue.
You can also identify the kind of page by looking at the special area of
the stored page. See:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00392.php
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Feb 2, 2012 at 7:26 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Well, you can obviously check the catalogs for that, but you must be
assuming that you don't have access to the catalogs or this would be a
non-issue.You can also identify the kind of page by looking at the special area of the
stored page. See:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00392.php
How does that work with different forks?
I think its very ugly to mark all sorts of different pages as if they
were heap pages when they clearly aren't. I don't recall anything so
ugly being allowed anywhere else in the system. Why is it *needed*
here?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 02.02.2012 11:35, Simon Riggs wrote:
On Thu, Feb 2, 2012 at 7:26 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:Well, you can obviously check the catalogs for that, but you must be
assuming that you don't have access to the catalogs or this would be a
non-issue.You can also identify the kind of page by looking at the special area of the
stored page. See:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00392.phpHow does that work with different forks?
You have the fork number explicitly in the newpage record already.
I think its very ugly to mark all sorts of different pages as if they
were heap pages when they clearly aren't. I don't recall anything so
ugly being allowed anywhere else in the system. Why is it *needed*
here?
It's not needed. Beauty is in the eye of the beholder, but I don't find
it all that ugly, and the comment in log_newpage explains it well.
I don't see much value in adding a new field to the record. Any tools
that read the WAL format will need to be taught about that change. Not a
huge issue, but I also don't see much gain. On the whole I'd be inclined
to just leave it all alone, but whatever.
I don't think it's a good idea to rename XLOG_HEAP_NEWPAGE to
XLOG_NEWPAGE. The WAL record is still part of the heapam rmgr, even if
it's used by other access methods, too.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndQuadrant.com> writes:
On Wed, Feb 1, 2012 at 10:42 PM, Jim Nasby <jim@nasby.net> wrote:
But we already had RelFileNode; wouldn't that be enough to tell what rmgr was responsible for the new page? Can 2 different rmgrs write to the same file node?
No, but which one? No way to tell unless you have full list of
relfilenodes and check each one. So btree changes look like heap
changes etc..
What I'm not following is why we should care. There are no cases at
present where this matters, and no proposed patches (that I know of)
that would make it matter. If an individual rmgr needs control of
what happens during a given operation, it wouldn't use XLOG_NEWPAGE
to represent the operation. End of problem. You haven't provided
any convincing argument why this needs to be changed globally.
regards, tom lane