rmgr hooks and contrib/rmgr_hook
As previously discussed on -hackers on Aug 19, "Proposed Resource
Manager Changes".
Enclosed are two closely related items:
1) A refactoring of calls to Rmgr code from xlog.c, and having isolated
the code for rmgrs then to allow rmgr plugins to modify and/or add rmgrs
to Postgres. Includes additional code to generate log messages so we can
see what is happening after plugin has executed.
Introduces a shared memory area for Rmgrs that allows each backend to
read which RmgrIds are valid for the currently running server, allowing
call to be made during XLogInsert() to validate rmgrid. (The validation
uses a fixed length BitMapSet, a minor new invention for this patch, but
that is begging to be refactored - I await advice and/or comments on the
fastest way to do this if that isn't it.)
(I'd like to rip out WAL_DEBUG completely in favour of this new
mechanism, but I haven't done that here).
2) contrib module that contains an example rmgr_hook - actually two
examples in one module
These have both been tested in normal mode, WAL_DEBUG mode and in warm
standby recovery, so not a WIP progress patch.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Attachments:
rmgr_plugin.v4.patchtext/x-patch; charset=UTF-8; name=rmgr_plugin.v4.patchDownload
Index: src/backend/access/transam/rmgr.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/rmgr.c,v
retrieving revision 1.25
diff -c -r1.25 rmgr.c
*** src/backend/access/transam/rmgr.c 5 Nov 2006 22:42:07 -0000 1.25
--- src/backend/access/transam/rmgr.c 31 Aug 2008 08:25:37 -0000
***************
*** 7,12 ****
--- 7,13 ----
*/
#include "postgres.h"
+ #include "miscadmin.h"
#include "access/clog.h"
#include "access/gin.h"
#include "access/gist_private.h"
***************
*** 16,28 ****
#include "access/nbtree.h"
#include "access/xact.h"
#include "access/xlog_internal.h"
#include "commands/dbcommands.h"
#include "commands/sequence.h"
#include "commands/tablespace.h"
#include "storage/smgr.h"
! const RmgrData RmgrTable[RM_MAX_ID + 1] = {
{"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
{"Transaction", xact_redo, xact_desc, NULL, NULL, NULL},
{"Storage", smgr_redo, smgr_desc, NULL, NULL, NULL},
--- 17,30 ----
#include "access/nbtree.h"
#include "access/xact.h"
#include "access/xlog_internal.h"
+ #include "nodes/bitmapset.h"
#include "commands/dbcommands.h"
#include "commands/sequence.h"
#include "commands/tablespace.h"
#include "storage/smgr.h"
! const RmgrData FixedRmgrTable[RM_MAX_FIXED_ID + 1] = {
{"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
{"Transaction", xact_redo, xact_desc, NULL, NULL, NULL},
{"Storage", smgr_redo, smgr_desc, NULL, NULL, NULL},
***************
*** 40,42 ****
--- 42,365 ----
{"Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, gist_safe_restartpoint},
{"Sequence", seq_redo, seq_desc, NULL, NULL, NULL}
};
+
+ /* Main table of Resource Managers */
+ RmgrData *RmgrTable;
+
+ /* Hook for plugins to get control in RmgrInitialize() */
+ rmgr_hook_type rmgr_hook = NULL;
+
+ #define MAX_NUM_RMGRS 255
+ #define RMGR_BITS_PER_WORD 32
+ #define RMGR_BITMAP_WORDS ((MAX_NUM_RMGRS + 1) / RMGR_BITS_PER_WORD)
+ typedef struct RmgrCtlData
+ {
+ uint32 RmgrBitmap[RMGR_BITMAP_WORDS]; /* fixed size bitmapset */
+ } RmgrCtlData;
+
+ static RmgrCtlData *RmgrCtl = NULL;
+
+ /*
+ * Initialization of shared memory for Rmgrs
+ */
+ Size
+ RmgrShmemSize(void)
+ {
+ return sizeof(RmgrCtlData);
+ }
+
+ void
+ RmgrShmemInit(void)
+ {
+ bool foundRmgr;
+
+ RmgrCtl = (RmgrCtlData *)
+ ShmemInitStruct("Rmgr Ctl", RmgrShmemSize(), &foundRmgr);
+
+ if (foundRmgr)
+ return;
+
+ memset(RmgrCtl, 0, sizeof(RmgrCtlData));
+ }
+
+ /*
+ * RmgrInitialize()
+ *
+ * Create RmgrTable, populate RmgrTable with fixed Rmgrs, call plugin
+ * and then initialize shared list of current resource managers.
+ *
+ * RmgrInitialize must run before any other Rmgr function. Normally,
+ * it runs only in the Startup process (once), ensuring that the RmgrTable
+ * and its functions cannot be accessed by normal backends.
+ *
+ * If we have WAL_DEBUG set then the plugin will be called multiple
+ * times and so must be designed to return same answer every time.
+ */
+ void
+ RmgrInitialize(void)
+ {
+ int rmid;
+ int num_rmgrs = 0;
+
+ if (RmgrTable)
+ return;
+
+ /*
+ * Create local copy of RmgrTable. Memory is never freed.
+ */
+ RmgrTable = malloc(sizeof(RmgrData) * (MAX_NUM_RMGRS + 1));
+
+ for (rmid = 0; rmid <= MAX_NUM_RMGRS; rmid++)
+ {
+ if (rmid <= RM_MAX_FIXED_ID)
+ {
+ RmgrTable[rmid].rm_name = FixedRmgrTable[rmid].rm_name;
+ RmgrTable[rmid].rm_redo = FixedRmgrTable[rmid].rm_redo;
+ RmgrTable[rmid].rm_desc = FixedRmgrTable[rmid].rm_desc;
+ RmgrTable[rmid].rm_startup = FixedRmgrTable[rmid].rm_startup;
+ RmgrTable[rmid].rm_cleanup = FixedRmgrTable[rmid].rm_cleanup;
+ RmgrTable[rmid].rm_safe_restartpoint =
+ FixedRmgrTable[rmid].rm_safe_restartpoint;
+ }
+ else
+ {
+ RmgrTable[rmid].rm_name = NULL;
+ RmgrTable[rmid].rm_redo = NULL;
+ RmgrTable[rmid].rm_desc = NULL;
+ RmgrTable[rmid].rm_startup = NULL;
+ RmgrTable[rmid].rm_cleanup = NULL;
+ RmgrTable[rmid].rm_safe_restartpoint = NULL;
+ }
+ }
+
+ /*
+ * Call plugin, if present and output log message to assist with
+ * diagnosis of recovery issues.
+ *
+ * What is the plugin allowed to do?
+ * All RMs are assigned to a single RmgrId. All RM functions are
+ * optional, though we will only allow WAL inserts for RmgrIds that
+ * have a defined rm_redo function. The fixed RMs can be overridden
+ * by providing new routines that do similar, yet slightly different
+ * actions such as filters. There is no direct connection between
+ * an RM and and entry in pg_am, so it is possible to create a new
+ * index type but then not be able to insert into that index, iff
+ * the new index code issues any WAL inserts. That error is quickly
+ * discovered in practice, even if the docs are ignored. That sounds
+ * a little strange, but then pg_am is not readable at the time we
+ * need to know which RM functions to use for recovery.
+ *
+ * XXX we cannot yet issue an index rebuild from rm_cleanup()
+ * though that feature is expected to be achieved one day
+ */
+ if (rmgr_hook)
+ {
+ elog(LOG, "Executing Rmgr hook to extend and/or modify resource managers");
+ (*rmgr_hook) (RmgrTable);
+ }
+
+ /*
+ * Now create data structure to allow checking of RmgrId in each backend.
+ */
+ for (rmid = 0; rmid <= MAX_NUM_RMGRS; rmid++)
+ {
+ if (RmgrTable[rmid].rm_redo != NULL)
+ {
+ int wordnum = rmid / RMGR_BITS_PER_WORD;
+ int bitnum = rmid % RMGR_BITS_PER_WORD;
+ RmgrCtl->RmgrBitmap[wordnum] |= ((uint32) 1 << bitnum);
+
+ num_rmgrs++;
+
+ /*
+ * Keep track of new or modified RMs for diagnostic purposes
+ */
+ if (rmid <= RM_MAX_FIXED_ID)
+ {
+ if (RmgrTable[rmid].rm_name != FixedRmgrTable[rmid].rm_name ||
+ RmgrTable[rmid].rm_redo != FixedRmgrTable[rmid].rm_redo ||
+ RmgrTable[rmid].rm_startup != FixedRmgrTable[rmid].rm_startup ||
+ RmgrTable[rmid].rm_cleanup != FixedRmgrTable[rmid].rm_cleanup ||
+ RmgrTable[rmid].rm_safe_restartpoint !=
+ FixedRmgrTable[rmid].rm_safe_restartpoint)
+ elog(LOG, "Rmgr (%d) %s has been modified", rmid,
+ FixedRmgrTable[rmid].rm_name);
+ }
+ else
+ elog(LOG, "Rmgr (%d) %s defined", rmid,
+ (RmgrTable[rmid].rm_name != NULL ?
+ RmgrTable[rmid].rm_name : "<Unnamed>"));
+ }
+ }
+
+ if (num_rmgrs == 0)
+ elog(FATAL, "No valid resource managers defined");
+ }
+
+
+ /*
+ * RmgrStartup()
+ *
+ * Allow resource managers to do any required startup.
+ * Run RM startup functions once, only ever runs in Startup process.
+ */
+ void
+ RmgrStartup(void)
+ {
+ int rmid;
+
+ Assert(RmgrTable);
+ for (rmid = 0; rmid <= MAX_NUM_RMGRS; rmid++)
+ {
+ if (RmgrTable[rmid].rm_startup != NULL)
+ {
+ elog(DEBUG2, "RmgrStartup (%d) %s", rmid, RmgrTable[rmid].rm_name);
+ RmgrTable[rmid].rm_startup();
+ }
+ }
+ }
+
+ /*
+ * RmgrCleanup()
+ *
+ * Allow resource managers to do any required cleanup.
+ * Run RM cleanup functions once, only ever runs in Startup process.
+ *
+ * Called once prior to start of normal running
+ */
+ void
+ RmgrCleanup(void)
+ {
+ int rmid;
+
+ Assert(RmgrTable);
+ for (rmid = 0; rmid <= MAX_NUM_RMGRS; rmid++)
+ {
+ if (RmgrTable[rmid].rm_cleanup != NULL)
+ {
+ elog(DEBUG2, "RmgrCleanup (%d) %s", rmid, RmgrTable[rmid].rm_name);
+ RmgrTable[rmid].rm_cleanup();
+ }
+ }
+ }
+
+ /*
+ * RmgrSafeRestartpoint()
+ *
+ * Is it safe to mark a restartpoint? We must ask each of the resource
+ * managers whether they have any partial state information that might
+ * prevent a correct restart from the supplied LSN.
+ *
+ * Called once every 5 minutes with default settings, sometimes more
+ * frequently if last call returned false.
+ */
+ bool
+ RmgrSafeRestartpoint(XLogRecPtr checkPointRedo)
+ {
+ int rmid;
+
+ Assert(RmgrTable);
+ for (rmid = 0; rmid <= MAX_NUM_RMGRS; rmid++)
+ {
+ if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
+ if (!(RmgrTable[rmid].rm_safe_restartpoint()))
+ {
+ elog(DEBUG2, "RM (%d) %s not safe to record restart point at %X/%X",
+ rmid,
+ RmgrTable[rmid].rm_name,
+ checkPointRedo.xlogid,
+ checkPointRedo.xrecoff);
+ return false;
+ }
+ }
+
+ return true;
+ }
+
+ /*
+ * RmgrRedo
+ *
+ * Call the RM's redo function to apply WAL records. The RmgrId should
+ * previously have been checked using RmgrIdIsValid().
+ */
+ void
+ RmgrRedo(XLogRecPtr lsn, XLogRecord *rptr)
+ {
+ RmgrTable[rptr->xl_rmid].rm_redo(lsn, rptr);
+ }
+
+ /*
+ * Error context callback for errors occurring during rm_redo().
+ */
+ void
+ rm_redo_error_callback(void *arg)
+ {
+ XLogRecord *record = (XLogRecord *) arg;
+ StringInfoData buf;
+
+ initStringInfo(&buf) ;
+ Assert(RmgrTable);
+ if (RmgrTable[record->xl_rmid].rm_desc != NULL)
+ RmgrTable[record->xl_rmid].rm_desc(&buf,
+ record->xl_info,
+ XLogRecGetData(record));
+
+ /* don't bother emitting empty description */
+ if (buf.len > 0)
+ errcontext("xlog redo %s", buf.data);
+
+ pfree(buf.data);
+ }
+
+ /*
+ * RmgrDesc
+ *
+ * Call the RM's desc function to print details of WAL records
+ * during error handling or when WAL_DEBUG is set
+ */
+ void
+ RmgrDesc(StringInfo buf, XLogRecord *rptr, char *rec)
+ {
+ if (RmgrTable[rptr->xl_rmid].rm_desc != NULL)
+ RmgrTable[rptr->xl_rmid].rm_desc(buf, rptr->xl_info, rec);
+ }
+
+ /*
+ * RmgrName
+ *
+ * Get the RM's namefor use when WAL_DEBUG is set
+ */
+ const char *
+ RmgrName(RmgrId rmid)
+ {
+ Assert(RmgrTable);
+ return RmgrTable[rmid].rm_name;
+ }
+
+ /*
+ * RmgrIsIsValid
+ *
+ * Check whether the RmgrId has a valid rm_redo function currently
+ * Caller throws any error, not here.
+ */
+ bool
+ RmgrIdIsValid(RmgrId rmid)
+ {
+ int wordnum = rmid / RMGR_BITS_PER_WORD;
+ int bitnum = rmid % RMGR_BITS_PER_WORD;
+
+ /*
+ * If not already done so, get set of valid RmgrIds as a bms
+ */
+ if (!RmgrCtl)
+ RmgrShmemInit();
+
+ /*
+ * Check validity
+ */
+ if ((RmgrCtl->RmgrBitmap[wordnum] & ((bitmapword) 1 << bitnum)) != 0)
+ return true;
+
+ return false;
+ }
+
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.317
diff -c -r1.317 xlog.c
*** src/backend/access/transam/xlog.c 11 Aug 2008 11:05:10 -0000 1.317
--- src/backend/access/transam/xlog.c 31 Aug 2008 08:24:18 -0000
***************
*** 435,441 ****
static void pg_start_backup_callback(int code, Datum arg);
static bool read_backup_label(XLogRecPtr *checkPointLoc,
XLogRecPtr *minRecoveryLoc);
- static void rm_redo_error_callback(void *arg);
static int get_sync_bit(int method);
--- 435,440 ----
***************
*** 496,501 ****
--- 495,511 ----
}
/*
+ * Check that we are writing a currently valid RmgrId. If not, then
+ * we cannot allow this to be written to WAL otherwise we would be
+ * unable to recover correctly from crash and could lose data written
+ * after an invalid WAL record.
+ */
+ if (!RmgrIdIsValid(rmid))
+ ereport(ERROR,
+ (errmsg("invalid resource manager ID %u", rmid),
+ errhint("You must configure resource manager at startup.")));
+
+ /*
* Here we scan the rdata chain, determine which buffers must be backed
* up, and compute the CRC values for the data. Note that the record
* header isn't added into the CRC initially since we don't know the final
***************
*** 827,832 ****
--- 837,844 ----
{
StringInfoData buf;
+ RmgrInitialize();
+
initStringInfo(&buf);
appendStringInfo(&buf, "INSERT @ %X/%X: ",
RecPtr.xlogid, RecPtr.xrecoff);
***************
*** 834,840 ****
if (rdata->data != NULL)
{
appendStringInfo(&buf, " - ");
! RmgrTable[record->xl_rmid].rm_desc(&buf, record->xl_info, rdata->data);
}
elog(LOG, "%s", buf.data);
pfree(buf.data);
--- 846,852 ----
if (rdata->data != NULL)
{
appendStringInfo(&buf, " - ");
! RmgrDesc(&buf, record, rdata->data);
}
elog(LOG, "%s", buf.data);
pfree(buf.data);
***************
*** 3144,3153 ****
RecPtr->xlogid, RecPtr->xrecoff)));
goto next_record_is_invalid;
}
! if (record->xl_rmid > RM_MAX_ID)
{
ereport(emode,
! (errmsg("invalid resource manager ID %u at %X/%X",
record->xl_rmid, RecPtr->xlogid, RecPtr->xrecoff)));
goto next_record_is_invalid;
}
--- 3156,3165 ----
RecPtr->xlogid, RecPtr->xrecoff)));
goto next_record_is_invalid;
}
! if (!RmgrIdIsValid(record->xl_rmid))
{
ereport(emode,
! (errmsg("invalid resource manager ID %u" " at %X/%X",
record->xl_rmid, RecPtr->xlogid, RecPtr->xrecoff)));
goto next_record_is_invalid;
}
***************
*** 4880,4885 ****
--- 4892,4906 ----
*/
readRecoveryCommandFile();
+ /*
+ * Initialize RMs. Startup functions are *not* called unless we
+ * are InRecovery. We do this in two steps so that we know what
+ * RMs are currently valid for when we call XLogInsert(). We
+ * specifically do not write the list of RMs to the ControlFile,
+ * to allow more flexibility in adding/removing RMs.
+ */
+ RmgrInitialize();
+
/* Now we can determine the list of expected TLIs */
expectedTLIs = readTimeLineHistory(recoveryTargetTLI);
***************
*** 5009,5016 ****
/* REDO */
if (InRecovery)
{
- int rmid;
-
/*
* Update pg_control to show that we are recovering and to show the
* selected checkpoint as the place we are starting from. We also mark
--- 5030,5035 ----
***************
*** 5056,5067 ****
BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
}
! /* Initialize resource managers */
! for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
! {
! if (RmgrTable[rmid].rm_startup != NULL)
! RmgrTable[rmid].rm_startup();
! }
/*
* Find the first record that logically follows the checkpoint --- it
--- 5075,5084 ----
BACKUP_LABEL_FILE, BACKUP_LABEL_OLD)));
}
! /*
! * Startup Resource Managers
! */
! RmgrStartup();
/*
* Find the first record that logically follows the checkpoint --- it
***************
*** 5105,5113 ****
EndRecPtr.xlogid, EndRecPtr.xrecoff);
xlog_outrec(&buf, record);
appendStringInfo(&buf, " - ");
! RmgrTable[record->xl_rmid].rm_desc(&buf,
! record->xl_info,
! XLogRecGetData(record));
elog(LOG, "%s", buf.data);
pfree(buf.data);
}
--- 5122,5128 ----
EndRecPtr.xlogid, EndRecPtr.xrecoff);
xlog_outrec(&buf, record);
appendStringInfo(&buf, " - ");
! RmgrDesc(&buf, record, XLogRecGetData(record));
elog(LOG, "%s", buf.data);
pfree(buf.data);
}
***************
*** 5125,5131 ****
}
/* Setup error traceback support for ereport() */
! errcontext.callback = rm_redo_error_callback;
errcontext.arg = (void *) record;
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
--- 5140,5146 ----
}
/* Setup error traceback support for ereport() */
! errcontext.callback = rm_redo_error_callback; /* see rmgr.c */
errcontext.arg = (void *) record;
errcontext.previous = error_context_stack;
error_context_stack = &errcontext;
***************
*** 5141,5147 ****
if (record->xl_info & XLR_BKP_BLOCK_MASK)
RestoreBkpBlocks(record, EndRecPtr);
! RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
/* Pop the error context stack */
error_context_stack = errcontext.previous;
--- 5156,5162 ----
if (record->xl_info & XLR_BKP_BLOCK_MASK)
RestoreBkpBlocks(record, EndRecPtr);
! RmgrRedo(EndRecPtr, record);
/* Pop the error context stack */
error_context_stack = errcontext.previous;
***************
*** 5288,5303 ****
if (InRecovery)
{
! int rmid;
!
! /*
! * Allow resource managers to do any required cleanup.
! */
! for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
! {
! if (RmgrTable[rmid].rm_cleanup != NULL)
! RmgrTable[rmid].rm_cleanup();
! }
/*
* Check to see if the XLOG sequence contained any unresolved
--- 5303,5309 ----
if (InRecovery)
{
! RmgrCleanup();
/*
* Check to see if the XLOG sequence contained any unresolved
***************
*** 6037,6043 ****
RecoveryRestartPoint(const CheckPoint *checkPoint)
{
int elapsed_secs;
- int rmid;
/*
* Do nothing if the elapsed time since the last restartpoint is less than
--- 6043,6048 ----
***************
*** 6053,6075 ****
return;
/*
! * Is it safe to checkpoint? We must ask each of the resource managers
! * whether they have any partial state information that might prevent a
! * correct restart from this point. If so, we skip this opportunity, but
* return at the next checkpoint record for another try.
*/
! for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
! {
! if (RmgrTable[rmid].rm_safe_restartpoint != NULL)
! if (!(RmgrTable[rmid].rm_safe_restartpoint()))
! {
! elog(DEBUG2, "RM %d not safe to record restart point at %X/%X",
! rmid,
! checkPoint->redo.xlogid,
! checkPoint->redo.xrecoff);
! return;
! }
! }
/*
* OK, force data out to disk
--- 6058,6068 ----
return;
/*
! * Is it safe to restart from here? If not, we skip this opportunity, but
* return at the next checkpoint record for another try.
*/
! if (!RmgrSafeRestartpoint(checkPoint->redo))
! return;
/*
* OK, force data out to disk
***************
*** 6304,6310 ****
appendStringInfo(buf, "; bkpb%d", i + 1);
}
! appendStringInfo(buf, ": %s", RmgrTable[record->xl_rmid].rm_name);
}
#endif /* WAL_DEBUG */
--- 6297,6303 ----
appendStringInfo(buf, "; bkpb%d", i + 1);
}
! appendStringInfo(buf, ": %s", RmgrName(record->xl_rmid));
}
#endif /* WAL_DEBUG */
***************
*** 7065,7091 ****
}
/*
- * Error context callback for errors occurring during rm_redo().
- */
- static void
- rm_redo_error_callback(void *arg)
- {
- XLogRecord *record = (XLogRecord *) arg;
- StringInfoData buf;
-
- initStringInfo(&buf);
- RmgrTable[record->xl_rmid].rm_desc(&buf,
- record->xl_info,
- XLogRecGetData(record));
-
- /* don't bother emitting empty description */
- if (buf.len > 0)
- errcontext("xlog redo %s", buf.data);
-
- pfree(buf.data);
- }
-
- /*
* BackupInProgress: check if online backup mode is active
*
* This is done by checking for existence of the "backup_label" file.
--- 7058,7063 ----
Index: src/backend/storage/ipc/ipci.c
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/storage/ipc/ipci.c,v
retrieving revision 1.96
diff -c -r1.96 ipci.c
*** src/backend/storage/ipc/ipci.c 12 May 2008 00:00:50 -0000 1.96
--- src/backend/storage/ipc/ipci.c 20 Aug 2008 15:24:26 -0000
***************
*** 18,23 ****
--- 18,24 ----
#include "access/heapam.h"
#include "access/multixact.h"
#include "access/nbtree.h"
+ #include "access/rmgr.h"
#include "access/subtrans.h"
#include "access/twophase.h"
#include "miscadmin.h"
***************
*** 102,107 ****
--- 103,109 ----
size = add_size(size, LockShmemSize());
size = add_size(size, ProcGlobalShmemSize());
size = add_size(size, XLOGShmemSize());
+ size = add_size(size, RmgrShmemSize());
size = add_size(size, CLOGShmemSize());
size = add_size(size, SUBTRANSShmemSize());
size = add_size(size, TwoPhaseShmemSize());
***************
*** 176,184 ****
InitShmemIndex();
/*
! * Set up xlog, clog, and buffers
*/
XLOGShmemInit();
CLOGShmemInit();
SUBTRANSShmemInit();
TwoPhaseShmemInit();
--- 178,187 ----
InitShmemIndex();
/*
! * Set up xlog, rmgr, clog, and buffers
*/
XLOGShmemInit();
+ RmgrShmemInit();
CLOGShmemInit();
SUBTRANSShmemInit();
TwoPhaseShmemInit();
Index: src/include/access/rmgr.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/rmgr.h,v
retrieving revision 1.17
diff -c -r1.17 rmgr.h
*** src/include/access/rmgr.h 5 Nov 2006 22:42:10 -0000 1.17
--- src/include/access/rmgr.h 20 Aug 2008 15:25:04 -0000
***************
*** 10,15 ****
--- 10,18 ----
typedef uint8 RmgrId;
+ extern Size RmgrShmemSize(void);
+ extern void RmgrShmemInit(void);
+
/*
* Built-in resource managers
*
***************
*** 23,28 ****
--- 26,33 ----
#define RM_DBASE_ID 4
#define RM_TBLSPC_ID 5
#define RM_MULTIXACT_ID 6
+ /* RmgrId 7 is currently unused */
+ /* RmgrId 8 is currently unused */
#define RM_HEAP2_ID 9
#define RM_HEAP_ID 10
#define RM_BTREE_ID 11
***************
*** 30,35 ****
#define RM_GIN_ID 13
#define RM_GIST_ID 14
#define RM_SEQ_ID 15
! #define RM_MAX_ID RM_SEQ_ID
#endif /* RMGR_H */
--- 35,49 ----
#define RM_GIN_ID 13
#define RM_GIST_ID 14
#define RM_SEQ_ID 15
! #define RM_MAX_FIXED_ID RM_SEQ_ID
!
! /*
! * RmgrId Reservation
! *
! * RmgrIds up to 31 are reserved for PostgreSQL Core use only
! * RmgrIds between 32 and 127 are available for registration by
! * PostgreSQL-related projects
! * RmgrIds of 128 and above are always designed for each site
! */
#endif /* RMGR_H */
Index: src/include/access/xlog_internal.h
===================================================================
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/include/access/xlog_internal.h,v
retrieving revision 1.24
diff -c -r1.24 xlog_internal.h
*** src/include/access/xlog_internal.h 11 Aug 2008 11:05:11 -0000 1.24
--- src/include/access/xlog_internal.h 31 Aug 2008 08:27:31 -0000
***************
*** 229,235 ****
*/
typedef struct RmgrData
{
! const char *rm_name;
void (*rm_redo) (XLogRecPtr lsn, XLogRecord *rptr);
void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
void (*rm_startup) (void);
--- 229,235 ----
*/
typedef struct RmgrData
{
! char *rm_name;
void (*rm_redo) (XLogRecPtr lsn, XLogRecord *rptr);
void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
void (*rm_startup) (void);
***************
*** 237,243 ****
bool (*rm_safe_restartpoint) (void);
} RmgrData;
! extern const RmgrData RmgrTable[];
/*
* Exported to support xlog switching from bgwriter
--- 237,254 ----
bool (*rm_safe_restartpoint) (void);
} RmgrData;
! typedef void (*rmgr_hook_type) (RmgrData *);
! extern PGDLLIMPORT rmgr_hook_type rmgr_hook;
!
! extern void RmgrInitialize(void);
! extern void RmgrStartup(void);
! extern void RmgrCleanup(void);
! extern bool RmgrSafeRestartpoint(XLogRecPtr checkPointRedo);
! extern void RmgrRedo(XLogRecPtr lsn, XLogRecord *rptr);
! extern void RmgrDesc(StringInfo buf, XLogRecord *rptr, char *rec);
! extern void rm_redo_error_callback(void *arg);
! extern const char *RmgrName(RmgrId rmid);
! extern bool RmgrIdIsValid(RmgrId rmid);
/*
* Exported to support xlog switching from bgwriter
Simon Riggs <simon@2ndQuadrant.com> wrote:
1) A refactoring of calls to Rmgr code from xlog.c, and having isolated
the code for rmgrs then to allow rmgr plugins to modify and/or add rmgrs
to Postgres. Includes additional code to generate log messages so we can
see what is happening after plugin has executed.
Why do we need to set rmgr_hook in _PG_init(), and add or mofify rmgrs
in our hook functions? I think it is possible to modify RmgrTable
directly in _PG_init() instead of to have rmgr_hook. If we can do so,
the patch would be more simple, no? Am I missing something?
Index: src/backend/access/transam/rmgr.c
===================================================================
--- src/backend/access/transam/rmgr.c (head)
+++ src/backend/access/transam/rmgr.c (new)
@@ -25,1 +25,1 @@
-const RmgrData RmgrTable[RM_MAX_ID + 1] = {
+RmgrData RmgrTable[MAX_NUM_RMGRS + 1] = {
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Mon, 2008-09-01 at 12:42 +0900, ITAGAKI Takahiro wrote:
Simon Riggs <simon@2ndQuadrant.com> wrote:
1) A refactoring of calls to Rmgr code from xlog.c, and having isolated
the code for rmgrs then to allow rmgr plugins to modify and/or add rmgrs
to Postgres. Includes additional code to generate log messages so we can
see what is happening after plugin has executed.Why do we need to set rmgr_hook in _PG_init(), and add or mofify rmgrs
in our hook functions? I think it is possible to modify RmgrTable
directly in _PG_init() instead of to have rmgr_hook. If we can do so,
the patch would be more simple, no? Am I missing something?
If we modify RmgrTable in _PG_init() then we would have to have that
structure available in all backends, which was a stated objective to
avoid. We would still need a fast access data structure for the
XLogInsert() check, so the RmgrTable would just be wasted space in all
normal backends. In the patch, plugin is only called when we call
RmgrInitialize(), so the memory is malloc'd only when required.
The other reason is that this way of doing things is common to most
other server hooks. It allows the loaded module to specify that there
will be a plugin, but for the server to determine when that is called.
Calling code at the time _PG_init() runs limits us to certain types of
activity.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> wrote:
Why do we need to set rmgr_hook in _PG_init(), and add or mofify rmgrs
in our hook functions?If we modify RmgrTable in _PG_init() then we would have to have that
structure available in all backends, which was a stated objective to
avoid. We would still need a fast access data structure for the
XLogInsert() check, so the RmgrTable would just be wasted space in all
normal backends. In the patch, plugin is only called when we call
RmgrInitialize(), so the memory is malloc'd only when required.
Um? AFAICS RmgrTable is not accessed in XLogInsert unless we use WAL_DEBUG.
I see that RmgrTable should be malloc'd when required,
but there is another issue; when to load rmgr libraries.
Rmgr objects are needed only in startup process during recovery.
If we want to reduce resource consumption by rmgrs, I think it is
better not to load rmgr libraries through shared_preload_libraries.
We don't have to load rmgr libs if recovery is not needed or after recovery.
How about adding a new variable "recovery_preload_libaries" like as
shared_preload_libraries? Rmgr libs in it are loaded only in startup
process and only if recovery is needed.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Tue, 2008-09-02 at 18:30 +0900, ITAGAKI Takahiro wrote:
Simon Riggs <simon@2ndQuadrant.com> wrote:
Why do we need to set rmgr_hook in _PG_init(), and add or mofify rmgrs
in our hook functions?If we modify RmgrTable in _PG_init() then we would have to have that
structure available in all backends, which was a stated objective to
avoid. We would still need a fast access data structure for the
XLogInsert() check, so the RmgrTable would just be wasted space in all
normal backends. In the patch, plugin is only called when we call
RmgrInitialize(), so the memory is malloc'd only when required.Um? AFAICS RmgrTable is not accessed in XLogInsert unless we use WAL_DEBUG.
Exactly why I want to malloc it.
I see that RmgrTable should be malloc'd when required,
but there is another issue; when to load rmgr libraries.
Rmgr objects are needed only in startup process during recovery.
If we want to reduce resource consumption by rmgrs, I think it is
better not to load rmgr libraries through shared_preload_libraries.
We don't have to load rmgr libs if recovery is not needed or after recovery.How about adding a new variable "recovery_preload_libaries" like as
shared_preload_libraries? Rmgr libs in it are loaded only in startup
process and only if recovery is needed.
Good point. If others agree, I will re-implement this way.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
ITAGAKI Takahiro wrote:
I see that RmgrTable should be malloc'd when required,
but there is another issue; when to load rmgr libraries.Rmgr objects are needed only in startup process during recovery.
If we want to reduce resource consumption by rmgrs, I think it is
better not to load rmgr libraries through shared_preload_libraries.
We don't have to load rmgr libs if recovery is not needed or after recovery.How about adding a new variable "recovery_preload_libaries" like as
shared_preload_libraries? Rmgr libs in it are loaded only in startup
process and only if recovery is needed.
It doesn't seem worth it to introduce a new GUC like that, just to
reduce the memory usage a tiny bit in the rare case that a rmgr plugin
is present. How much memory will loading an extra library consume
anyway? Depends on the library of course, but I believe we're talking
about something in the ballpark of a few hundred kb. Besides, a decent
OS should swap that to disk, if it's not used, and the system is tight
on memory.
Also, presumably the library containing the recovery functions, also
contains the functions that generate those WAL records. So, it will be
needed after startup anyway, if the plugin is used at all.
There's one more reason to use shared_preload_libraries. It provides a
sanity check that the library required for recovery is present and can
be loaded, even when no recovery is required. If you have misconfigured
your system so that it can't recover, you want to find out sooner rather
than later when recovery is needed.
So IMHO, just use shared_preload_libraries.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Tue, 2008-09-02 at 13:38 +0300, Heikki Linnakangas wrote:
There's one more reason to use shared_preload_libraries. It provides a
sanity check that the library required for recovery is present and
can
be loaded, even when no recovery is required. If you have
misconfigured
your system so that it can't recover, you want to find out sooner
rather
than later when recovery is needed.
Great reason.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Tue, 2008-09-02 at 18:30 +0900, ITAGAKI Takahiro wrote:
How about adding a new variable "recovery_preload_libaries" like as
shared_preload_libraries? Rmgr libs in it are loaded only in startup
process and only if recovery is needed.
Good point. If others agree, I will re-implement this way.
Aside from the objections raised by Heikki, I'm not seeing the use-case
for an rmgr that only executes during recovery; in fact I'm not entirely
sure that I see a use-case for this entire patch. Where are the WAL
records that the "loadable rmgr" processes going to come from?
regards, tom lane
On Tue, 2008-09-02 at 11:39 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Tue, 2008-09-02 at 18:30 +0900, ITAGAKI Takahiro wrote:
How about adding a new variable "recovery_preload_libaries" like as
shared_preload_libraries? Rmgr libs in it are loaded only in startup
process and only if recovery is needed.Good point. If others agree, I will re-implement this way.
Aside from the objections raised by Heikki
Heikki hasn't raised any. He was objecting to an additional thought from
Itagaki. There haven't been any other objections to this concept.
, I'm not seeing the use-case
for an rmgr that only executes during recovery; in fact I'm not entirely
sure that I see a use-case for this entire patch. Where are the WAL
records that the "loadable rmgr" processes going to come from?
There is ample reason to do this. I covered this in my first post,
please re-read up thread. You have commented on this post already, so
I'm suprised by your comments.
Rmgr functions only execute during recovery, that is their role in life.
Except when we have WAL_DEBUG enabled they are never called elsewhere.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Tue, 2008-09-02 at 11:39 -0400, Tom Lane wrote:
, I'm not seeing the use-case
for an rmgr that only executes during recovery; in fact I'm not entirely
sure that I see a use-case for this entire patch. Where are the WAL
records that the "loadable rmgr" processes going to come from?There is ample reason to do this. I covered this in my first post,
please re-read up thread. You have commented on this post already, so
I'm suprised by your comments.
I think there's two different use cases here:
1. Filter WAL that's already generated, or is being generated by an
unmodified PostgreSQL instance.
2. Allow external modules to define new resource managers.
The examples you posted with the patch were of type 1. That's a very
valid use case, the example of only restoring a single database seems
like a useful tool. Another tool like that is pglesslog, although that
one couldn't actually be implemented with these hooks. I'm sure there's
more tricks like that people would find useful, if the tools existed.
The importance of the WAL will only increase as more people start to use
it for PITR, replication etc.
The 2nd use case, however, I find pretty unconvincing. I can't think of
a good example of that. Anything that needs to define its own resource
manager is very low-level stuff, and probably needs to go into the core
anyway.
So, let's focus on the 1st use case. I think a better approach for that
is to implement the filters as external programs, like pglesslog. It
allows more flexibility, although it also means that you can't rely on
existing backend functions to manipulate the WAL. I'd love to see a "WAL
toolkit" on pgfoundry, with tools like the filter to only restore a
single database, pglesslog, a WAL record viewer etc. A while ago, you
also talked about replacing the Slony triggers in the master with a tool
that reads the WAL; another good example of an external tool that needs
to read WAL. The toolkit could provide some sort of a framework and
common user interface to read and write WAL files for all those tools.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote:
The importance of the WAL will only increase as more people start to
use it for PITR, replication etc.
Agreed.
The 2nd use case, however, I find pretty unconvincing. I can't think of
a good example of that. Anything that needs to define its own resource
manager is very low-level stuff, and probably needs to go into the core
anyway.
New indexes are a big one, but I listed others also.
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.
Other data structures can be maintained by trigger code that writes new
types of WAL. That was always possible before, now they can be
recoverable too.
If we have extensible functions, triggers, indexes, why not WAL? What is
the problem with making WAL extensible? It carries no penalty at all for
standard WAL records, since the internal design for WAL already caters
for exactly this.
So, let's focus on the 1st use case.
No, lets look at both...you can't just wave away half the use cases. If
you look at all of the use cases the argument for doing it externally
quickly falls apart since it severely limits what can be achieved.
I think a better approach for that
is to implement the filters as external programs, like pglesslog. It
allows more flexibility, although it also means that you can't rely on
existing backend functions to manipulate the WAL. I'd love to see a "WAL
toolkit" on pgfoundry, with tools like the filter to only restore a
single database, pglesslog, a WAL record viewer etc. A while ago, you
also talked about replacing the Slony triggers in the master with a tool
that reads the WAL; another good example of an external tool that needs
to read WAL. The toolkit could provide some sort of a framework and
common user interface to read and write WAL files for all those tools.
This patch provides exactly the toolkit you describe, just internally.
As you point out, doing it other ways means you can't access internal
functions easily and can't maintain internal data structures correctly
either. So doing it externally is *not* a substitute and this is not a
simple discussion of include/exclude from core.
I'm lost as to why suggesting we limit the functionality is going to be
a good thing? If external tools really are so good, then we can do that
*as well*.
But this is only a plugin API, so the tools will be developed externally
anyway.
A while ago, you
also talked about replacing the Slony triggers in the master with a tool
that reads the WAL
Writes the WAL you mean? Slony triggers could write data to WAL rather
than log tables and the slon daemon can be implemented as an rmgr
plugin. Or many other options.
Another tool like that is pglesslog, although that
one couldn't actually be implemented with these hooks.
Sounds like we'll want to integrate that into synch replication some
how, so suggestions as to how to do that welcome - if you're not already
doing it via some other plugin in synch rep code?
I'm sure there's
more tricks like that people would find useful, if the tools existed.
Agreed. So lets make them exist.
If there's an argument against doing this, I've not heard it made
clearly by anybody. When we discussed it first on hackers there was no
objection, so I wrote the patch. If people want to see this blocked now,
we need some good reasons.
I've got nothing riding on the acceptance of this patch, I just think
its a good thing. That's why I deprioritised it. If there's some hidden
threat to national security or whatever, tell me off list.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote:
The 2nd use case, however, I find pretty unconvincing. I can't think of
a good example of that. Anything that needs to define its own resource
manager is very low-level stuff, and probably needs to go into the core
anyway.New indexes are a big one, but I listed others also.
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.
Hm, so currently if you want to add a new indexam you can't just insert into
pg_am and make them recoverable. You basically have to build in your new index
access method into Postgres with the new rmgr. That is annoying and a problem
worth tackling.
But I'm a bit worried about having this be an external plugin. There's no way
to looking at a WAL file to know whether it will be recoverable with the
plugins available. Worse, there's a risk you could have a plugin but not the
*right* plugin. Perhaps this could be tackled simply by having startup insert
a record listing all the rmgr's in use with identifying information and their
version numbers.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
On Mon, 2008-09-15 at 10:47 +0100, Gregory Stark wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote:
The 2nd use case, however, I find pretty unconvincing. I can't think of
a good example of that. Anything that needs to define its own resource
manager is very low-level stuff, and probably needs to go into the core
anyway.New indexes are a big one, but I listed others also.
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.Hm, so currently if you want to add a new indexam you can't just insert into
pg_am and make them recoverable. You basically have to build in your new index
access method into Postgres with the new rmgr. That is annoying and a problem
worth tackling.
Agreed.
But I'm a bit worried about having this be an external plugin. There's no way
to looking at a WAL file to know whether it will be recoverable with the
plugins available. Worse, there's a risk you could have a plugin but not the
*right* plugin.
That risk was discussed and is handled in the plugin. You are limited to
only insert data into WAL that has a current plugin that says it will
handle redo for that type.
Perhaps this could be tackled simply by having startup insert
a record listing all the rmgr's in use with identifying information and their
version numbers.
Non-standard plugins in use are listed when in use, so we can all see
what's going on. Plugins can issue their own startup messages if they
choose, with version numbers and other details.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Gregory Stark <stark@enterprisedb.com> writes:
Simon Riggs <simon@2ndQuadrant.com> writes:
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.
Hm, so currently if you want to add a new indexam you can't just insert into
pg_am and make them recoverable. You basically have to build in your new index
access method into Postgres with the new rmgr. That is annoying and a problem
worth tackling.
I concur with Heikki that that's not exactly a compelling use-case.
I've never heard of anyone building a non-core index AM at all; much
less trying to use it in a production context. Given the obvious
potential for version-mismatch-type problems, it's hard to believe
that anyone ever would try.
regards, tom lane
On Mon, 2008-09-15 at 08:52 -0400, Tom Lane wrote:
Gregory Stark <stark@enterprisedb.com> writes:
Simon Riggs <simon@2ndQuadrant.com> writes:
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.Hm, so currently if you want to add a new indexam you can't just insert into
pg_am and make them recoverable. You basically have to build in your new index
access method into Postgres with the new rmgr. That is annoying and a problem
worth tackling.I concur with Heikki that that's not exactly a compelling use-case.
I've never heard of anyone building a non-core index AM at all; much
less trying to use it in a production context. Given the obvious
potential for version-mismatch-type problems, it's hard to believe
that anyone ever would try.
The lack of a chicken is not an argument against the use case for an
egg.
But in any case, Bizgres was exactly this case, so they already did. We
just forced the authors to produce a code fork to do it, confusing
people rather than attracting people to Postgres.
We have plugin APIs with possible version mismatches in other contexts,
but I don't see us doing anything about that. In the context of WAL, the
basic WAL format has not changed in about 6 releases, so its been one of
the most stable file formats. Certain message types have changed, but
messages are all independent across rmgrs, so insulated from change.
The version mismatch idea presumes that a code author would structure
their code in two pieces: one to generate the WAL and one to read it.
Seems much more likely to me that authors would have one module
containing both as a way of avoiding the problem altogether. So I'm not
sure what to check, and against what?
When people do write useful plugins in the future they will be
potentially usable with any server at 8.4 or above. If we had had this
feature a few releases ago, we could have made GIN available to earlier
releases, for example.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Tom Lane <tgl@sss.pgh.pa.us> writes:
Gregory Stark <stark@enterprisedb.com> writes:
Simon Riggs <simon@2ndQuadrant.com> writes:
Indexes have always been able to be added dynamically. Now they can be
recovered correctly as well.Hm, so currently if you want to add a new indexam you can't just insert into
pg_am and make them recoverable. You basically have to build in your new index
access method into Postgres with the new rmgr. That is annoying and a problem
worth tackling.I concur with Heikki that that's not exactly a compelling use-case.
I've never heard of anyone building a non-core index AM at all; much
less trying to use it in a production context. Given the obvious
potential for version-mismatch-type problems, it's hard to believe
that anyone ever would try.
Well wasn't GIST such an instance until we put it in core? IIRC it lived in
contrib for a long time. It happens that the route they took was to implement
it without recoverability until it was in core then add logging. I suspect we
would lean on any new method to have logging before it was merged in though.
I think the version-mismatch problems are fairly important though which is why
I was suggesting providing checks for that in postgres. Simon's right though
that the plugin could check for it itself.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Simon Riggs <simon@2ndQuadrant.com> writes:
The version mismatch idea presumes that a code author would structure
their code in two pieces: one to generate the WAL and one to read it.
Seems much more likely to me that authors would have one module
containing both as a way of avoiding the problem altogether. So I'm not
sure what to check, and against what?
No, the danger is that someone generates a backup with one version of the
plugin and then restores with a different version of the plugin.
That would be frightfully easy to do when doing a minor upgrade, for example.
Or on a standby database if you've installed a new version of the plugin since
the standby was built.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
Simon Riggs wrote:
On Mon, 2008-09-15 at 08:52 -0400, Tom Lane wrote:
I've never heard of anyone building a non-core index AM at all; much
less trying to use it in a production context. Given the obvious
potential for version-mismatch-type problems, it's hard to believe
that anyone ever would try.The lack of a chicken is not an argument against the use case for an
egg.But in any case, Bizgres was exactly this case, so they already did. We
just forced the authors to produce a code fork to do it, confusing
people rather than attracting people to Postgres.
Are you referring to the bitmap index patch? IIRC, there was some
non-trivial changes to indexam API in there, as well as issues with
VACUUM. If anything, that patch supports the assumption that anything
that needs WAL-logging is working at such a low-level that it needs to
be in core anyway.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndQuadrant.com> writes:
We have plugin APIs with possible version mismatches in other contexts,
but I don't see us doing anything about that. In the context of WAL, the
basic WAL format has not changed in about 6 releases, so its been one of
the most stable file formats.
Er, that's simply false. Read the revision history for xlog_internal.h.
The version mismatch idea presumes that a code author would structure
their code in two pieces: one to generate the WAL and one to read it.
No, the version mismatch problem is that you might try to read the WAL
with a different version of the plugin than you wrote it with. Or maybe
with a completely unrelated plugin that was unfortunate enough to choose
the same rmgr ID. We can't afford to insert complete versioning
information into each WAL record, so it's going to be pretty difficult
to avoid this risk.
When people do write useful plugins in the future they will be
potentially usable with any server at 8.4 or above. If we had had this
feature a few releases ago, we could have made GIN available to earlier
releases, for example.
Well, the initial commit for GIN looked like this:
2006-05-02 07:28 teodor
* contrib/tsearch2/Makefile, contrib/tsearch2/ginidx.c,
contrib/tsearch2/tsearch.sql.in,
contrib/tsearch2/expected/tsearch2.out,
contrib/tsearch2/sql/tsearch2.sql, src/backend/access/Makefile,
src/backend/access/gin/Makefile, src/backend/access/gin/README,
src/backend/access/gin/ginarrayproc.c,
src/backend/access/gin/ginbtree.c,
src/backend/access/gin/ginbulk.c,
src/backend/access/gin/gindatapage.c,
src/backend/access/gin/ginentrypage.c,
src/backend/access/gin/ginget.c,
src/backend/access/gin/gininsert.c,
src/backend/access/gin/ginscan.c, src/backend/access/gin/ginutil.c,
src/backend/access/gin/ginvacuum.c,
src/backend/access/gin/ginxlog.c,
src/backend/access/transam/rmgr.c, src/backend/commands/cluster.c,
src/backend/commands/opclasscmds.c, src/backend/commands/vacuum.c,
src/backend/utils/adt/selfuncs.c, src/backend/utils/init/globals.c,
src/backend/utils/misc/guc.c, src/include/access/gin.h,
src/include/access/rmgr.h, src/include/catalog/catversion.h,
src/include/catalog/pg_am.h, src/include/catalog/pg_amop.h,
src/include/catalog/pg_amproc.h, src/include/catalog/pg_opclass.h,
src/include/catalog/pg_operator.h, src/include/catalog/pg_proc.h,
src/include/utils/selfuncs.h, src/test/regress/data/array.data,
src/test/regress/expected/arrays.out,
src/test/regress/expected/create_index.out,
src/test/regress/expected/create_table.out,
src/test/regress/expected/opr_sanity.out,
src/test/regress/expected/sanity_check.out,
src/test/regress/input/copy.source,
src/test/regress/output/copy.source,
src/test/regress/output/misc.source,
src/test/regress/sql/arrays.sql,
src/test/regress/sql/create_index.sql,
src/test/regress/sql/create_table.sql,
src/test/regress/sql/opr_sanity.sql: GIN: Generalized Inverted
iNdex. text[], int4[], Tsearch2 support for GIN.
Had the only core source file touched been rmgr.c, then maybe this
argument would be valid ...
regards, tom lane
On Mon, 2008-09-15 at 10:04 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
The version mismatch idea presumes that a code author would structure
their code in two pieces: one to generate the WAL and one to read it.No, the version mismatch problem is that you might try to read the WAL
with a different version of the plugin than you wrote it with. Or maybe
with a completely unrelated plugin that was unfortunate enough to choose
the same rmgr ID. We can't afford to insert complete versioning
information into each WAL record, so it's going to be pretty difficult
to avoid this risk.
I'm happy to include additional things into the patch, but I don't see
anything to force rejection of the patch entirely, from what has been
said.
Bottom line is that any backup of Postgres needs to include plugin
directories, otherwise parts of the application could stop working
following restore. This patch doesn't change that.
I proposed a registration scheme to avoid one of those problems.
If a plugin changed its file format, it would clearly need to include a
version test within it. It wouldn't be the fault of the plugin API if
the plugin author didn't handle that. But if they can work out how to
build an index AM and write WAL, I'm sure they can handle version
management between software components and informative error messages if
problems occur. And if they can't, they'll get a bad rep and nobody will
use the plugin.
Few ideas:
* add the rmgr bms to the long header of each WAL file
* change !RmgrIdIsValid() so that it causes FATAL by default. This then
allows people to correct a mistake and retry. We provide an option to
treat such errors as corrupt data and assume we have reached the end of
WAL.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs <simon@2ndQuadrant.com> writes:
Bottom line is that any backup of Postgres needs to include plugin
directories, otherwise parts of the application could stop working
following restore. This patch doesn't change that.
No, backups of executables are normally not the same backups as the data and
in many cases -- minor upgrades for example -- cannot be.
* add the rmgr bms to the long header of each WAL file
* change !RmgrIdIsValid() so that it causes FATAL by default. This then
allows people to correct a mistake and retry. We provide an option to
treat such errors as corrupt data and assume we have reached the end of
WAL.
I'm not sure but I think this just begs the question. The problem is to ensure
that the rmgrid means the same thing on the restoring database as it does on
the original database, or at least a compatible version. I think this would
mean having a long text description and version number to compare.
And as Tom points out startup isn't often enough. Would WAL headers even be
often enough? We would have to ensure there was never two versions of the
plugin in the same WAL file.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
On Mon, 2008-09-15 at 16:43 +0100, Gregory Stark wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
Bottom line is that any backup of Postgres needs to include plugin
directories, otherwise parts of the application could stop working
following restore. This patch doesn't change that.No, backups of executables are normally not the same backups as the data and
in many cases -- minor upgrades for example -- cannot be.
So you advise your clients to do backup in two halves. Then complain
that this is a bad way to do a backup because it may cause
insurmountable problems on restore. And then seek to reject a patch
because of that, even though similar problems already exist in other
parts of the system. I'm sorry, but that is circular, then faulted
logic.
If you do a minor upgrade that changes the on-disk format of *any* part
of the system then you have just introduced a limitation into what
software can be used for restore. That could be a new version of a
custom datatype just as easily as it could be an rmgr plugin. Shall we
ban custom datatypes? Should we add a version number into every varlen
header just in case somebody switched release levels, then forgot?
* add the rmgr bms to the long header of each WAL file
* change !RmgrIdIsValid() so that it causes FATAL by default. This then
allows people to correct a mistake and retry. We provide an option to
treat such errors as corrupt data and assume we have reached the end of
WAL.I'm not sure but I think this just begs the question. The problem is to ensure
that the rmgrid means the same thing on the restoring database as it does on
the original database, or at least a compatible version. I think this would
mean having a long text description and version number to compare.
Why is that any different to using functions or other plugins? If you
restore data into a database where the functions have the same names,
yet do different things then you are in trouble. Period.
If you don't use an rmgr plugin at all then you have nothing different
to do, nor will you see any different messages. If you use *any*
external server software, expect to read the instructions or have
problems.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 2008-09-15 at 16:00 +0100, Simon Riggs wrote:
On Mon, 2008-09-15 at 10:04 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
The version mismatch idea presumes that a code author would
structure
their code in two pieces: one to generate the WAL and one to read
it.
No, the version mismatch problem is that you might try to read the
WAL
with a different version of the plugin than you wrote it with. Or
maybe
with a completely unrelated plugin that was unfortunate enough to
choose
the same rmgr ID. We can't afford to insert complete versioning
information into each WAL record, so it's going to be prettydifficult
to avoid this risk.
There are a few cases we might be concerned about:
* rmgr plugin using duplicate id
* rmgr plugin using wrong id
* missing rmgr plugin
* version mismatch on WAL format of Rmgr record
* rmgr plugin wrong version to server version
We can handle those issues like this
* rmgr plugin using duplicate id
Registration scheme avoids this somewhat. Plugin setup disallows
possibility of multiple plugins with same id. Ranges of values have been
assigned so we advise people which values to use. This is same issue as
IANA for well known port numbers.
* rmgr plugin using wrong id
We can't actually force code to make calls using the right RmgrId. So
the best we can do is tell the plugin what number it has been assigned
to and have it throw an error if it doesn't expect that number and can't
dynamically change its RmgrId. The latter is just too complex too expect
people to do, so hardcoding the RmgrId is the expected way. That,
combined with the registration scheme should be sufficient. We aren't
expecting more than a 10-20 of these anyway. Again, similar to TCP/IP -
not all software knows how to operate on different port numbers.
* missing rmgr plugin
It seems sufficient to make the test for !RmgrIdIsValid return FATAL.
This then allows people to correct mistakes and retry. By the time this
test happens the WAL record has been CRC checked and has a valid header
in all other ways. We can document how to put in a dummy handler if the
rmgr plugin has bugs that need to be worked around to continue recovery.
* version mismatch on WAL format of Rmgr record
Immediately following the "shutdown" checkpoint at startup we should
write out a version string for all rmgr plugins as a WAL record. On
replay xlog_redo() will read the WAL record and test it against the
version information presented by the current plugin. If there is a
mismatch, we throw a FATAL error. This stops replay exactly on a
checkpoint record, so if you change rmgr to next version and restart
then no problems ensue. We allow new rmgr plugins that were not there
previously. These changes allow you to rollforward past a change in rmgr
WAL format. This then allows a standby server to be upgraded immediately
following a master server when using log shipping replication even when
the rmgr plugin changes its WAL format.
We don't go to that trouble for datatype versions, perhaps we should.
Overall, version management of rmgr plugins is same difficulty and can
result in errors of same severity as existing PostgresSQL extensions. So
there seems to be no additional risk from allowing them, in general.
We never expect to handle WAL changes across major versions and we will
increase the version numbers of all standard rmgrs at major releases, by
keying them to version numbers/strings already in use. WAL format
changes will be strongly discouraged for any single release.
* rmgr plugin wrong version to server version
Plugins are expected to test whether they are operating at the correct
version for the server. This information is already available if the
plugin wishes to act correctly.
So to do all of the above we need 3 changes:
* !RmgrIdIsValid throws FATAL not ERROR in ReadRecord()
* Alter rmgr API call to send RmgrId to plugin so it can throw error or
handle it somehow.In most cases this will lead to an plugin error "must
be defined on RmgrId 125" or similar.
* New rmgr API call to getRmgrVersion() so we can test it at startup
shutdown checkpoints.
It will take some time to make and test these changes, so this patch
should be marked returned with feedback from this commitfest.
If there are some more error modes, please say. I can't see any others.
This patch is about flexibility for the future. I have no pressing need
for this to go into 8.4, I just think its a good thing if it does. If
people think it poses a genuine risk or that there is insufficient time
to properly assess risks, then we can withdraw it.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support