PITR Phase 1 - Test results
I've now completed the coding of Phase 1 of PITR.
This allows a backup to be recovered and then rolled forward (all the
way) on transaction logs. This proves the code and the design works, but
also validates a lot of the earlier assumptions that were the subject of
much earlier debate.
As noted in the previous designs, PostgreSQL talks to an external
archiver using the XLogArchive API.
I've now completed:
- changes to PostgreSQL
- written a simple archiving utility, pg_arch
Using both of these together, I have successfully:
- started pg_arch
- started postgres
- taken a backup using tar
- ran pgbench for an extended period, so that the transaction logs taken
at the start have long since been recycled
- killed postmaster
- waited for completion
- rm -R $PGDATA
- restore using tar
- restore xlogs from archive directory
- start postmaster and watch it recover to end of logs
This has been tested a number of times with non-trivial tests, and I've
sat and watched the beast at work to make sure nothing weird was
happening with timing.
At this stage:
Missing Functions -
- recovery does NOT yet stop at a specified point-in-time (that was
always planned for Phase 2)
- a few more log messages required to report progress
- debug mode required to allow most of these to be turned off
Wrinkles
- code is system testable, but not as cute as it could be
- input from committers is now sought to complete the work
- you are strongly advised not to treat any of the patches as usable in
any real world situation YET - that bit comes next
Bugs
- two bugs currently occur during some tests:
1. the notification mechanism as originally designed causes ALL backends
to report that a log file has closed. That works most of the time,
though does give rise to occasional timing errors - nothing too
serious, but this inexactness could lead to later errors.
2. After restore, the notification system doesn't recover fully - this
is a straightforward one
I'm building a full patchset for this code and will upload this soon. As
you might expect over the time it's taken me to develop this, some bitrot
has set in, so I'm rebuilding it against the latest dev version now, and
will complete fixes for the two bugs mentioned above.
I'm sure some will say "no words, show me the code"... I thought you all
would appreciate some advance warning of this, to plan time to
investigate and comment upon the coding.
Best Regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com
I want to come hug you --- where do you live? !!!
:-)
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Well, I guess I was fairly happy too :-)
I'd be more comfortable if I'd found more bugs though, but I'm sure the
kind folk on this list will see that wish of mine comes true!
The code is in a "needs more polishing" state - which is just the right
time for some last discussions before everything sets too solid.
Regards, Simon
On Mon, 2004-04-26 at 17:48, Bruce Momjian wrote:
I want to come hug you --- where do you live? !!!
:-)
Simon Riggs wrote:
Well, I guess I was fairly happy too :-)
YES!
I'd be more comfortable if I'd found more bugs though, but I'm sure the
kind folk on this list will see that wish of mine comes true!

The code is in a "needs more polishing" state - which is just the right
time for some last discussions before everything sets too solid.
Once we see the patch, we will be able to eyeball all the code paths and
interface to existing code and will be able to spot a lot of stuff, I am
sure.
It might take a few passes over it but you will get all the support and
ideas we have.
I want to come hug you --- where do you live? !!!
You're not the only one. But we don't want to smother the poor guy, at
least not before he completes his work :-)
On Mon, 2004-04-26 at 16:37, Simon Riggs wrote:
I've now completed the coding of Phase 1 of PITR.
This allows a backup to be recovered and then rolled forward (all the
way) on transaction logs. This proves the code and the design works, but
also validates a lot of the earlier assumptions that were the subject of
much earlier debate.As noted in the previous designs, PostgreSQL talks to an external
archiver using the XLogArchive API.
I've now completed:
- changes to PostgreSQL
- written a simple archiving utility, pg_arch
This will be on HACKERS not PATCHES for a while...
OVERVIEW :
Various code changes. Not all included here...but I want to prove this
is real, rather than have you waiting for my patch release skills to
improve.
PostgreSQL changes include:
============================
- guc.c
New GUC called wal_archive to control whether archival logging is performed.
- xlog.h
GUC added here
- xlog.c
The most critical parts of the code live here. The way things currently
work can be thought of as a circular set of logs, with the current log
position sweeping around the circle like a clock. In order to archive an
xlog, you must start just AFTER the file has been closed and BEFORE the
pointer sweeps round again.
The code here tries to spot the right moment to notify the archiver that
it's time to archive. That point is critical: too early and the xlog may
yet be incomplete; too late and a window of failure creeps into the
system.
Finding that point is more complicated than it seems because every
backend has the same file open and decides to close it at different
times - nearly the same time if you're running pgbench, but could vary
considerably otherwise. That timing difference is the source of Bug#1.
My solution is to use the piece of code that first updates pg_control,
since there is a similar need to only-do-it-once. My understanding is
that the other backends eventually discover they are supposed to be
looking at a different file now and reset themselves - so that the xlog
gets fsynced only once.
It's taken me a week to consider the alternatives...this point is
critical, so please suggest if you know/think differently.
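(For illustration only - a sketch, not the actual patch: one way the
backend half of the XLogArchive API might announce a completed segment,
using the zero-length rlog notification file described elsewhere in this
thread. The function name and error message are assumptions.)

#include "postgres.h"
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

extern char *DataDir;			/* backend global; $PGDATA */

static void
XLogArchiveNotify(uint32 log, uint32 seg)
{
	char	rlogpath[MAXPGPATH];
	int		fd;

	/* e.g. $PGDATA/pg_rlog/00000001000000C6.full */
	snprintf(rlogpath, MAXPGPATH, "%s/pg_rlog/%08X%08X.full",
			 DataDir, log, seg);

	/* the empty file is the whole message; O_EXCL guards a double notify */
	fd = open(rlogpath, O_WRONLY | O_CREAT | O_EXCL,
			  S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP);
	if (fd >= 0)
		close(fd);
	else
		elog(LOG, "could not create archive notification \"%s\"", rlogpath);
}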
When the pointer sweeps round again, if we are still archiving, we
simply increase the number of logs in the cycle to defer when we can
recycle the xlog. The code doesn't yet handle a failure condition we
discussed previously: running out of disk space and how we handle that
(there was detailed debate, noted for future implementation).
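(Again a sketch under the same assumptions, not the patch itself: the
checkpoint-side test that defers recycling until the archiver has renamed
the segment's rlog entry to .done.)

#include "postgres.h"
#include <sys/stat.h>

extern char *DataDir;

static bool
XLogArchiveCheckDone(uint32 log, uint32 seg)
{
	char		donepath[MAXPGPATH];
	struct stat statbuf;

	snprintf(donepath, MAXPGPATH, "%s/pg_rlog/%08X%08X.done",
			 DataDir, log, seg);

	/* if not .done yet, the caller keeps the segment: the "circle" grows */
	return stat(donepath, &statbuf) == 0;
}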
New utility aimed at being located in src/bin/pg_arch
=======================================================
- pg_arch.c
The idea of pg_arch is that it is a functioning archival tool and at the
same time is the reference implementation of the XLogArchive API. The
API is all wrapped up in the same file currently, to make it easier to
implement, but I envisage separating these out into two parts after it
passes initial inspection - shouldn't take too much work given that was
its design goal. This will then allow the API to be used for wider
applications that want to backup PostgreSQL.
- src/bin/Makefile has been updated to include pg_arch, so that this
then gets made as part of the full system rather than an add-on. I'm
sure somebody has feelings on this...my thinking was that it ought to be
available without too much effort.
What's NOT included (YET!)
==========================
-changes to initdb
-changes to postgresql.conf
-changes to wal_debug
-related changes
-user documentation
- changes to initdb
XLogArchive API implementation relies on the existence of
$PGDATA/pg_rlog
That would be relatively simple to add to initdb, but it's also a
no-brainer to add without it, so I thought I'd leave it for discussion
in case anybody has good reasons to put it elsewhere/rename it etc.
More importantly, this affects the security model used by XLogArchive.
The way I had originally envisaged this, the directory permissions would
be opened up for group level read/write thus:
pg_xlog rwxr-x---
pg_rlog rwxrwx---
though this of course relies on $PGDATA being opened up also. That then
would allow the archiving tool to be in its own account also, yet with a
shared group. (Thinking that a standard Legato install (for instance) is
unlikely to recommend sharing a UNIX userid with PostgreSQL). I was
unaware that PostgreSQL checks the permissions of PGDATA before it
starts and does not allow you to proceed if group permissions exist.
We have two options:
i) alter all things that rely on security being userlevel-only
- initdb
- startup
- most other security features?
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).
I've avoided this issue in the general implementation, thinking that
there'll be some strong feelings either way, or an alternative that I
haven't thought of yet (please...)
-changes to postgresql.conf
The parameter setting
wal_archive=true
needs to be added to make XLogArchive work or not.
I've not added this to the install template (yet), in case we had some
further suggestions for what this might be called.
-changes to wal_debug
The XLOG_DEBUG flag is set as a value between 1 and 16, though the code
only ever treats this as a boolean. For my development, I partially
implemented an earlier suggestion of mine: set the flag to 1 in the
config file, then set the more verbose portions of debug output to
trigger when it's set to 16. That affected a couple of places in xlog.c.
That may not be needed, so that's not included either.
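(A minimal sketch of that two-tier gating, assuming XLOG_DEBUG carries
the numeric wal_debug value as described; the message text is
illustrative, not from the patch.)

#include "postgres.h"

extern int	XLOG_DEBUG;			/* the wal_debug value, 0..16 */

static void
XLogArchiveTrace(uint32 log, uint32 seg)
{
	/* basic tracing at any non-zero level */
	if (XLOG_DEBUG >= 1)
		elog(LOG, "xlog %08X%08X ready for archive", log, seg);

	/* the more verbose portions trigger only at the top setting */
	if (XLOG_DEBUG >= 16)
		elog(LOG, "rlog notification %08X%08X.full written", log, seg);
}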
-user documentation
Not yet...but it will be.
Attachments:
Makefile.patch (text/x-patch)
*** Makefile 1.43 2004-04-24 09:56:30.000000000 +0100
--- Makefile 2004-04-24 09:59:02.000000000 +0100
***************
*** 13,19 ****
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
! DIRS := initdb initlocation ipcclean pg_ctl pg_dump \
psql scripts pg_config pg_controldata pg_resetxlog
all install installdirs uninstall depend distprep:
--- 13,19 ----
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
! DIRS := initdb initlocation ipcclean pg_ctl pg_dump pg_arch \
psql scripts pg_config pg_controldata pg_resetxlog
all install installdirs uninstall depend distprep:
pgarch.tar (application/x-tar)

pg_arch/pg_arch.c:
/*-------------------------------------------------------------------------
*
* pg_arch.c
* A utility to archive xlogs to an archive directory
* Uses the XLogArchive API to decide when to archive particular xlogs
*
* Copyright (c) Simon Riggs <simon@2ndQuadrant.com>, 2004;
* licence: BSD
*
* Portions Copyright (c) 2004, PostgreSQL Global Development Group
* Usage Notes:
* Will use PGDATA if no DATADIR is supplied
*
* During archive, the name and filedates of the xlog are NOT changed!
*
* If you have multiple PostgreSQL instances on one machine, then you
* SHOULD NOT use pg_arch to copy all xlogs from all instances to the
* same archive directory - the xlogs will be indistinguishable and
* recovery will be impossible. Create an archive directory for each instance.
*
* Program overview:
* 1. Initialises, then enters main loop
* 2. Main Loop:
* XLogArchiveXLogs( ) to check for files to be archived
* If there is a file
* Copy file to <ArchiveDest>
* XLogArchiveComplete( )
* Else
* Wait for <CheckTimer>
* Loop again forever
* 3. Handle any signals that get sent
*-------------------------------------------------------------------------
*/
#include "postgres.h"
#include <errno.h>
#include <unistd.h>
#include <dirent.h>
#include <locale.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include "access/xlog.h"
/*
#include "xarchive.h"
*/
#define _(x) gettext((x))
/******************** stuff copied from xlog.c ********************/
/* Increment an xlogid/segment pair */
#define NextLogSeg(logId, logSeg) \
do { \
if ((logSeg) >= XLogSegsPerFile-1) \
{ \
(logId)++; \
(logSeg) = 0; \
} \
else \
(logSeg)++; \
} while (0)
#define XLogFileName(path, log, seg) \
snprintf(path, MAXPGPATH, "%s/%08X%08X", \
XLogDir, log, seg)
/******************** end of stuff copied from xlog.c ********************/
static char *progname;
extern char *optarg;
static char *DataDir;
static char *ArchiveDestDir;
static char XLogDir[MAXPGPATH];
static char XLogArchiveDir[MAXPGPATH]; /* XLogArchive API */
uint32 nextrlogId = (uint32) -1;	/* sentinel: rlog directory not yet scanned */
uint32 nextrlogSeg;
/* Function prototypes */
static bool XLogArchiveXLogs(char *xlog, char *rlogdir);
static bool XLogArchiveComplete(char *xlog, char *rlogdir);
static bool CopyXLogtoArchive(char *xlog);
static void usage(void);
/*
* XLogArchiveXLogs
*
* Return name of the oldest xlog file that has not yet been archived,
* setting notification that file archiving is now in progress.
* It is important that we return the oldest, so that we archive xlogs
* in order that they were written, for two reasons:
* 1) to maintain the sequential chain of xlogs required for recovery
* 2) because the oldest ones will sooner become candidates for
* recycling by checkpoint backend.
*/
static bool
XLogArchiveXLogs(char *xlog, char *rlogdir)
{
/* implementation:
* if first call, open XLogArchive directory and read through list of
* rlogs that have the .full suffix, looking for earliest file.
* Decode xlog part of rlog filename back to
* log/seg values, then increment, so we can predict next rlog.
* If not first call, use remembered next rlog value in call to stat,
* to see if that file is available yet. If not, return empty handed.
* If so, set rlog file to .busy, increment rlog value again and then
* return name of available file to allow copy to archive to begin.
*/
char rlogfull[MAXPGPATH];
char rlogbusy[MAXPGPATH];
char newxlog[32];
char nextlogstr[32];	/* 16 hex digits plus NUL; was [8], overrun by "%08X%08X" below */
DIR *rldir;
struct dirent *rlde;
char *endptr;
int rc;
struct stat statbuf;
bool firstfile;
if (nextrlogId == (uint32) -1) {
rldir = opendir(rlogdir);
if (rldir == NULL)
{
fprintf(stderr, _("%s could not open rlog directory\n"), progname);
return false;	/* was missing: readdir on a NULL DIR* would crash */
}
printf(_("\n%s firstcall: scanning rlogdir...\n"), progname);
firstfile = true;
while ((rlde = readdir(rldir)) != NULL)
{
/*
printf(_("\n%s found... %s\n"), progname, rlde->d_name);
if (strlen(rlde->d_name) == 21)
printf(_("\n%s namelen=21\n"), progname);
if (strspn(rlde->d_name, "0123456789ABCDEF") == 16)
printf(_("\n%s composed of hex chars\n"), progname);
if (strcmp(rlde->d_name + 16, ".full") == 0)
printf(_("\n%s name+16=full\n"), progname);
*/
if (strlen(rlde->d_name) == 21 &&
strspn(rlde->d_name, "0123456789ABCDEF") == 16 &&
strcmp(rlde->d_name + 16, ".full") == 0)
{
/*
printf(_("\n%s identify... %s\n"), progname, rlde->d_name);
*/
if (firstfile) {
strcpy(newxlog, rlde->d_name);
firstfile = false;
} else {
/* strip off the suffix to get xlog name */
if (strcmp(rlde->d_name, newxlog) <= 0)
strcpy(newxlog, rlde->d_name);
}
}
}
printf(_("%s closing rlogdir...\n"), progname);
rc = closedir(rldir);
if (rc < 0)
fprintf(stderr, _("%s could not close rlog directory %i\n"), progname,rc);
if (firstfile) {
printf(_("%s no .full rlogs found...\n"), progname);
return false;
}
printf(_("%s found...%s\n"), progname, newxlog);
/* decode xlog back to LogId and SegId, so we can increment */
sprintf(nextlogstr,"00000000");
memcpy(nextlogstr, newxlog, 8);
nextrlogId = strtoul(nextlogstr, &endptr, 16);
if (endptr == nextlogstr || *endptr != '\0')
{
fprintf(stderr, _("%s decode xlog logID error\n"), progname);
exit(1);
}
memcpy(nextlogstr, newxlog+8, 8);
nextrlogSeg = strtoul(nextlogstr, &endptr, 16);
if (endptr == nextlogstr || *endptr != '\0')
{
fprintf(stderr, _("%s decode xlog logSeg error\n"), progname);
exit(1);
}
memcpy(xlog, newxlog, 16);
xlog[16] = '\0';	/* terminate; caller's buffer is 17 bytes, see main() */
/* set the rlog to .busy until XLogArchiveComplete is called */
snprintf(rlogfull, MAXPGPATH, "%s/%s.full", rlogdir, xlog);
snprintf(rlogbusy, MAXPGPATH, "%s/%s.busy", rlogdir, xlog);
rc = rename (rlogfull, rlogbusy);
if (rc < 0) {
fprintf(stderr,
_("%s XLogArchiveXLogs could not rename %s to %s\n"),
progname, rlogfull, rlogbusy);
return false;
}
}
else {
snprintf(nextlogstr, 32, "%08X%08X",
nextrlogId, nextrlogSeg);
snprintf(rlogfull, MAXPGPATH, "%s/%s.full",
rlogdir, nextlogstr);
rc = stat (rlogfull, &statbuf);
/* if .full file is not there...that's OK...we wait until it is */
if (rc < 0) {
/* Good error checking required here, otherwise we might loop
forever, slowly! */
printf(_("%s %s not found yet...\n"), progname,nextlogstr);
return false;
}
/* set the xlog that will be archived next */
sprintf(xlog, "%08X%08X", nextrlogId, nextrlogSeg);
/* set the rlog to .busy until XLogArchiveComplete is called */
snprintf(rlogbusy, MAXPGPATH, "%s/%s.busy", rlogdir, xlog);
rc = rename (rlogfull, rlogbusy);
if (rc < 0) {
fprintf(stderr,
_("%s XLogArchiveComplete could not rename %s to %s\n"),
progname, rlogfull, rlogbusy);
return false;
}
}
/* increment onto the next rlog */
NextLogSeg(nextrlogId, nextrlogSeg);
/* we have an xlog to archive...*/
return true;
}
/*
* XLogArchiveComplete
*
* Write notification that an xlog has now been successfully archived
*/
static bool
XLogArchiveComplete(char *xlog, char *rlogdir)
{
/* implementation:
* stat the notification file as xlog filename with .busy suffix
* Rename the notification file to a suffix of .done
*/
char rlogbusy[MAXPGPATH];
char rlogdone[MAXPGPATH];
int rc;
struct stat statbuf;
snprintf(rlogbusy, MAXPGPATH, "%s/%s.busy", rlogdir, xlog);
rc = stat (rlogbusy, &statbuf);
if (rc < 0) {
fprintf(stderr,
_("%s XLogArchiveComplete could not locate %s\n"), progname, rlogbusy);
return false;
}
/*
archive_time_sec = time() - statbuf->st_mtime;
printf("%s archive elapsed time = %n", archive_time_sec);
*/
snprintf(rlogdone, MAXPGPATH, "%s/%s.done", rlogdir, xlog);
rc = rename (rlogbusy, rlogdone);
if (rc < 0) {
fprintf(stderr, _("%s XLogArchiveComplete could not rename %s to %s\n"),
progname, rlogbusy, rlogdone);
return false;
}
return true;
}
/*
* CopyXLogtoArchive
*
* Copy transaction log from the pg_xlog directory of a PostgreSQL instance identified
* by the DATADIR parameter through to an archive destination, ARCHIVEDESTDIR
*
* Should ignore signals during this section, to allow archive to complete
*/
static bool
CopyXLogtoArchive(char *xlog)
{
/* Implementation:
* We open the archive file using O_SYNC to make sure no mistakes
* writing data in buffers equal to the blocksize, so we will
* always have at least a partially consistent set of data to recover from
* ...then we check filesize of written file to ensure we did it right
*/
char xlogpath[MAXPGPATH];
char archpath[MAXPGPATH];
int n, xlogfd, archfd;
char buf[BLCKSZ];
int rc;
struct stat statbuf;
snprintf(xlogpath, MAXPGPATH, "%s/%s", XLogDir, xlog);
printf(_("%s xlogpath= %s\n"), progname, xlogpath);
rc = stat(xlogpath, &statbuf);
if (rc < 0 ) {
fprintf(stderr, _("%s xlog does not exist\n"), progname);
return false;
}
xlogfd = open(xlogpath, O_RDONLY);
if (xlogfd < 0)
{
/* was "if (xlogfd == EACCES)", which compared the fd to an errno code */
if (errno == EACCES)
fprintf(stderr, _("%s EACCES\n"), progname);
return false;
}
fprintf(stderr, _("%s xlog file opened\n"), progname);
snprintf(archpath, MAXPGPATH, "%s/%s", ArchiveDestDir, xlog);
archfd = open(archpath, O_RDWR | O_CREAT | O_EXCL | O_SYNC | PG_BINARY,
S_IRUSR | S_IWUSR |
S_IRGRP | S_IWGRP);
if (archfd < 0) {
if (errno == EEXIST) {
fprintf(stderr, _("%s archive file %s already exists in %s\n"), progname, xlog, ArchiveDestDir);
exit(1);
}
return false;
}
fprintf(stderr, _("%s archive file opened\n"), progname);
while ( (n = read( xlogfd, buf, BLCKSZ)) > 0)
if ( write( archfd, buf, n) != n)
return false;
if (n < 0)
return false;
fprintf(stderr, _("%s archive written...\n"), progname);
archfd = close(archfd);
if (archfd < 0)
return false;
/* Should stat the archpath, to check filesize == XLogFileSize */
/* Reset the file date/time on the xlog, to maintain the original
* timing of the xlog final write by PostgreSQL
*/
xlogfd = close(xlogfd);
if (xlogfd < 0)
return false;
return true;
}
int
main(int argc, char *argv[])
{
char xlog[17];	/* 16 hex characters plus NUL terminator */
/* Options read in from command line, or defaults */
/* option t */
int ArchiveCheckLoopTime = 3;
/* option n */
bool noarchive = false;
/* option s */
bool noloop = false;
int c;
char *endptr;
setlocale(LC_ALL, "");
#ifdef ENABLE_NLS
bindtextdomain("pg_arch", LOCALEDIR);
textdomain("pg_arch");
#endif
progname = get_progname(argv[0]);
if (argc > 1)
{
if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
{
usage();
exit(0);
}
if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
{
puts("pg_arch (PostgreSQL) " PG_VERSION);
exit(0);
}
}
while ((c = getopt(argc, argv, "snt:")) != -1)
{
switch (c)
{
case 'n':
noarchive = true;
break;
case 's':
noloop = true;
break;
case 't':
ArchiveCheckLoopTime = strtoul(optarg, &endptr, 0);
if (endptr == optarg || *endptr != '\0')
{
fprintf(stderr,
_("%s invalid argument for option -t\n"),
progname);
fprintf(stderr,
_("Try \"%s --help\" for more information.\n"),
progname);
exit(1);
}
if (ArchiveCheckLoopTime < 1)
{
fprintf(stderr,
_("%s wait time (-t) must be > 0\n"),
progname);
exit(1);
}
if (ArchiveCheckLoopTime > 999)
{
fprintf(stderr,
_("%s wait time (-t) must be < 1000\n"),
progname);
exit(1);
}
break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
}
}
if (optind == argc)
{
fprintf(stderr, _("%s no archive directory specified\n"), progname);
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
}
ArchiveDestDir = argv[optind++];
if (optind == argc)
DataDir = getenv("PGDATA");
else
DataDir = argv[optind];
if (DataDir == NULL)
{
fprintf(stderr, _("%s no data directory specified\n"), progname);
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
exit(1);
}
snprintf(XLogDir, MAXPGPATH, "%s/pg_xlog", DataDir);
fprintf(stderr, _("%s Archiving transaction logs from %s to %s\n"),
progname, XLogDir, ArchiveDestDir);
snprintf(XLogArchiveDir, MAXPGPATH, "%s/pg_rlog", DataDir);
/* File/Directory Permissions required:
* pg_arch should run as either:
* i) database-owning userid i.e. postgres
* ii) another user in the same group as database-owning userid
*
* Permissions required are:
* XLogDir r
* XLogArchiveDir rw
* ArchiveDestDir w
*
* Security options are:
* i) add XLogArchiveDir under DataDir
* allow access to ArchiveDestDir
* ii) chmod 760 DataDir
* chmod 760 XLogArchiveDir
* chmod 740 XLogDir
*
* Let's test our access rights to these directories now. At this stage
* all of these directories may be empty, or not, without error.
*/
/* check directory XLogDir */
/* check directory ArchiveDestDir
* Directory must NOT have World read rights - security hole
*/
/*
* XLogArchive environment creation & connection to PostgreSQL
*
* Currently, there isn't any. If there was, it would go here
*
*/
/* Main Loop */
do
{
if (XLogArchiveXLogs(xlog, XLogArchiveDir)) {
printf(_("%s archive starting for transaction log %s\n"), progname, xlog);
if (noarchive || (!noarchive && CopyXLogtoArchive(xlog))) {
if (XLogArchiveComplete(xlog, XLogArchiveDir))
fprintf(stderr,
_("%s archive complete for transaction log %s \n\n"),
progname, xlog);
else {
fprintf(stderr,
_("%s XLogArchiveComplete error\n"), progname);
exit(1);
}
} else {
fprintf(stderr,
_("%s archive copy error\n"), progname);
exit(1);
}
/* if we have copied one file, we do not wait:
immediately loop back round and check to see if another is there.
If we're too quick....then we wait
*/
} else
{
printf(_("%s sleeping...\n"), progname);
sleep(ArchiveCheckLoopTime);
printf(_("%s .....awake\n"), progname);
}
} while (!noloop);
printf(_("%s ending\n"), progname);
/*
* XLogArchive disconnection from PostgreSQL & environment tear-down
*
* Currently, there isn't any. If there was, it would go here
*
*/
return 0;
}
static void
usage(void)
{
printf(
_("%s copies PostgreSQL transaction log files to an archive directory.\n\n"),
progname);
printf(_("Usage:\n %s [OPTIONS]... ARCHIVEDESTDIR [DATADIR]\n\n"), progname);
printf(_("Options:\n"));
printf(_(" -t wait time (secs) between checks for xlogs to archive\n"));
printf(_(" -n no archival, just show xlog file names (for testing)\n"));
printf(_(" -s single execution - archive all full xlogs then stop\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
printf(_("\nIf no data directory is specified, the environment variable PGDATA\n"));
printf(_("is used. An archive destination must be specified. Default wait=30 secs.\n"));
printf(_("\nReport bugs to <pgsql-bugs@postgresql.org>.\n"));
}
pg_arch/Makefile:

#-------------------------------------------------------------------------
#
# Makefile for src/bin/pg_arch
#
# Copyright (c) 2004, PostgreSQL Global Development Group
#
#-------------------------------------------------------------------------
subdir = src/bin/pg_arch
top_builddir = ../../..
include $(top_builddir)/src/Makefile.global
OBJS= pg_arch.o
all: pg_arch
pg_arch: $(OBJS)
$(CC) $(CFLAGS) $^ $(LDFLAGS) $(LIBS) -o $@
install: all installdirs
$(INSTALL_PROGRAM) pg_arch$(X) $(DESTDIR)$(bindir)/pg_arch$(X)
installdirs:
$(mkinstalldirs) $(DESTDIR)$(bindir)
uninstall:
rm -f $(DESTDIR)$(bindir)/pg_arch$(X)
clean distclean maintainer-clean:
rm -f pg_arch$(X) pg_arch.o
On Mon, 2004-04-26 at 18:08, Bruce Momjian wrote:
Once we see the patch, we will be able to eyeball all the code paths and
interface to existing code and will be able to spot a lot of stuff, I am
sure. It might take a few passes over it but you will get all the
support and ideas we have.
Thanks very much.
Code will be there in full tomorrow now (oh it is tomorrow...)
Fixed the bugs that I spoke of earlier though. They all make sense when
you try to tell someone else about them...
Best Regards, Simon
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.
On Tue, 2004-04-27 at 18:10, Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
Number of reasons....
Overall, I initially favoured the archiver as another special backend,
like checkpoint. That is exactly the same architecture as Oracle uses,
so is a good starting place for thought.
We discussed the design in detail on the list and the suggestion was
made to implement PITR using an API to send notification to an archiver.
In Oracle7, it was considered OK to just dump the files in some
directory and call them archived. Later, most DBMSs have gone to some
trouble to integrate with generic or at least market leading backup and
recovery (BAR) software products. Informix and DB2 provide open
interfaces to BARs; Oracle does not, but then it figures it already
(had) market share, so we'll just do it our way.
The XLogArchive design allows ANY external archiver to work with
PostgreSQL. The pg_arch program supplied is really to show how that
might be implemented. This leaves the door open for any BAR product to
interface through to PostgreSQL, whether this be your favourite open
source BAR or the leading proprietary vendors.
Wide adoption is an important design feature and the design presented
offers this.
The other reason is to do with how and when archival takes place. An
asynchronous communication mechanism is required between PostgreSQL and
the archiver, to allow for such situations as tape mounts or simple
failure of the archiver. The method chosen for implementing this
asynchronous comms mechanism lends itself to being an external API -
there were other designs but these were limited to internal use only.
You ask a reasonable question however. If pg_autovacuum exists, why
should pg_autoarch not work also? My own thinking about external
connectivity may have overshadowed my thinking there.
It would not require too much additional work to add another GUC which
gives the name of the external archiver to confirm execution of, or
start/restart if it fails. At this point, such a feature is a nice to
have in comparison with the goal of being able to recover to a PIT, so I
will defer this issue to Phase 3....
Best regards, Simon Riggs
On Tuesday 27 April 2004 22:21, Simon Riggs wrote:
Why isn't the archiver process integrated into the server?
You ask a reasonable question however. If pg_autovacuum exists, why
should pg_autoarch not work also?
pg_autovacuum is going away to be integrated as a backend process.
On Tuesday 27 April 2004 19:59, Bruce Momjian wrote:
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.
When the server is not running there is nothing to archive, so I don't follow
this argument.
On Monday 26 April 2004 23:11, Simon Riggs wrote:
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).
I think this is perfectly reasonable.
On Wed, 2004-04-28 at 16:14, Peter Eisentraut wrote:
On Tuesday 27 April 2004 19:59, Bruce Momjian wrote:
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.

When the server is not running there is nothing to archive, so I don't
follow this argument.
The running server creates xlogs, which are still available for archive
even when the server is not running...
Overall, your point is taken, with many additional comments in my other
posts in reply to you.
I accept that this may be desirable in the future, for some simple
implementations. The pg_autovacuum evolution path is a good model - if
it works and the code is stable, bring it under the postmaster at a
later time.
Best Regards, Simon Riggs
Simon Riggs wrote:
The running server creates xlogs, which are still available for archive
even when the server is not running... I accept that this may be
desirable in the future, for some simple implementations. The
pg_autovacuum evolution path is a good model - if it works and the code
is stable, bring it under the postmaster at a later time.
[ This email isn't focused because I haven't resolved all my ideas yet.]
OK, I looked over the code. Basically it appears pg_arch is a
client-side program that copies files from pg_xlog to a specified
directory, and marks completion in a new pg_rlog directory.
The driving part of the program seems to be:
while ( (n = read( xlogfd, buf, BLCKSZ)) > 0)
if ( write( archfd, buf, n) != n)
return false;
The program basically sleeps and when it awakes checks to see if new WAL
files have been created.
There is some additional GUC variable to prevent WAL from being recycled
until it has been archived, but the posted patch only had pg_arch.c, its
Makefile, and a patch to update bin/Makefile.
Simon (the submitter) specified he was providing an API to archive, but
it is really just a set of C routines to call that do copies. It is not
a wire protocol or anything like that.
The program has a mode where it archives all available wal files and
exits, but by default it has to remain running to continue archiving.
I am wondering if this is the way to approach the situation. I
apologize for not considering this earlier. Archives of PITR postings
of interest are at:
http://momjian.postgresql.org/cgi-bin/pgtodo?pitr
It seems the backend is the one who knows right away when a new WAL file
has been created and needs to be archived.
Also, are folks happy with archiving only full WAL files? This will not
restore all transactions up to the point of failure, but might lose
perhaps 2-5 minutes of transactions before the failure.
Also, a client application is a separate process that must remain
running. With Informix, there is a separate utility to do PITR logging.
It is a pain to have to make sure a separate process is always running.
Here is an idea. What if we add two GUC settings:
pitr = true/false;
pitr_path = 'filename or |program';
In this way, you would basically specify your path to dump all WAL logs
into (just keep appending 16MB chunks) or call a program that you pipe
all the WAL logs into.
You can't change pitr_path while pitr is on. Each backend opens the
filename in append mode before writing. One problem is that this slows
down the backend because it has to do the write, and it might be slow.
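(For illustration, a minimal sketch of that pitr_path idea - nothing like
this exists in the posted patch, and the helper name is made up: append a
finished segment to a plain file, or pipe it to a program when the value
starts with '|'.)

#include <stdio.h>
#include <stdbool.h>

static bool
pitr_ship_segment(const char *walpath, const char *pitr_path)
{
	char	buf[8192];
	size_t	n;
	bool	ok = true;
	bool	is_pipe = (pitr_path[0] == '|');
	FILE   *src = fopen(walpath, "rb");
	FILE   *dst;

	if (src == NULL)
		return false;
	/* '|program' pipes the segment; anything else appends to a file */
	dst = is_pipe ? popen(pitr_path + 1, "w") : fopen(pitr_path, "ab");
	if (dst == NULL)
	{
		fclose(src);
		return false;
	}
	while ((n = fread(buf, 1, sizeof(buf), src)) > 0)
		if (fwrite(buf, 1, n, dst) != n)
		{
			ok = false;
			break;
		}
	fclose(src);
	/* pclose also reports the program's exit status */
	if ((is_pipe ? pclose(dst) : fclose(dst)) != 0)
		ok = false;
	return ok;
}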
We also need the ability to write to a tape drive, and you can't
open/close those like a file. Different backends will be doing the WAL
file additions, there isn't a central process to keep a tape drive file
descriptor open.
Seems pg_arch should at least use libpq to connect to a database and do
a LISTEN and have the backend NOTIFY when they create a new WAL file or
something. Polling for new WAL files seems non-optimal, but maybe a
database connection is overkill.
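(A minimal sketch of that LISTEN approach, using only standard libpq
calls; the channel name wal_file_ready is made up for illustration, and
the backend would have to NOTIFY it per segment.)

#include <stdio.h>
#include <stdbool.h>
#include <sys/select.h>
#include "libpq-fe.h"

static bool
wait_for_wal_notify(PGconn *conn)
{
	PGresult   *res;
	PGnotify   *note;
	int			sock = PQsocket(conn);
	fd_set		mask;

	res = PQexec(conn, "LISTEN wal_file_ready");
	PQclear(res);

	for (;;)
	{
		/* block on the connection's socket instead of polling the disk */
		FD_ZERO(&mask);
		FD_SET(sock, &mask);
		if (select(sock + 1, &mask, NULL, NULL, NULL) < 0)
			return false;		/* interrupted or connection gone */
		if (!PQconsumeInput(conn))
			return false;
		if ((note = PQnotifies(conn)) != NULL)
		{
			PQfreemem(note);
			return true;		/* a new WAL file is ready to archive */
		}
	}
}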
Then, you start the backend, specify the path, turn on pitr, do the tar,
and you are on your way.
Also, pg_arch should only be run by the install user. No need to allow
other users to run this.
Another idea is to have a client program like pg_ctl that controls PITR
logging (start, stop, location), but does its job and exits, rather than
remains running.
I apologize for not bringing up these issues earlier. I didn't realize
the direction it was going. I wasn't focused on it. Sorry.
On Thu, Apr 29, 2004 at 12:18:38AM -0400, Bruce Momjian wrote:
OK, I looked over the code. Basically it appears pg_arch is a
client-side program that copies files from pg_xlog to a specified
directory, and marks completion in a new pg_rlog directory. The program
basically sleeps and when it awakes checks to see if new WAL files have
been created.
Is the API able to indicate a written but not-yet-filled WAL segment?
So an archiver could copy the filled part, and refill it later. This
may be needed because a segment could take a while to be filled.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Hoy es el primer d�a del resto de mi vida"
Alvaro Herrera wrote:
Is the API able to indicate a written but not-yet-filled WAL segment?
So an archiver could copy the filled part, and refill it later. This
may be needed because a segment could take a while to be filled.
I couldn't figure that out, but I don't think it does. It would have to
lock the WAL writes so it could get a good copy, I think, and I didn't
see that.
On Thu, Apr 29, 2004 at 10:07:01AM -0400, Bruce Momjian wrote:
I couldn't figure that out, but I don't think it does. It would have to
lock the WAL writes so it could get a good copy, I think, and I didn't
see that.
I'm not sure but I don't think so. You don't have to lock the WAL for
writing, because it will always write later in the file than you are
allowed to read. (If you read more than you were told to, it's your
fault as an archiver.)
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Et put se mouve" (Galileo Galilei)
Alvaro Herrera wrote:
I'm not sure but I don't think so. You don't have to lock the WAL for
writing, because it will always write later in the file than you are
allowed to read. (If you read more than you were told to, it's your
fault as an archiver.)
My point was that without locking the WAL, we might get part of a WAL
write in our file, but I now realize that during a crash the same thing
might happen, so it would be OK to just copy it even if it is being
written to.
Simon posted the rest of his patch that shows changes to the backend,
and a comment reads:
+ * The name of the notification file is the message that will be picked up
+ * by the archiver, e.g. we write RLogDir/00000001000000C6.full
+ * and the archiver then knows to archive XLogDir/00000001000000C6,
+ * while it is doing so it will rename RLogDir/00000001000000C6.full
+ * to RLogDir/00000001000000C6.busy, then when complete, rename it again
+ * to RLogDir/00000001000000C6.done
so it is only archiving full logs.
Also, I think this archiver should be able to log to a local drive,
network drive (trivial), tape drive, ftp, or use an external script to
transfer the logs somewhere. (ftp would probably be an external script
with 'expect').
On Thu, 2004-04-29 at 15:22, Bruce Momjian wrote:
...so it is only archiving full logs.

Also, I think this archiver should be able to log to a local drive,
network drive (trivial), tape drive, ftp, or use an external script to
transfer the logs somewhere. (ftp would probably be an external script
with 'expect').
Bruce is correct, the API waits for the archive to be full before
archiving.
I had thought about the case for partial archiving: basically, if you
want to archive in smaller chunks, make your log files smaller...this is
now a compile time option. Possibly there is an argument to make the
xlog file size configurable, as a way of doing what you suggest.
Taking multiple copies of the same file, yet trying to work out which
one to apply sounds complex and error prone to me. It also increases the
cost of the archival process and thus drains other resources.
The archiver should be able to do a whole range of things. Basically,
that point was discussed and the agreed approach was to provide an API
that would allow anybody and everybody to write whatever they wanted.
The design included pg_arch since it was clear that there would be a
requirement in the basic product to have those facilities - and in any
case any practically focused API has a reference port as a way of
showing how to use it and exposing any bugs in the server side
implementation.
The point is...everybody is now empowered to write tape drive code,
whatever you fancy.... go do.
Best regards, Simon Riggs
Simon Riggs wrote:
The archiver should be able to do a whole range of things. Basically,
that point was discussed and the agreed approach was to provide an API
that would allow anybody and everybody to write whatever they wanted.
The point is...everybody is now empowered to write tape drive code,
whatever you fancy.... go do.
Agreed we want to allow the superuser control over writing of the
archive logs. The question is how do they get access to that. Is it by
running a client program continuously or calling an interface script
from the backend?
My point was that having the backend call the program has improved
reliability and control over when to write, and easier administration.
How are people going to run pg_arch? Via nohup? In virtual screens? If
I am at the console and I want to start it, do I use "&"? If I want to
stop it, do I do a 'ps' and issue a 'kill'? This doesn't seem like a
good user interface to me.
To me the problem isn't pg_arch itself but the idea that a client
program is going to be independently finding (polling) and copying the
archive logs.
I am thinking the client program is called with two arguments, the xlog
file name, and the arch location defined in GUC. Then the client
program does the write. The problem there though is who gets the write
error since the backend will not wait around for completion?
Another case is server start/stop. You want to start/stop the archive
logger to match the database server, particularly if you reboot the
server. I know Informix used a client program for logging, and it was a
pain to administer.
I would be happy with an external program if it was started/stopped by
the postmaster (or via GUC change) and received a signal when a WAL file
was written. But if we do that, it isn't really an external program anymore
but another child process like our stats collector.
I am willing to work on this if folks think this is a better approach.
On Thu, Apr 29, 2004 at 07:34:47PM +0100, Simon Riggs wrote:
I had thought about the case for partial archiving: basically, if you
want to archive in smaller chunks, make your log files smaller...this is
now a compile time option. Possibly there is an argument to make the
xlog file size configurable, as a way of doing what you suggest.

Taking multiple copies of the same file, yet trying to work out which
one to apply sounds complex and error prone to me. It also increases the
cost of the archival process and thus drains other resources.
My idea was basically that the archiver could be told "I've finished
writing XLog segment 1 until byte 9000", so the archiver would
dd if=xlog-1 seek=0 skip=0 bs=1c count=9000c of=archive-1
And later, it would get a notification "segment 1 until byte 18000" he does
dd if=xlog-1 seek=0 skip=0 bs=1c count=18000c of=archive-1
Or, if it's smart enough,
dd if=xlog-1 seek=9000c skip=9000c bs=1c count=9000c of=archive-1
Basically it is updating the logs as soon as it receives the
notifications. Writing 16 MB of xlogs could take some time.
When a full xlog segment has been written, a different kind of
notification can be issued. A dumb archiver could just ignore the
incremental ones and copy the files only upon receiving this other kind.
I think that if log files are too small, maybe it will be a waste of
resources (which ones?). Anyway, it's just an idea.
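(For illustration, the same incremental copy as the dd commands above,
sketched in C; the offset bookkeeping and names are assumptions, not
from any posted patch. Bytes [done, upto) of an xlog are copied into the
same offsets of the archive copy.)

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdbool.h>

static bool
copy_xlog_range(const char *xlog, const char *arch, off_t done, off_t upto)
{
	char	buf[8192];
	bool	ok = true;
	int		in = open(xlog, O_RDONLY);
	int		out = open(arch, O_WRONLY | O_CREAT, 0600);

	/* position both files at the first byte not yet archived */
	if (in < 0 || out < 0 ||
		lseek(in, done, SEEK_SET) < 0 || lseek(out, done, SEEK_SET) < 0)
		ok = false;

	while (ok && done < upto)
	{
		size_t	want = (size_t) (upto - done);
		ssize_t n;

		if (want > sizeof(buf))
			want = sizeof(buf);
		n = read(in, buf, want);
		if (n <= 0 || write(out, buf, (size_t) n) != n)
			ok = false;
		else
			done += n;
	}
	if (in >= 0)
		close(in);
	if (out >= 0)
		close(out);
	return ok;
}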
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
On Thu, 2004-04-29 at 20:24, Bruce Momjian wrote:
Agreed we want to allow the superuser control over writing of the
archive logs. The question is how do they get access to that. Is it by
running a client program continuously or calling an interface script
from the backend?

My point was that having the backend call the program has improved
reliability and control over when to write, and easier administration.
Agreed. We've both suggested ways that can occur, though I suggest this
is much less of a priority, for now. Not "no", just not "now".
How are people going to run pg_arch? Via nohup? In virtual screens? If
I am at the console and I want to start it, do I use "&"? If I want to
stop it, do I do a 'ps' and issue a 'kill'? This doesn't seem like a
good user interface to me.

To me the problem isn't pg_arch itself but the idea that a client
program is going to be independently finding (polling) and copying the
archive logs.

I am thinking the client program is called with two arguments, the xlog
file name, and the arch location defined in GUC. Then the client
program does the write. The problem there though is who gets the write
error since the backend will not wait around for completion?

Another case is server start/stop. You want to start/stop the archive
logger to match the database server, particularly if you reboot the
server. I know Informix used a client program for logging, and it was a
pain to administer.
pg_arch is just icing on top of the API. The API is the real deal here.
I'm not bothered if pg_arch is not accepted, as long as we can adopt the
API. As noted previously, my original mind was to split the API away
from the pg_arch application to make it clearer what was what. Once that
has been done, I encourage others to improve pg_arch - but also to use
the API to interface with other BAR products.
If you're using PostgreSQL for serious business then you will be using a
serious BAR product as well. There are many FOSS alternatives...
The API's purpose is to allow larger, pre-existing BAR products to know
when and how to retrieve data from PostgreSQL. Those products don't and
won't run underneath postmaster, so although I agree with Peter's
original train of thought, I also agree with Tom's suggestion that we
need an API more than we need an archiver process.
I would be happy with an external program if it was started/stopped by
the postmaster (or via GUC change) and received a signal when a WAL file
was written.
That is exactly what has been written.
The PostgreSQL side of the API is written directly into the backend, in
xlog.c and is therefore activated by postmaster controlled code. That
then sends "a signal" to the process that will do the archiving - the
Archiver side of the XLogArchive API has it as an in-process library.
(The "signal" is, in fact, a zero-length file written to disk because
there are many reasons why an external archiver may not be ready to
archive or even up and running to receive a signal).
The only difference is that there is some confusion as to the role and
importance of pg_arch.
Best Regards, Simon Riggs
On Thu, 2004-04-29 at 20:24, Bruce Momjian wrote:
I am willing to work on this...
There is much work still to be done to make PITR work, accepting all of
the many comments made.
If anybody wants this by 1 June, I think we'd better look sharp. My aim
has been to knock one of the URGENT items on the TODO list into touch,
however that was to be achieved.
The following work remains...from all that has been said...
- halt restore at particular condition (point in time, txnid etc)
- archive policy to control whether to halt database should archiving
fail and space run out (as Oracle, Db2 do), or not (as discussed)
- cope with restoring a stream of logs larger than the disk space on the
restoration target system
- integrate restore with tablespace code, to allow tablespace backups
- build XLogSpy mechanism to allow DBA to better know when to recover to
- extend logging mechanism to allow recovery time prediction
- publicise the API with BAR open source teams, to get feedback and to
encourage them to use the API to allow PostgreSQL support for their BAR
- use the API to build interfaces to the 100+ BAR products on the market
- performance tuning of xlogs, to ensure minimum xlog volume written
- performance tuning of recovery, to ensure wasted effort avoided
- allow archiver utility to be managed by postmaster
- write some good documentation
- comprehensive crash testing
- really comprehensive crash testing
- very comprehensive crash testing
It seems worth working on things in some kind of priority order.
I claim these, by the way, but many others look important and
interesting to me:
- halt restore at particular condition (point in time, txnid etc)
- cope with restoring a stream of logs larger than the disk space on the
restoration target system
- write some good documentation
Best Regards, Simon Riggs
Simon Riggs wrote:
The only difference is that there is some confusion as to the role and
importance of pg_arch.
OK, I have finalized my thinking on this.
We both agree that a pg_arch client-side program certainly works for
PITR logging. The big question in my mind is whether a client-side
program is what we want to use long-term, and whether we want to release
a 7.5 that uses it and then change it in 7.6 to something more
integrated into the backend.
Let me add this is a little different from pg_autovacuum. With that,
you could put it in cron and be done with it. With pg_arch, there is a
routine that has to be used to do PITR, and if we change the process in
7.6, I am afraid there will be confusion.
Let me also add that I am not terribly worried about having the feature
to restore to an arbitrary point in time for 7.5. I would much rather
have a good PITR solution that works cleanly in 7.5 and add that feature
in 7.6, than have restore to an arbitrary point but a strained
implementation that we have to revisit for 7.6.
Here are my ideas. (I talked to Tom about this and am including his
ideas too.) Basically, the archiver that scans the xlog directory to
identify files to be archived should be a subprocess of the postmaster.
You already have that code and it can be moved into the backend.
Here is my implementation idea. First, your pg_arch code runs in the
backend and is started just like the statistics process. It has to be
started whether PITR is being used or not, but will be inactive if PITR
isn't enabled. This must be done because we can't have a backend start
this process later in case they turn on PITR after server start.
The process id of the archive process is stored in shared memory. When
PITR is turned on, each backend that completes a WAL file sends a signal
to the archiver process. The archiver wakes up on the signal and scans
the directory, finds files that need archiving, and either does a 'cp'
or runs a user-defined program (like scp) to transfer the file to the
archive location.
In GUC we add:
pitr = true/false
pitr_location = 'directory, user@host:/dir, etc'
pitr_transfer = 'cp, scp, etc'
The archiver program updates its config values when someone changes
these values via postgresql.conf (and runs pg_ctl reload). These can
only be modified from postgresql.conf. Changing them via SET has to be
disabled because they are cluster-level settings, not per-session ones,
like port number or checkpoint_segments.
Basically, I think that we need to push user-level control of this
process down beyond the directory scanning code (that is pretty
standard), and allow them to call an arbitrary program to transfer the
logs. My idea is that the pitr_transfer program will get $1=WAL file
name and $2=pitr_location and the program can use those arguments to do
the transfer. We can even put a pitr_transfer.sample program in share
and document $1 and $2.
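To make that concrete, here is a minimal sketch (in C; pitr_transfer
and pitr_location are the GUCs proposed above, everything else is my
assumption, not committed code) of how the archiver subprocess might
hand one filled WAL file to the user-defined program:

    /* Sketch: pass $1 = WAL file name, $2 = pitr_location to the
     * user-defined transfer program. Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>

    extern char *pitr_transfer;   /* e.g. "scp", from GUC */
    extern char *pitr_location;   /* e.g. "user@host:/dir", from GUC */

    static int
    archive_one_file(const char *walfile)
    {
        char cmd[2048];

        snprintf(cmd, sizeof(cmd), "%s %s %s",
                 pitr_transfer, walfile, pitr_location);

        if (system(cmd) != 0)
        {
            /* the archiver, not a backend, sees the failure, which
             * answers the "who gets the write error" question above */
            fprintf(stderr, "archive command failed: %s\n", cmd);
            return -1;            /* leave the file for a retry */
        }
        return 0;
    }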
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, 2004-04-30 at 04:02, Bruce Momjian wrote:
Let me also add that I am not terribly worried about having the feature
to restore to an arbitrary point in time for 7.5. I would much rather
have a good PITR solution that works cleanly in 7.5 and add that feature
in 7.6, than have restore to an arbitrary point but a strained
implementation that we have to revisit for 7.6.
Interesting thought, I see now your priorities.
Will read and digest over next few days.
Thanks for your help and attention,
Best regards, Simon Riggs
Basically it is updating the logs as soon as it receives the
notifications. Writing 16 MB of xlogs could take some time.

In my experience with archiving logs, 16 MB is, on the contrary, way too
small for a single log. The overhead of starting e.g. a tape session
is so high (a few seconds) that you cannot keep up. Once the tape is
streaming it is usually quite fast. So IMHO it is not really practical to
have logs so small that they can fill in less than 20 seconds.
Andreas
On Fri, 2004-04-30 at 04:02, Bruce Momjian wrote:
...Bruce and I have just discussed this in some detail and reached a
good understanding of the design proposals as a whole. It looks like all
of this can happen in the next few weeks, with a worst case time
estimate of mid-June. TGFT!
I'll write this up and post this shortly, with a rough roadmap for
further development of recovery-related features.
Best Regards,
Simon Riggs
2nd Quadrant
Further design plans for PITR... as posted previously, Bruce and I had a
long discussion recently to iron out the major thinking and a good deal
of the detail as well.
In overview, the major change is introducing an ARCHIVER process running
under control of the postmaster, similar to the stats collector.
Due to personal commitments in latter May, early June, these changes
will not be complete until mid/late June. Best I can do...
That includes the time required for the fair amount of documentation
needed for the code to be usefully tested during beta. The good news is
there is little speculation in this design now; it is just hanging the
code in the right place - about half the code is waiting to be remerged
into this latest design.
I'll submit the code in pieces as well, so we can view progress, whether
or not those are incrementally committed.
Committers & all others interested: please check this out and raise any
comments or questions now... time for rework is slipping fast.
Best Regards, Simon Riggs, 2nd Quadrant
...detail chatter follows
On Thu, 2004-05-06 at 05:38, Bruce Momjian wrote:
Simon Riggs wrote:
Bruce, was this OK with you... shall I post?
Some items occurred to me during the write-up... are you OK with those? Do
you want to alter anything before I post?

Looks good with a few adjustments:
Some additions and backtracks...
These choices should be offered as a single GUC, with mutually
exclusive values of
- CIRCULAR (named the same as DB2's, to illustrate that some xlogging does
take place, just not archive logging)
- ARCHIVE
- EXTERNAL

It would be nice to allow the external program to work if you specified
the program as '', but external isn't the same as running no program,
because the external program will also do the flag file removal once it
is archived. I am a little worried about adding an external capability
when we don't have anyone ready to actually show someone wanting such an
external program. Not sure how to handle that -- add it in 7.5 and
see, or go with a boolean and see if we can get an external thing
working for 7.6.
OK, EXTERNAL will not be included in the 7.5 drop; I'm not certain it is
necessary now because of other changes in the design (below).
We always spawn an ARCHIVER process under postmaster, no matter what the
setting of the main GUC. That way, it can be started up if required.
Archive process id is stored in shared memory (or on disk as
postmaster?)

I think shared memory, but I am not positive. I think shared memory
because the postmaster could potentially have to stop/restart it. I
will have to look at how the stats process is done.
Looks to me like this would have to be a disk file, e.g. archiver.pid
but I'll isolate that piece of code in case someone has a bright idea.
The archiver program updates its config values when someone changes
these values via postgresql.conf (and runs pg_ctl reload). These can
only be modified from postgresql.conf.

This would be PGC_SIGHUP. However, we need to make sure the archiver
sees those changes like the backends see such changes now.
Agreed.
Basically, I think that we need to push user-level control of this
process down beyond the directory scanning code (that is pretty
standard), and allow them to call an arbitrary program to transfer the
logs. My idea is that the pitr_transfer program will get $1=WAL file
name and $2=pitr_location and the program can use those arguments to do
the transfer. We can even put a pitr_transfer.sample program in share
and document $1 and $2.
Agreed.
- initdb needs to be altered to add the pg_rlog directory
Should we put the rlog directory as a subdirectory of xlog? Seems so.
Agreed.
- code also required to note when xlog file switches occur during
extended recovery across a number of xlog files
...was accepted
- didn't discuss when we test for archive_dest and what happens then. We
know Informix, DB2 and Oracle all freeze if archive_dest is not
available. That's not an option at the moment... one for the future. Right
now we can choose to either PANIC, ERROR or WARNING, and so need a
GUC-specified policy to control that behaviour. (Suggest naming options
SHUTDOWN(=PANIC) or WARNING)

Yep, we can allow the admin to specify what happens if we can't archive.
Summary of additional GUCs required (names not discussed... still open!)
SUSET... not SET!
- wal_archive_mode = CIRCULAR (default) | ARCHIVE | EXTERNAL
- archive_dest = 'directory, user@host:/dir, etc' (no default)
- archive_program = 'cp, scp, etc' (no default, or scp?)
- wal_archive_error_policy = WARNING (default) | SHUTDOWN

I would remove the wal_ part because, though it is implemented via WAL,
the actual process is archiving. WAL is just an implementation
detail.
So, in summary, we have 5 GUCs, all PGC_SIGHUP
- archive_mode = CIRCULAR (default) (==off)| ARCHIVE (==on)
- archive_dest = 'directory, user@host:/dir, etc' (no default)
- archive_program = 'cp, scp, etc' (no default)
- archive_error_policy = WARNING (default) | SHUTDOWN
- archive_debug
- The GUC for recovery target maybe should be a postmaster command line
switch? That way we wouldn't need to edit postgresql.conf before
recovery, and we also wouldn't need to give it a name...

I like centralizing it all in GUC. Command-line parameters are pretty
hard to specify for one-time usage like this. However, if you set it
via GUC, and you don't modify the value and restart the postmaster, is
it going to honor that old xid? That would be a strange problem. I
guess we could fail to start if we don't find the specified xid in the
wal files.
Postmaster startup only, applies only if it enters recovery
- recovery_target = 12345262 (default is NOT SET)

recovery_xid?
Does it stop before that xid or after that xid?
Recovery target supplied at recovery-time start of the postmaster cannot
easily be supplied as a GUC or postmaster startup switch. The suggestion
is to test for a file called:
pgrecovery.conf
which has something in it like this:
ROLLFORWARD UNTIL TRANSACTIONID 0x2343D4 INCLUSIVE;
That looks over-cooked, but I'll make it simple (believe me!)
After recovery completes, the file is renamed to:
pgrecovery.done
This then avoids complications with interactions of crash recovery and
rollforward recovery. If we crash during recovery, it will restart
cleanly and continue. Once recovery completes, if we then crash, we
don't go back into rollforward recovery (unless we want to), which would
not be the case if we put a GUC in the postgresql.conf file directly
because we would need to re-edit it and send out a SIGHUP via pg_ctl
reload - which is guaranteed not to happen under stress at 4am.
No changes to postgresql.conf are required.
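As an illustrative sketch only (the file names follow the suggestion
above; the parsing and recovery steps are placeholders, not real code),
the startup sequence might do something like:

    /* Sketch: decide at startup whether to enter rollforward
     * recovery, and ensure a later crash does not re-enter it. */
    #include <stdio.h>
    #include <unistd.h>

    static void
    maybe_rollforward(void)
    {
        if (access("pgrecovery.conf", F_OK) != 0)
            return;               /* no file: ordinary crash recovery */

        /* ... parse target, e.g. ROLLFORWARD UNTIL TRANSACTIONID ... */
        /* ... roll forward through the archived xlogs to the target ... */

        /* rename so the next startup is a normal one */
        if (rename("pgrecovery.conf", "pgrecovery.done") != 0)
            perror("could not rename pgrecovery.conf");
    }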
[No capability, for now, to rollforward when logspace > available disk,
but that can be a later addition]
ARCHIVER architecture very similar to Stats Collector. It starts up just
before the Stats Collector, and the postmaster will restart it if it dies.
I'll put all the code in one place, as we have with the stats collector.

At startup, ARCHIVER will test archive capability: we write a test file
called [pgarch_startup_$pid_$date] to the xlog directory, then execute
the archive_program command once using that name as a parameter, which
should copy the file to the archive location. At startup, failure of the
archive_program will be a PANIC condition, whereas once started,
PostgreSQL will act according to archive_error_policy.
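A rough sketch of that self-test (all names assumed; archive_program
and archive_dest are the GUCs above):

    /* Sketch: archiver startup self-test. Write a throwaway file to
     * the xlog directory and try to archive it once. Illustrative. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    static void
    archiver_startup_test(const char *archive_program,
                          const char *archive_dest)
    {
        char  testfile[256];
        char  cmd[1024];
        FILE *fd;

        snprintf(testfile, sizeof(testfile),
                 "pg_xlog/pgarch_startup_%d_%ld",
                 (int) getpid(), (long) time(NULL));

        fd = fopen(testfile, "w");
        if (fd != NULL)
            fclose(fd);

        snprintf(cmd, sizeof(cmd), "%s %s %s",
                 archive_program, testfile, archive_dest);

        if (system(cmd) != 0)
        {
            /* at startup, a failed archive is a PANIC condition */
            fprintf(stderr, "PANIC: archive self-test failed: %s\n", cmd);
            exit(1);
        }
        unlink(testfile);
    }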
If ARCHIVER fails, it will be restarted by the postmaster. The
archive_program runs in its own process, so it shouldn't be able to
touch PostgreSQL. It will run in the (postgres) security context, so no
permissions changes.

archive_error_policy will only come into effect once the archive
directory runs out of space - after archive_program has failed and the
WARNING to restart it has been ignored by admins.
Since EXTERNAL is not being supported, the originally posted program,
pg_arch, lives no more... c'est la vie.
Final issues:
- need to know which signal to use from backend->ARCHIVER when an xlog
fills. Somebody let me know - I'm not bothered which...?
A few questions may help to speed up my work
I need to send a signal from a backend to the archiver process.
1. What signal should I use?
2. How do I give the processid of the archiver to the backend? The
archiver may restart at any time, so its pid could change after a
backend is forked.
I have answers, but I strive for the best answer.
Thanks very much, Best regards, Simon Riggs
Simon Riggs wrote:
A few questions may help to speed up my work
I need to send a signal from a backend to the archiver process.
1. What signal should I use?
You can use any unused signal. I would suggest looking at what the
stats process uses, and use something else like SIGUSR1.
2. How do I give the processid of the archiver to the backend? The
archiver may restart at any time, so its pid could change after a
backend is forked.
I was thinking of having it be in shared memory. I am going to work on
that part, but I need to finish the relocatable install stuff for Win32
first.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Simon Riggs <simon@2ndquadrant.com> writes:
I need to send a signal from a backend to the archiver process.
1. What signal should I use?
SIGUSR1 or SIGUSR2 would be the safest choices.
2. How do I give the processid of the archiver to the backend? The
archiver may restart at any time, so its pid could change after a
backend is forked.
My answer would be "don't". Send a signal to the postmaster and
let it signal the current archiver child. Use the existing
SendPostmasterSignal() code for the first part of this.
regards, tom lane
On Tue, 2004-05-11 at 22:15, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
I need to send a signal from a backend to the archiver process.
1. What signal should I use?
SIGUSR1 or SIGUSR2 would be the safest choices.
2. How do I give the processid of the archiver to the backend? The
archiver may restart at any time, so its pid could change after a
backend is forked.

My answer would be "don't". Send a signal to the postmaster and
let it signal the current archiver child. Use the existing
SendPostmasterSignal() code for the first part of this.
Brilliant - very clean. Many thanks. Best Regards, Simon Riggs
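For the record, a rough sketch of the flow agreed above
(SendPostmasterSignal() and CheckPostmasterSignal() are the existing
backend routines Tom refers to; PMSIGNAL_WAKEN_ARCHIVER and ArchiverPID
are assumed names here, not committed code):

    /* Sketch of the agreed signalling path. */
    #include <sys/types.h>
    #include <signal.h>
    #include "storage/pmsignal.h"   /* SendPostmasterSignal() etc. */

    extern pid_t ArchiverPID;       /* assumed postmaster-local pid */

    /* backend: on completing a WAL segment */
    static void
    notify_archiver(void)
    {
        SendPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER);
    }

    /* postmaster: in its signal handler, forward to the archiver
     * child; only the postmaster tracks its (possibly changing) pid */
    static void
    postmaster_forward_to_archiver(void)
    {
        if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&
            ArchiverPID != 0)
            kill(ArchiverPID, SIGUSR1);
    }

This keeps the archiver's pid out of shared memory entirely: backends
never need to know it.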