PITR Phase 1 - Test results
I've now completed the coding of Phase 1 of PITR.
This allows a backup to be recovered and then rolled forward (all the
way) on transaction logs. This proves the code and the design work, and
it also validates a lot of the earlier assumptions that were the subject
of much earlier debate.
As noted in the previous designs, PostgreSQL talks to an external
archiver using the XLogArchive API.
I've now completed:
- changes to PostgreSQL
- written a simple archiving utility, pg_arch
Using both of these together, I have successfully:
- started pg_arch
- started postgres
- taken a backup using tar
- ran pgbench for an extended period, so that the transaction logs taken
at the start had long since been recycled
- killed the postmaster
- waited for completion
- removed the data directory (rm -R $PGDATA)
- restored using tar
- restored the xlogs from the archive directory
- started the postmaster and watched it recover to the end of the logs
This has been run through a number of times on non-trivial tests, and
I've sat and watched the beast at work to make sure nothing weird was
happening with the timing.
At this stage:
Missing Functions -
- recovery does NOT yet stop at a specified point-in-time (that was
always planned for Phase 2)
- a few more log messages are required to report progress
- a debug mode is required to allow most of them to be turned off
Wrinkles
- code is system testable, but not as cute as it could be
- input from committers is now sought to complete the work
- you are strongly advised not to treat any of the patches as usable in
any real world situation YET - that bit comes next
Bugs
- two bugs currently occur during some tests:
1. the notification mechanism as originally designed causes ALL backends
to report that a log file has closed. That works most of the time,
though it does give rise to occasional timing errors - nothing too
serious, but this inexactness could lead to later errors.
2. After restore, the notification system doesn't recover fully - this
is a straightforward one
I'm building a full patchset for this code and will upload it soon. As
you might expect over the time it's taken me to develop this, some bitrot
has set in, so I'm rebuilding it against the latest dev version now, and
will complete fixes for the two bugs mentioned above.
I'm sure some will say "no words, show me the code"... I thought you all
would appreciate some advance warning of this, to plan time to
investigate and comment upon the coding.
Best Regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com
I want to come hug you --- where do you live? !!!
:-)
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Well, I guess I was fairly happy too :-)
I'd be more comfortable if I'd found more bugs though, but I'm sure the
kind folk on this list will see that wish of mine comes true!
The code is in a "needs more polishing" state - which is just the right
time for some last discussions before everything sets too solid.
Regards, Simon
On Mon, 2004-04-26 at 17:48, Bruce Momjian wrote:
I want to come hug you --- where do you live? !!!
:-)
Simon Riggs wrote:
Well, I guess I was fairly happy too :-)
YES!
I'd be more comfortable if I'd found more bugs though, but I'm sure the
kind folk on this list will see that wish of mine comes true!

The code is in a "needs more polishing" state - which is just the right
time for some last discussions before everything sets too solid.
Once we see the patch, we will be able to eyeball all the code paths and
interface to existing code and will be able to spot a lot of stuff, I am
sure.
It might take a few passes over it but you will get all the support and
ideas we have.
I want to come hug you --- where do you live? !!!
You're not the only one. But we don't want to smother the poor guy, at
least not before he completes his work :-)
On Mon, 2004-04-26 at 16:37, Simon Riggs wrote:
I've now completed:
- changes to PostgreSQL
- written a simple archiving utility, pg_arch
This will be on HACKERS not PATCHES for a while...
OVERVIEW :
Various code changes. Not all included here...but I want to prove this
is real, rather than have you waiting for my patch release skills to
improve.
PostgreSQL changes include:
============================
- guc.c
New GUC called wal_archive to control whether archival logging is enabled.
- xlog.h
GUC added here
- xlog.c
The most critical parts of the code live here. The way things currently
work can be thought of as a circular set of logs, with the current log
position sweeping around the circle like a clock. In order to archive an
xlog, you must start just AFTER the file has been closed and BEFORE the
pointer sweeps round again.
The code here tries to spot the right moment to notify the archiver that
it's time to archive. That point is critical: too early, and the archive
may yet be incomplete; too late, and a window of failure creeps into the
system.
Finding that point is more complicated than it seems because every
backend has the same file open and decides to close it at different
times - nearly the same time if you're running pgbench, but could vary
considerably otherwise. That timing difference is the source of Bug#1.
My solution is to use the piece of code that first updates pg_control,
since there is a similar need to only-do-it-once. My understanding is
that the other backends eventually discover they are supposed to be
looking at a different file now and reset themselves - so that the xlog
gets fsynced only once.
It's taken me a week to consider the alternatives... this point is
critical, so please speak up if you know or think differently.
When the pointer sweeps round again, if we are still archiving, we
simply increase the number of logs in the cycle to defer when we can
recycle the xlog. The code doesn't yet handle a failure condition we
discussed previously: running out of disk space and how we handle that
(there was detailed debate, noted for future implementation).
New utility aimed at being located in src/bin/pg_arch
=======================================================
- pg_arch.c
The idea of pg_arch is that it is a functioning archival tool and at the
same time is the reference implementation of the XLogArchive API. The
API is all wrapped up in the same file currently, to make it easier to
implement, but I envisage separating these out into two parts after it
passes initial inspection - shouldn't take too much work given that was
its design goal. This will then allow the API to be used for wider
applications that want to backup PostgreSQL.
- src/bin/Makefile has been updated to include pg_arch, so that this
then gets made as part of the full system rather than an add-on. I'm
sure somebody has feelings on this...my thinking was that it ought to be
available without too much effort.
What's NOT included (YET!)
==========================
- changes to initdb
- changes to postgresql.conf
- changes to wal_debug
- related changes
- user documentation
- changes to initdb
XLogArchive API implementation relies on the existence of
$PGDATA/pg_rlog
That would be relatively simple to add to initdb, but it's also a
no-brainer to add without it, so I thought I'd leave it for discussion in
case anybody has good reasons to put it elsewhere, rename it, etc.
More importantly, this affects the security model used by XLogArchive.
The way I had originally envisaged this, the directory permissions would
be opened up for group level read/write thus:
pg_xlog rwxr-x---
pg_rlog rwxrwx---
though this of course relies on $PGDATA being opened up also. That then
would allow the archiving tool to be in its own account also, yet with a
shared group. (Thinking that a standard Legato install (for instance) is
unlikely to recommend sharing a UNIX userid with PostgreSQL). I was
unaware that PostgreSQL checks the permissions of PGDATA before it
starts and does not allow you to proceed if group permissions exist.
We have two options:
i) alter all things that rely on security being userlevel-only
- initdb
- startup
- most other security features?
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).
I've avoided this issue in the general implementation, thinking that
there'll be some strong feelings either way, or an alternative that I
haven't thought of yet (please...)
- changes to postgresql.conf
The parameter setting
wal_archive = true
needs to be added; it turns XLogArchive on or off.
I've not added this to the install template (yet), in case we had some
further suggestions for what this might be called.
- changes to wal_debug
The XLOG_DEBUG flag is set as a value between 1 and 16, though the code
only ever treats this as a boolean. For my development, I partially
implemented an earlier suggestion of mine: set the flag to 1 in the
config file, then set the more verbose portions of debug output to
trigger only when it's set to 16. That affected a couple of places in
xlog.c. That may not be needed, so that's not included either.
- user documentation
Not yet...but it will be.
On Mon, 2004-04-26 at 18:08, Bruce Momjian wrote:
Thanks very much.
Code will be there in full tomorrow now (oh it is tomorrow...)
Fixed the bugs that I spoke of earlier though. They all make sense when
you try to tell someone else about them...
Best Regards, Simon
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.
On Tue, 2004-04-27 at 18:10, Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
A number of reasons....
Overall, I initially favoured the archiver as another special backend,
like checkpoint. That is exactly the same architecture as Oracle uses,
so is a good starting place for thought.
We discussed the design in detail on the list and the suggestion was
made to implement PITR using an API to send notification to an archiver.
In Oracle7, it was considered OK to just dump the files in some
directory and call them archived. Later, most DBMSs have gone to some
trouble to integrate with generic or at least market leading backup and
recovery (BAR) software products. Informix and DB2 provide open
interfaces to BARs; Oracle does not, but then it figures it already has
(or had) the market share, so it can just do things its own way.
The XLogArchive design allows ANY external archiver to work with
PostgreSQL. The pg_arch program supplied is really to show how that
might be implemented. This leaves the door open for any BAR product to
interface through to PostgreSQL, whether this be your favourite open
source BAR or the leading proprietary vendors.
Wide adoption is an important design feature and the design presented
offers this.
The other reason is to do with how and when archival takes place. An
asynchronous communication mechanism is required between PostgreSQL and
the archiver, to allow for such situations as tape mounts or simple
failure of the archiver. The method chosen for implementing this
asynchronous comms mechanism lends itself to being an external API -
there were other designs but these were limited to internal use only.
You ask a reasonable question however. If pg_autovacuum exists, why
should pg_autoarch not work also? My own thinking about external
connectivity may have overshadowed my thinking there.
It would not require too much additional work to add another GUC giving
the name of the external archiver, so the server could confirm it is
running, or start/restart it if it fails. At this point such a feature is
a nice-to-have in comparison with the goal of being able to recover to a
PIT, so I will defer this issue to Phase 3....
Best regards, Simon Riggs
On Tuesday 27 April 2004 22:21, Simon Riggs wrote:
Why isn't the archiver process integrated into the server?
You ask a reasonable question however. If pg_autovacuum exists, why
should pg_autoarch not work also?
pg_autovacuum is going away to be integrated as a backend process.
On Tuesday 27 April 2004 19:59, Bruce Momjian wrote:
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.
When the server is not running there is nothing to archive, so I don't follow
this argument.
On Monday 26 April 2004 23:11, Simon Riggs wrote:
ii) encourage (i.e. force) people using XLogArchive API to run as the
PostgreSQL owning-user (postgres).
I think this is perfectly reasonable.
On Wed, 2004-04-28 at 16:14, Peter Eisentraut wrote:
On Tuesday 27 April 2004 19:59, Bruce Momjian wrote:
Peter Eisentraut wrote:
Simon Riggs wrote:
New utility aimed at being located in src/bin/pg_arch
Why isn't the archiver process integrated into the server?
I think it is because the archiver process has to be started/stopped
independently of the server.

When the server is not running there is nothing to archive, so I don't follow
this argument.
The running server creates xlogs, which are still available for archive
even when the server is not running...
Overall, your point is taken, with many additional comments in my other
posts in reply to you.
I accept that this may be desirable in the future, for some simple
implementations. The pg_autovacuum evolution path is a good model - if
it works and the code is stable, bring it under the postmaster at a
later time.
Best Regards, Simon Riggs
Simon Riggs wrote:
[ This email isn't focused because I haven't resolved all my ideas yet.]
OK, I looked over the code. Basically it appears pg_arch is a
client-side program that copies files from pg_xlog to a specified
directory, and marks completion in a new pg_rlog directory.
The driving part of the program seems to be:
    while ((n = read(xlogfd, buf, BLCKSZ)) > 0)
        if (write(archfd, buf, n) != n)
            return false;
The program basically sleeps and when it awakes checks to see if new WAL
files have been created.
There is some additional GUC variable to prevent WAL from being recycled
until it has been archived, but the posted patch only had pg_arch.c, its
Makefile, and a patch to update bin/Makefile.
Simon (the submitter) specified he was providing an API to archive, but
it is really just a set of C routines to call that do copies. It is not
a wire protocol or anything like that.
The program has a mode where it archives all available wal files and
exits, but by default it has to remain running to continue archiving.
I am wondering if this is the way to approach the situation. I
apologize for not considering this earlier. Archives of PITR postings
of interest are at:
http://momjian.postgresql.org/cgi-bin/pgtodo?pitr
It seems the backend is the one who knows right away when a new WAL file
has been created and needs to be archived.
Also, are folks happy with archiving only full WAL files? This will not
restore all transactions up to the point of failure, but might lose
perhaps 2-5 minutes of transactions before the failure.
Also, a client application is a separate process that must remain
running. With Informix, there is a separate utility to do PITR logging.
It is a pain to have to make sure a separate process is always running.
Here is an idea. What if we add two GUC settings:
pitr = true/false;
pitr_path = 'filename or |program';
In this way, you would basically specify your path to dump all WAL logs
into (just keep appending 16MB chunks) or call a program that you pipe
all the WAL logs into.
You can't change pitr_path while pitr is on. Each backend opens the
filename in append mode before writing. One problem is that this slows
down the backend, because it has to do the archive write itself, and
that write might be slow.
We also need the ability to write to a tape drive, and you can't
open/close those like a file. Different backends will be doing the WAL
file additions, there isn't a central process to keep a tape drive file
descriptor open.
Seems pg_arch should at least use libpq to connect to a database and do
a LISTEN and have the backend NOTIFY when they create a new WAL file or
something. Polling for new WAL files seems non-optimal, but maybe a
database connection is overkill.
Then, you start the backend, specify the path, turn on pitr, do the tar,
and you are on your way.
Also, pg_arch should only be run by the install user. No need to allow
other users to run this.
Another idea is to have a client program like pg_ctl that controls PITR
logging (start, stop, location), but does its job and exits, rather than
remains running.
I apologize for not bringing up these issues earlier. I didn't realize
the direction it was going. I wasn't focused on it. Sorry.
On Thu, Apr 29, 2004 at 12:18:38AM -0400, Bruce Momjian wrote:
Is the API able to indicate a written but not-yet-filled WAL segment?
So an archiver could copy the filled part, and refill it later. This
may be needed because a segment could take a while to be filled.
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Hoy es el primer día del resto de mi vida"
Alvaro Herrera wrote:
Is the API able to indicate a written but not-yet-filled WAL segment?
So an archiver could copy the filled part, and refill it later. This
may be needed because a segment could take a while to be filled.
I couldn't figure that out, but I don't think it does. It would have to
lock the WAL writes so it could get a good copy, I think, and I didn't
see that.
On Thu, Apr 29, 2004 at 10:07:01AM -0400, Bruce Momjian wrote:
Alvaro Herrera wrote:
Is the API able to indicate a written but not-yet-filled WAL segment?
So an archiver could copy the filled part, and refill it later. This
may be needed because a segment could take a while to be filled.

I couldn't figure that out, but I don't think it does. It would have to
lock the WAL writes so it could get a good copy, I think, and I didn't
see that.
I'm not sure but I don't think so. You don't have to lock the WAL for
writing, because it will always write later in the file than you are
allowed to read. (If you read more than you were told to, it's your
fault as an archiver.)
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Et put se mouve" (Galileo Galilei)
Alvaro Herrera wrote:
I'm not sure but I don't think so. You don't have to lock the WAL for
writing, because it will always write later in the file than you are
allowed to read. (If you read more than you were told to, it's your
fault as an archiver.)
My point was that without locking the WAL, we might get part of a WAL
write in our file, but I now realize that during a crash the same thing
might happen, so it would be OK to just copy it even if it is being
written to.
Simon posted the rest of his patch that shows changes to the backend,
and a comment reads:
+ * The name of the notification file is the message that will be picked up
+ * by the archiver, e.g. we write RLogDir/00000001000000C6.full
+ * and the archiver then knows to archive XLOgDir/00000001000000C6,
+ * while it is doing so it will rename RLogDir/00000001000000C6.full
+ * to RLogDir/00000001000000C6.busy, then when complete, rename it again
+ * to RLogDir/00000001000000C6.done
so it is only archiving full logs.
Also, I think this archiver should be able to log to a local drive,
network drive (trivial), tape drive, ftp, or use an external script to
transfer the logs somewhere. (ftp would probably be an external script
with 'expect').
On Thu, 2004-04-29 at 15:22, Bruce Momjian wrote:
Bruce is correct: the API waits for an xlog file to be full before
archiving it.
I had thought about the case for partial archiving: basically, if you
want to archive in smaller chunks, make your log files smaller - this is
currently a compile-time option. Possibly there is an argument for
making the xlog file size configurable, as a way of doing what you
suggest.
Taking multiple copies of the same file, yet trying to work out which
one to apply, sounds complex and error-prone to me. It also increases
the cost of the archival process and thus drains other resources.
The archiver should be able to do a whole range of things. Basically,
that point was discussed and the agreed approach was to provide an API
that would allow anybody and everybody to write whatever they wanted.
The design included pg_arch since it was clear that there would be a
requirement in the basic product to have those facilities - and in any
case any practically focused API has a reference port as a way of
showing how to use it and exposing any bugs in the server side
implementation.
The point is...everybody is now empowered to write tape drive code,
whatever you fancy.... go do.
Best regards, Simon Riggs