Forcing current WAL file to be archived
Patch included to implement xlog switching, using an xlog record
"processing instruction" and forcibly moving xlog pointers.
1. Happens automatically on pg_stop_backup()
2. Can happen manually via pg_switch_xlog()
3. Implement range of utility functions:
pg_current_wal_offset()
pg_current_xlogfile()
pg_current_xlogfile_offset() - for Hannu
pg_xlogfile_from_wal_offset() for interpreting output from
pg_switch_xlog, pg_start/stop_backup()
Passes make check, applies cleanly to HEAD, includes doc patches with
clean SGML builds.
Design as clean as possible given and has implementation of
archive_timeout in mind also. Happy to work further on any code cleanups
requested.
I've done a variety of testing on it, doing concurrent pg_regress and
pg_switch_xlog(). All known issues resolved. Main test cases and sample
outputs are in switchtest.sh
Wide variety of cases need testing, so I'm expecting some further issues
to be reported.
I'm now working on completing the restartable recovery patch, which will
include further tests of PITR recoveries on the xswitch.patch.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
Patch included to implement xlog switching, using an xlog record
"processing instruction" and forcibly moving xlog pointers.
Just to be clear --- does this fully supersede your draft patch of
27-July, or is that still on the table too?
regards, tom lane
On Thu, 2006-08-03 at 18:00 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Patch included to implement xlog switching, using an xlog record
"processing instruction" and forcibly moving xlog pointers.Just to be clear --- does this fully supersede your draft patch of
27-July, or is that still on the table too?
Sorry for lack of clarity.
This patch supersedes the draft patch of 27 July, which should have been
marked more clearly as WIP.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
Patch included to implement xlog switching, using an xlog record
"processing instruction" and forcibly moving xlog pointers.
Applied with revisions. I didn't like the extra state you added to
track whether an xlog switch had occurred --- the more bits of
interdependent state the more chance for bugs, IMHO, and it seemed
unnecessary since it's easy enough to test whether we are at a
segment boundary. I also made the new user-level functions a bit
more orthogonal, so that filenames could be extracted from the
existing functions like pg_stop_backup.
regards, tom lane
On Sat, 2006-08-05 at 23:57 -0400, Tom Lane wrote:
I also made the new user-level functions a bit
more orthogonal, so that filenames could be extracted from the
existing functions like pg_stop_backup.
Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.
That would not be useful for streaming xlog records would it?
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.
Any objections if I correct that?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
On Sat, 2006-08-05 at 23:57 -0400, Tom Lane wrote:
I also made the new user-level functions a bit
more orthogonal, so that filenames could be extracted from the
existing functions like pg_stop_backup.Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.That would not be useful for streaming xlog records would it?
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.Any objections if I correct that?
What is the difference ?
I'd expect it to point either to last byte written or to the next byte
that will be written, and I want to know which one it is :)
And another question:
is is possible that under some circumstances the last few bytes of a WAL
file will not be written to ?
or is the writing done as if all the wal files together form one huge
tape, without any gaps between ?
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Simon Riggs <simon@2ndquadrant.com> writes:
Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.
That would not be useful for streaming xlog records would it?
Good point.
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.
However the start/stop_backup functions return the Insert pointer.
I can see scripts getting confused if pg_current_xlog_location reports
something less than what they just got from pg_stop_backup.
Is there value in exposing both pointers? (Maybe not, it'll just cause
confusion probably.)
Another option is to have pg_current_xlog_location force a write (but
not fsync) as far as the Insert pointer it's about to return. This
would eliminate any issues about inconsistency between results, but
perhaps there's too much performance penalty.
I'm not necessarily against your suggestion, just trying to be sure
we've thought about all the options.
regards, tom lane
Hannu Krosing <hannu@skype.net> writes:
Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.
What is the difference ?
Insert points to the next byte to be written within the internal WAL
buffers. The byte(s) preceding it haven't necessarily gotten out of
those buffers yet. Write points to the end of what we've actually
written to the kernel, and there's also a Flush pointer that points
to the end of what we believe is down on disk.
Simon's point is that if you're going to use pg_current_xlog_location()
to control partial shipping of xlog files, you probably want to know
about the Write location, because that indicates the limit of what
is visible to an external process.
regards, tom lane
On Wed, 2006-08-09 at 10:04 -0400, Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.
That would not be useful for streaming xlog records would it?Good point.
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.However the start/stop_backup functions return the Insert pointer.
I can see scripts getting confused if pg_current_xlog_location reports
something less than what they just got from pg_stop_backup.Is there value in exposing both pointers? (Maybe not, it'll just cause
confusion probably.)Another option is to have pg_current_xlog_location force a write (but
not fsync) as far as the Insert pointer it's about to return. This
would eliminate any issues about inconsistency between results, but
perhaps there's too much performance penalty.I'm not necessarily against your suggestion, just trying to be sure
we've thought about all the options.
Hannu's Use Case is to have the function called regularly from an
external polling agent. If we don't do the write it could be possible in
some circumstances for an XLogWrite to be delayed for some time if we
have both large wal_buffers and period of few commits, whereas we might
want to have the writes be fairly regular to support regular streaming.
So I do see there is a reasonable case for performing a write.
The way we have XLogWrite now, I would want that write to be a
"flexible" write, which would leave us in the position that calling
pg_current_xlog_location() would return something logically between the
Insert pointer and the immediately prior Write pointer (even though very
often there would be no difference at all).
I'm inclined to say we should add a flexible write (i.e.
XLogWrite(WriteRqst, true)) to pg_xlogfile_name_offset() and ignore the
esoteric difference between the return value and the Insert pointer.
I'm not clear how pg_xlogfile_name_offset() would ever return a
different answer to pg_stop_backup() - surely we just forcibly moved the
Insert and the Write pointer forwards together, so you'll get the same
answer from each.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
On Wed, 2006-08-09 at 10:04 -0400, Tom Lane wrote:
Another option is to have pg_current_xlog_location force a write (but
not fsync) as far as the Insert pointer it's about to return. This
would eliminate any issues about inconsistency between results, but
perhaps there's too much performance penalty.
Hannu's Use Case is to have the function called regularly from an
external polling agent. If we don't do the write it could be possible in
some circumstances for an XLogWrite to be delayed for some time if we
have both large wal_buffers and period of few commits, whereas we might
want to have the writes be fairly regular to support regular streaming.
So I do see there is a reasonable case for performing a write.
Now the other side of that coin is that any commit forces a write anyway.
So any data that would hypothetically be written by pg_current_xlog_location
would be uncommitted data, which is maybe not so important to write yet?
Anyway, it's easy enough for a polling program to force a write if it
wants to.
The way we have XLogWrite now, I would want that write to be a
"flexible" write, which would leave us in the position that calling
pg_current_xlog_location() would return something logically between the
Insert pointer and the immediately prior Write pointer (even though very
often there would be no difference at all).
I really don't want that; it makes it impossible to define what the
function is actually giving you. It's not an "esoteric difference".
I'm not clear how pg_xlogfile_name_offset() would ever return a
different answer to pg_stop_backup() - surely we just forcibly moved the
Insert and the Write pointer forwards together, so you'll get the same
answer from each.
Hmm ... I guess given the just-added behavior of forcing an xlog switch,
that's probably true now, but it wasn't before.
Anyway, after further thought I've concluded that we really should
supply something that returns the Insert pointer, as this would be
useful for debugging and system-monitoring purposes. It's clear however
that we also need something that returns the Write pointer, as that's
what's needed for partial log-shipping. So my vote is for two
functions, both read-only (and hence not superuser-only). Not sure
what to name them exactly.
regards, tom lane
On Thu, 2006-08-10 at 08:57 -0400, Tom Lane wrote:
Anyway, after further thought I've concluded that we really should
supply something that returns the Insert pointer, as this would be
useful for debugging and system-monitoring purposes. It's clear however
that we also need something that returns the Write pointer, as that's
what's needed for partial log-shipping.
So my vote is for two
functions, both read-only (and hence not superuser-only).
Thats probably the most important consideration.
Not sure
what to name them exactly.
pg_current_xlog_location() - gives the write pointer i.e. the offset up
to which you can read() the xlog file and trust what it tells you
pg_current_wal_insert_pointer() - gives the insert pointer :-)
Named sufficiently differently that there is no confusion between them.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Aug 10, 2006, at 7:57 AM, Tom Lane wrote:
Anyway, after further thought I've concluded that we really should
supply something that returns the Insert pointer, as this would be
useful for debugging and system-monitoring purposes. It's clear
however
that we also need something that returns the Write pointer, as that's
what's needed for partial log-shipping. So my vote is for two
functions, both read-only (and hence not superuser-only). Not sure
what to name them exactly.
Dumb question... is there any need to be able to get those values in
sync (I'm assuming that in the time taken to call two separate
functions the value on the second function called could change from
what it was when the first function was called)? Should there be a
SRF that returns both values?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
Hannu Krosing <hannu@skype.net> writes:
Ühel kenal päeval, K, 2006-08-09 kell 12:56, kirjutas Simon Riggs:
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.What is the difference ?
Insert points to the next byte to be written within the internal WAL
buffers. The byte(s) preceding it haven't necessarily gotten out of
those buffers yet. Write points to the end of what we've actually
written to the kernel,
I assume that it also points to the byte after what is written to
kernel, or is it tha last byte written ?
and there's also a Flush pointer that points
to the end of what we believe is down on disk.Simon's point is that if you're going to use pg_current_xlog_location()
to control partial shipping of xlog files, you probably want to know
about the Write location, because that indicates the limit of what
is visible to an external process.
Yes, that is what I need
regards, tom lane
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Hannu Krosing <hannu@skype.net> writes:
Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
Insert points to the next byte to be written within the internal WAL
buffers. The byte(s) preceding it haven't necessarily gotten out of
those buffers yet. Write points to the end of what we've actually
written to the kernel,
I assume that it also points to the byte after what is written to
kernel, or is it tha last byte written ?
Right, it's really first-unwritten-byte for all three pointers.
The two newly added functions to convert WAL locations to filenames
use XLByteToPrevSeg(), so they should do the right thing here
(see comments in src/include/access/xlog_internal.h).
regards, tom lane
Ühel kenal päeval, L, 2006-08-12 kell 10:59, kirjutas Tom Lane:
Hannu Krosing <hannu@skype.net> writes:
Ühel kenal päeval, K, 2006-08-09 kell 10:57, kirjutas Tom Lane:
Insert points to the next byte to be written within the internal WAL
buffers. The byte(s) preceding it haven't necessarily gotten out of
those buffers yet. Write points to the end of what we've actually
written to the kernel,I assume that it also points to the byte after what is written to
kernel, or is it tha last byte written ?Right, it's really first-unwritten-byte for all three pointers.
The two newly added functions to convert WAL locations to filenames
use XLByteToPrevSeg(), so they should do the right thing here
(see comments in src/include/access/xlog_internal.h).
How do they behave exactly at the file boundary ?
That is will it point 1 byte past end of old file, or byte 0 of the new
one ?
regards, tom lane
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
This issue is closed, right?
---------------------------------------------------------------------------
Tom Lane wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
Something Hannu wrote has just reminded me that
pg_current_xlog_location() returns the current Insert pointer rather
than the current Write pointer.
That would not be useful for streaming xlog records would it?Good point.
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.However the start/stop_backup functions return the Insert pointer.
I can see scripts getting confused if pg_current_xlog_location reports
something less than what they just got from pg_stop_backup.Is there value in exposing both pointers? (Maybe not, it'll just cause
confusion probably.)Another option is to have pg_current_xlog_location force a write (but
not fsync) as far as the Insert pointer it's about to return. This
would eliminate any issues about inconsistency between results, but
perhaps there's too much performance penalty.I'm not necessarily against your suggestion, just trying to be sure
we've thought about all the options.regards, tom lane
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
This issue is closed, right?
We've agreed we need two functions, but it's not done yet. Seems pretty
trivial though ...
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
This issue is closed, right?
We've agreed we need two functions, but it's not done yet. Seems pretty
trivial though ...
OK, that's what I was unclear about.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Sun, 2006-08-13 at 22:50 -0400, Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
This issue is closed, right?
We've agreed we need two functions, but it's not done yet. Seems pretty
trivial though ...
Just back from India. I'll work on this tonight.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Fri, 2006-08-11 at 08:04 +0100, Simon Riggs wrote:
On Thu, 2006-08-10 at 08:57 -0400, Tom Lane wrote:
Anyway, after further thought I've concluded that we really should
supply something that returns the Insert pointer, as this would be
useful for debugging and system-monitoring purposes. It's clear however
that we also need something that returns the Write pointer, as that's
what's needed for partial log-shipping.So my vote is for two
functions, both read-only (and hence not superuser-only).Thats probably the most important consideration.
Not sure
what to name them exactly.pg_current_xlog_location() - gives the write pointer i.e. the offset up
to which you can read() the xlog file and trust what it tells youpg_current_wal_insert_pointer() - gives the insert pointer :-)
Named sufficiently differently that there is no confusion between them.
Patch implementing the above attached.
Sample execution, with commentary at bottom.
postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A0824 | 0/3A0824
(1 row)
postgres=# begin;insert into blah values (1);
BEGIN
INSERT 0 1
postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A085C | 0/3A0824
(1 row)
postgres=# insert into blah values (1);
INSERT 0 1
postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A0894 | 0/3A0824
(1 row)
postgres=# commit;
COMMIT
postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/3A08BC | 0/3A08BC
(1 row)
postgres=# select pg_switch_xlog();
pg_switch_xlog
----------------
0/3A091C
(1 row)
postgres=# select pg_current_wal_insert_pointer(),
pg_current_xlog_location();
pg_current_wal_insert_pointer | pg_current_xlog_location
-------------------------------+--------------------------
0/1000020 | 0/1000000
(1 row)
postgres=# select pg_xlogfile_name_offset(pg_current_xlog_location());
pg_xlogfile_name_offset
-----------------------------------
000000010000000000000000 16777216
(1 row)
The above shows that the Insert pointer is always ahead of or the same
as the Write pointer. After a log switch the current location is shown
as being in the next file, though the current filename still shows as
the previous filename (since there has been no write activity yet on the
new file) with an offset of 1 beyond EOF, to indicate that the whole
file may now be read.
pg_switch_xlog() shows the next-to-be written byte in the file that we
have just switched out of, or the current location if we just performed
a log switch. So the following sequence does *not* show there is an
error in the returned pointer values.
postgres=# insert into blah values (1);
INSERT 0 1
postgres=# select pg_xlogfile_name_offset(pg_current_xlog_location());
pg_xlogfile_name_offset
------------------------------
000000010000000000000001 372
(1 row)
postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
pg_xlogfile_name_offset
------------------------------
000000010000000000000001 400
(1 row)
...a log switch was performed
postgres=# select pg_xlogfile_name_offset(pg_switch_xlog());
pg_xlogfile_name_offset
-----------------------------------
000000010000000000000001 16777216
(1 row)
...a log switch was *not* performed, since we're already at EOF
I've not taken up Jim Nasby's suggestion to make this an SRF with
multiple return rows/columns since that much complexity isn't justified
IMHO.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com