Proposed doc-patch: Identifying the Current WAL file
Reading Tom's posting here:
http://archives.postgresql.org/pgsql-general/2006-04/msg00499.php
I just realised we don't seem to mention this in the docs anywhere. I
propose adding a short paragraph to 23.3.1
http://www.postgresql.org/docs/8.1/static/backup-online.html#BACKUP-ONLINE-CAVEATS
After "Except in the case of retrying a failure, it will be called only
once for any given file name."
To identify the current, partially-filled WAL segment, sort first by
mtime and second by file name. That is, take the latest mtime among the
properly-named files, breaking ties by taking the higher filename.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
Reading Tom's posting here:
http://archives.postgresql.org/pgsql-general/2006-04/msg00499.phpI just realised we don't seem to mention this in the docs anywhere. I
propose adding a short paragraph to 23.3.1
http://www.postgresql.org/docs/8.1/static/backup-online.html#BACKUP-ONLINE-CAVEATSAfter "Except in the case of retrying a failure, it will be called only
once for any given file name."To identify the current, partially-filled WAL segment, sort first by
mtime and second by file name. That is, take the latest mtime among the
properly-named files, breaking ties by taking the higher filename.
I am confused by this. Why do both mtime and file name need to be
checked?
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Richard Huxton wrote:
To identify the current, partially-filled WAL segment, sort first by
mtime and second by file name. That is, take the latest mtime among the
properly-named files, breaking ties by taking the higher filename.
I am confused by this. Why do both mtime and file name need to be
checked?
Because recycled WAL segments are renamed to have higher file names than
the currently-in-use segment. So you can't depend on file name first.
However, shortly after a segment switch two WAL segments could have the
same mtime (to within whatever the mtime granularity is, typ. 1 second).
The proposed rule should be OK as long as checkpoints (and ensuing
renames) can't occur oftener than the mtime granularity. If you're
checkpointing more than once a second, well, you need help ...
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Richard Huxton wrote:
To identify the current, partially-filled WAL segment, sort first by
mtime and second by file name. That is, take the latest mtime among the
properly-named files, breaking ties by taking the higher filename.I am confused by this. Why do both mtime and file name need to be
checked?Because recycled WAL segments are renamed to have higher file names than
the currently-in-use segment. So you can't depend on file name first.
However, shortly after a segment switch two WAL segments could have the
same mtime (to within whatever the mtime granularity is, typ. 1 second).The proposed rule should be OK as long as checkpoints (and ensuing
renames) can't occur oftener than the mtime granularity. If you're
checkpointing more than once a second, well, you need help ...
I am trying to figure out how this could even be done in a shell script.
'ls -lt' is going to show:
-rw-r--r-- 1 root postgres 0 Apr 15 11:56 x1
-rw-r--r-- 1 root postgres 0 Apr 15 11:56 x2
-rw-r--r-- 1 root postgres 0 Apr 15 11:56 x3
but this might be with second resolution:
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x1
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x2
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:34 x3
or it might be:
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:47 x1
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:34 x2
-rw-r--r-- 1 root postgres 0 Apr 15 11:56:25 x3
In the first case, x2 is current, having be just switched to from x1,
while in the second case, x1 is current. In BSD, you can use ls -ltT to
see the seconds, but in Linux it is something different, and I am sure
there are some operating systems that don't allow you to see the seconds
at all. What general command-line solution can we propose for this
process? And if we can't provide one, should we supply an SQL function
to return the current WAL name?
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <pgman@candle.pha.pa.us> writes:
In the first case, x2 is current, having be just switched to from x1,
while in the second case, x1 is current. In BSD, you can use ls -ltT to
see the seconds, but in Linux it is something different, and I am sure
there are some operating systems that don't allow you to see the seconds
at all. What general command-line solution can we propose for this
process?
For a command-line solution it's probably sufficient to sort by mtime,
ie
ls -t | head -1
You'll be at worst 1 second behind reality, assuming 1-second
granularity of mtime (and assuming ls sorts by the real mtime not what
it shows you, but that's true everywhere AFAIK).
regards, tom lane
On Sat, 15 Apr 2006, Tom Lane wrote:
For a command-line solution it's probably sufficient to sort by mtime,
ie
ls -t | head -1
A while back when I was trying to work this out on the admin list, I believe
we came up with the following:
ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1
which seems to work fairly well. Looks like that thread is here:
http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote:
On Sat, 15 Apr 2006, Tom Lane wrote:
For a command-line solution it's probably sufficient to sort by mtime,
ie
ls -t | head -1A while back when I was trying to work this out on the admin list, I believe
we came up with the following:ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1
which seems to work fairly well. Looks like that thread is here:
http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php
What does the -p and \| pipe check do? We don't have named pipes in
that directory, do we?
Also, what happens if the log switch happens, and some data change is
written to the new WAL file in the first second, but nothing happens to
the database after that for a minute? Your test would still show the
old log file.
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Bruce Momjian wrote:
ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1
which seems to work fairly well. Looks like that thread is here:
http://archives.postgresql.org/pgsql-admin/2005-10/msg00173.php
What does the -p and \| pipe check do? We don't have named pipes in
that directory, do we?Also, what happens if the log switch happens, and some data change is
written to the new WAL file in the first second, but nothing happens to
the database after that for a minute? Your test would still show the
old log file.
The -p shows forward slashes after directories and the \| acts as an or and
the / following that just lets us filter directories out. I added that
because I used to find the archive_status directory winning the ls -t from
time to time.
Now about your what if question. I don't know if there is a way to get past
the mtime granularity. If I understand your scenario correctly, you indicate
that the previous log is written to and the new log is switched in during the
same mtime second. I did a quick test on linux to see how that operates:
touch AA AB
This yields both files with the same mtime.
ls -tp |head -1
yields AA as you suggested it would.
The following seems to do the trick if we can rely on alphabetizing to
properly decide the winner of a tie:
ls -tp | head -2 | sort -r | head -1
So, with my previous example, it would look like:
ls -tp /pg_xlog/ | grep -v "backup\|/" | head -2 | sort -r | head -1
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes:
ls -tp /pg_xlog/ | grep -v "backup\|/" | head -1
What does the -p and \| pipe check do? We don't have named pipes in
that directory, do we?
The -p shows forward slashes after directories and the \| acts as an or and
the / following that just lets us filter directories out.
This seems both overly cute and wrong, because it fails to filter plain
files that might have a new mtime but aren't WAL files. I'd suggest a
simple test on file name to make sure it looks like a WAL file, ie,
24 hex digits.
ls -t .../pg_xlog | grep '^[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]$' | head -1
regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Also, what happens if the log switch happens, and some data change is
written to the new WAL file in the first second, but nothing happens to
the database after that for a minute? Your test would still show the
old log file.
You seem to be assuming that ls will sort on the basis of the truncated
mtime that it displays, which is not the actual behavior of ls AFAIK.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Also, what happens if the log switch happens, and some data change is
written to the new WAL file in the first second, but nothing happens to
the database after that for a minute? Your test would still show the
old log file.You seem to be assuming that ls will sort on the basis of the truncated
mtime that it displays, which is not the actual behavior of ls AFAIK.
No, I am not:
$ touch x1 x2; touch x2
$ sleep 2; ls -lt
total 0
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2
If the write to x2 happens in the first second, but no later writes
happen, you still see x1 as first, even though x2 is the new one and
might have WAL data in it. The point is that the test does not have a
one-second window of showing the wrong answer, meaning I could wait for
60 seconds, and still see the wrong WAL file at the top.
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Also, what happens if the log switch happens, and some data change is
written to the new WAL file in the first second, but nothing happens to
the database after that for a minute? Your test would still show the
old log file.You seem to be assuming that ls will sort on the basis of the truncated
mtime that it displays, which is not the actual behavior of ls AFAIK.
I believe you're correct, at least with the ls I have here on my linux system.
I created two files quickly with touch. Here is the stat output:
File: `AA'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 802h/2050d Inode: 2736263 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 26/postgres) Gid: ( 26/postgres)
Access: 2006-04-15 11:02:46.000000000 -0700
Modify: 2006-04-15 11:02:46.000000000 -0700
Change: 2006-04-15 11:02:46.000000000 -0700
File: `AB'
Size: 0 Blocks: 0 IO Block: 4096 regular empty file
Device: 802h/2050d Inode: 2736264 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 26/postgres) Gid: ( 26/postgres)
Access: 2006-04-15 11:02:48.000000000 -0700
Modify: 2006-04-15 11:02:48.000000000 -0700
Change: 2006-04-15 11:02:48.000000000 -0700
ls -t | head -1
AB
So it looks like the original was correct all along, but with Tom's regex it's
much cleaner. Too bad grep's regex engine doesn't support {24}.
I'll change all my scripts to use the following:
ls -t /pg_xlog/ | grep '^[0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]$' | head -1
Thanks Tom!
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
On Sat, 15 Apr 2006, Bruce Momjian wrote:
Tom Lane wrote:
No, I am not:
$ touch x1 x2; touch x2
$ sleep 2; ls -lt
total 0
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2If the write to x2 happens in the first second, but no later writes
happen, you still see x1 as first, even though x2 is the new one and
might have WAL data in it. The point is that the test does not have a
one-second window of showing the wrong answer, meaning I could wait for
60 seconds, and still see the wrong WAL file at the top.
Bruce, what does
stat x1 x2
look like on your system? Which OS? Maybe we need caveats for various OSes?
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote:
On Sat, 15 Apr 2006, Bruce Momjian wrote:
Tom Lane wrote:
No, I am not:
$ touch x1 x2; touch x2
$ sleep 2; ls -lt
total 0
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x1
-rw-r--r-- 1 root postgres 0 Apr 15 14:04 x2If the write to x2 happens in the first second, but no later writes
happen, you still see x1 as first, even though x2 is the new one and
might have WAL data in it. The point is that the test does not have a
one-second window of showing the wrong answer, meaning I could wait for
60 seconds, and still see the wrong WAL file at the top.Bruce, what does
stat x1 x2
look like on your system? Which OS? Maybe we need caveats for various OSes?
System is BSD/OS:
$ touch x1 x2; stat x1 x2
filename: x1
inode: 4547362
device: 18,7
size: 0
type: regular file
links: 1
mode: 0644(-rw-r--r--)
owner: 0(root)
group: 102(postgres)
access: Sat Apr 15 14:41:12 2006
modification: Sat Apr 15 14:41:12 2006
change: Sat Apr 15 14:41:12 2006
filename: x2
inode: 4547363
device: 18,7
size: 0
type: regular file
links: 1
mode: 0644(-rw-r--r--)
owner: 0(root)
group: 102(postgres)
access: Sat Apr 15 14:41:12 2006
modification: Sat Apr 15 14:41:12 2006
change: Sat Apr 15 14:41:12 2006
And I tried it on Fedora Core 2:
bmomjian@x86-linux2:~$ touch x1 x2 ; stat x1 x2
File: `x1'
Size: 0 Blocks: 0 IO Block: 8192 regular
empty file
Device: 11h/17d Inode: 24707337 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users)
Access: 2006-04-15 11:41:53.000000000 -0700
Modify: 2006-04-15 11:41:53.000000000 -0700
Change: 2006-04-15 11:41:53.000000000 -0700
File: `x2'
Size: 0 Blocks: 0 IO Block: 8192 regular
empty file
Device: 11h/17d Inode: 24707338 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users)
Access: 2006-04-15 11:41:53.000000000 -0700
Modify: 2006-04-15 11:41:53.000000000 -0700
Change: 2006-04-15 11:41:53.000000000 -0700
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Sat, 15 Apr 2006, Bruce Momjian wrote:
And I tried it on Fedora Core 2:
Device: 11h/17d Inode: 24707338 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 7078/bmomjian) Gid: ( 100/ users)
Access: 2006-04-15 11:41:53.000000000 -0700
Modify: 2006-04-15 11:41:53.000000000 -0700
Change: 2006-04-15 11:41:53.000000000 -0700
And you know what? I pulled a bonehead maneuver when I read the output of my
stat command. FC3/4 appear to be the same. I wonder if this is filesystem
dependent since stat has all those trailing 0's for some reason. Maybe we are
back to using sort to decide the winner of a tie?
--
Jeff Frost, Owner <jeff@frostconsultingllc.com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
Bruce Momjian <pgman@candle.pha.pa.us> writes:
The point is that the test does not have a
one-second window of showing the wrong answer, meaning I could wait for
60 seconds, and still see the wrong WAL file at the top.
Oh, I see your point: you can lose at most one second's worth of data,
but that second could be arbitrarily long ago if it was the latest
activity in the database. Yeah, that's a bit unpleasant. So we really
do need both parts of the ordering rule, and there seems no way to do
that with just 'ls'.
I wonder if you could do anything with find(1)'s -newer switch?
It seems to be a '>' condition not a '>=' condition, so it'd be
pretty awkward ... certainly not a one-liner.
I think everyone agrees that adding a SQL function would be a reasonable
thing to do, anyway.
regards, tom lane
On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote:
And if we can't provide one, should we supply an SQL function
to return the current WAL name?
I'll do this. Just give me a few days to get my feet under the new desk.
I know its well past time I sorted this and a few other things out.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com/
Simon Riggs wrote:
On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote:
And if we can't provide one, should we supply an SQL function
to return the current WAL name?I'll do this. Just give me a few days to get my feet under the new desk.
I know its well past time I sorted this and a few other things out.
If we get some mechanism to write those partial WAL files, we might not
need the ability to identify the current WAL file, and because a new
function is going to require an initdb, I am thinking we can't get this
done until 8.2 anyway, so Simon, please come up with a plan to finish
the missing PITR pieces. I am getting tired of trying to explain
workarounds to PITR users, especially when the workarounds are not easy.
We added PITR in 8.0, and we have made little improvement to it since
then, and its limitations are getting tiring.
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On 4/15/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
The point is that the test does not have a
one-second window of showing the wrong answer, meaning I could wait for
60 seconds, and still see the wrong WAL file at the top.Oh, I see your point: you can lose at most one second's worth of data,
but that second could be arbitrarily long ago if it was the latest
activity in the database. Yeah, that's a bit unpleasant. So we really
do need both parts of the ordering rule, and there seems no way to do
that with just 'ls'.I wonder if you could do anything with find(1)'s -newer switch?
It seems to be a '>' condition not a '>=' condition, so it'd be
pretty awkward ... certainly not a one-liner.I think everyone agrees that adding a SQL function would be a reasonable
thing to do, anyway.regards, tom lane
specially for those using windows that hadn't those wonderfull tools... :)
--
regards,
Jaime Casanova
"What they (MySQL) lose in usability, they gain back in benchmarks, and that's
all that matters: getting the wrong answer really fast."
Randal L. Schwartz
On Sat, 2006-04-15 at 16:20 -0400, Bruce Momjian wrote:
Simon Riggs wrote:
On Sat, 2006-04-15 at 12:24 -0400, Bruce Momjian wrote:
And if we can't provide one, should we supply an SQL function
to return the current WAL name?I'll do this. Just give me a few days to get my feet under the new desk.
I know its well past time I sorted this and a few other things out.If we get some mechanism to write those partial WAL files, we might not
need the ability to identify the current WAL file, and because a new
function is going to require an initdb, I am thinking we can't get this
done until 8.2 anyway, so Simon, please come up with a plan to finish
the missing PITR pieces. I am getting tired of trying to explain
workarounds to PITR users, especially when the workarounds are not easy.We added PITR in 8.0, and we have made little improvement to it since
then, and its limitations are getting tiring.
Yes, I know all of this, thats why I'm pleased to be in a position to
change this, now that I don't have a day job ;-). (Having said this, I'm
in California all week, so give me a little longer).
For 8.0. and 8.1 users, I'd suggest we release an external function as a
contrib module, so that we don't compromise reliability and not force an
initdb for them. With docs, of course.
I suggest we have two functions:
1. pg_xlog_file_from_offset(text)
This allows the output of pg_stop_backup to be formatted into a
filename, so it would be used like this:
select pg_xlog_file_from_offset(pg_stop_backup());
2. pg_xlog_file_current()
Can be run at any time to find the current xlog file
We need both because we need to know the current xlog file at the time
stop backup was run, not just at the time the function was executed. But
we may need the second function at other times.
For 8.2 we definitely need the logswitch logic to function at time of
pg_stop_backup() - and this should not return until archiver has
successfully copied the switched file away. 8.2 can have function (2)
internally in case anyone cares. (I agree, f(1) would be redundant at
that point).
(I'll let you guys decide the function names.)
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com/