BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION
The following bug has been logged online:
Bug reference: 4566
Logged by: Randy Isbell
Email address: jisbell@cisco.com
PostgreSQL version: 8.3.4
Operating system: FreeBSD 6.2
Description: pg_stop_backup() reports incorrect STOP WAL LOCATION
Details:
An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.
SELECT pg_start_backup('filename');
pg_start_backup
-----------------
10/FE1E2BAC
(1 row)
Later:
SELECT pg_stop_backup();
pg_stop_backup
----------------
10/FF000000
(1 row)
The resulting *.backup file:
START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST
In my 8.3.4 instance, WAL file naming occurs as:
...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...
WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
- r.
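The naming pattern in the listing above can be reproduced with a little arithmetic. The sketch below is an illustration only: it assumes the default 16 MB segment size and the 8.3-era layout in which each log id holds 255 segments (00 through FE). The helper names `wal_file_name` and `next_segment` are made up for this example and are not PostgreSQL functions.

```python
# Illustrative model of 8.3-era WAL file naming (assumes default 16 MB segments).
XLOG_SEG_SIZE = 16 * 1024 * 1024
# Assumption: a log id holds 0xFFFFFFFF // 16 MB = 255 segments, numbered 00..FE.
SEGS_PER_LOG_ID = 0xFFFFFFFF // XLOG_SEG_SIZE


def wal_file_name(timeline, log_id, segment):
    """Format a 24-hex-digit WAL file name: timeline, log id, segment."""
    return "%08X%08X%08X" % (timeline, log_id, segment)


def next_segment(log_id, segment):
    """Return the (log_id, segment) pair that follows the given one."""
    segment += 1
    if segment >= SEGS_PER_LOG_ID:
        # Segment FE is the last one in a log id; the next file starts a new log id.
        return log_id + 1, 0
    return log_id, segment


if __name__ == "__main__":
    log_id, segment = 3, 0xFD
    for _ in range(4):
        print(wal_file_name(1, log_id, segment))
        log_id, segment = next_segment(log_id, segment)
```

Run as a script, this prints the four file names from the listing; the step after FE goes straight to segment 00 of the next log id, so no name ends in FF.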
On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
...WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
It's a bug in pg_stop_backup(), which has been discussed before.
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php
Attached is a patch against HEAD. I think that we should
also backport it.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
stopxlogfilename_bugfix.patch (text/x-patch)
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.323
diff -c -r1.323 xlog.c
*** src/backend/access/transam/xlog.c 3 Dec 2008 08:20:11 -0000 1.323
--- src/backend/access/transam/xlog.c 6 Dec 2008 04:21:05 -0000
***************
*** 6710,6716 ****
*/
stoppoint = RequestXLogSwitch();
! XLByteToSeg(stoppoint, _logId, _logSeg);
XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
/* Use the log timezone here, not the session timezone */
--- 6710,6716 ----
*/
stoppoint = RequestXLogSwitch();
! XLByteToPrevSeg(stoppoint, _logId, _logSeg);
XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
/* Use the log timezone here, not the session timezone */
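To see what the one-line change does, here is a small model of the two macros. It is a sketch under stated assumptions, not the PostgreSQL source: it assumes the default 16 MB segment size, that XLByteToSeg divides the offset by the segment size, and that XLByteToPrevSeg does the same with the offset reduced by one. The Python function names are invented for this illustration.

```python
# Sketch of XLByteToSeg vs. XLByteToPrevSeg (assumed semantics, 16 MB segments).
XLOG_SEG_SIZE = 16 * 1024 * 1024


def parse_location(text):
    """Split a location such as '10/FF000000' into (log_id, offset)."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def byte_to_seg(log_id, offset):
    """Segment that contains the byte at this offset."""
    return log_id, offset // XLOG_SEG_SIZE


def byte_to_prev_seg(log_id, offset):
    """Segment that contains the byte just before this offset."""
    if offset == 0:
        raise ValueError("offset 0 is not handled by this sketch")
    return log_id, (offset - 1) // XLOG_SEG_SIZE


def file_name(timeline, log_id, segment):
    return "%08X%08X%08X" % (timeline, log_id, segment)


if __name__ == "__main__":
    for location in ("10/FE1E2BAC", "10/FF000000"):
        log_id, offset = parse_location(location)
        print(location,
              file_name(2, *byte_to_seg(log_id, offset)),
              file_name(2, *byte_to_prev_seg(log_id, offset)))
```

For a location in the middle of a segment both functions agree. They differ only when the location sits exactly on a segment boundary, which is the case in the report.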
Would someone please tell me if this should be applied?
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch
+ 1. The proposed patch would remove the "+ 1". Seems like an
unnecessary API change, and I don't recall any reason why the new
definition would be better.
A fix for the broken waiting behavior discussed in that thread was
committed.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Hi,
On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch +
1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
change, and I don't recall any reason why the new definition would be
better.
My patch doesn't change the return value of pg_stop_backup(), it's still
the same as the return value of pg_switch_xlog(). Only a part of backup
history file (the file name including stop wal location) is changed.
Currently, the file name is wrong if stop wal location indicates a boundary
byte. This would confuse the user, I think.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Looking at the original post again:
The resulting *.backup file:
START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST
In my 8.3.4 instance, WAL file naming occurs as:
...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...
WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
I can see the potential confusion here. START WAL LOCATION is an
inclusive value, while STOP WAL LOCATION is exclusive. You need to
archive all WAL files < STOP WAL LOCATION to have a valid backup, not
<=. Printing the filenames adds to the confusion.
Perhaps if we printed them like "files 0000000200000010000000FE <= X <
0000000200000010000000FF" the intention would be clearer, but we can't
change the format now without breaking all existing backups.
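The inclusive-start, exclusive-stop reading can be turned into a list of files a backup script would need. This is only a sketch: it assumes 16 MB segments, 255 segments per log id, and a single timeline for the whole backup, and `required_wal_files` is a hypothetical helper, not something PostgreSQL provides.

```python
# Sketch: WAL files needed for a backup, treating START as inclusive and
# STOP as exclusive (assumes 16 MB segments and one timeline throughout).
XLOG_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOG_ID = 0xFFFFFFFF // XLOG_SEG_SIZE  # 255 segments, 00..FE


def parse_location(text):
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def required_wal_files(timeline, start, stop):
    """Return the WAL file names covering the range start <= X < stop."""
    start_log, start_off = parse_location(start)
    stop_log, stop_off = parse_location(stop)
    if stop_off == 0:
        raise ValueError("stop offset 0 is not handled by this sketch")

    log_id, segment = start_log, start_off // XLOG_SEG_SIZE
    if segment >= SEGS_PER_LOG_ID:
        # A start exactly at the end of a log id belongs to the next log id.
        log_id, segment = log_id + 1, 0
    # The last needed file holds the byte just before the stop location.
    last = (stop_log, (stop_off - 1) // XLOG_SEG_SIZE)

    names = []
    while (log_id, segment) <= last:
        names.append("%08X%08X%08X" % (timeline, log_id, segment))
        segment += 1
        if segment >= SEGS_PER_LOG_ID:
            log_id, segment = log_id + 1, 0
    return names


if __name__ == "__main__":
    print(required_wal_files(2, "10/FE1E2BAC", "10/FF000000"))
```

With the values from the report, the only file needed is 0000000200000010000000FE; the FF name printed in the history file never enters the list.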
In 8.4, this will be less of an issue, because pg_stop_backup() now
waits for the last file to be archived before returning, so you don't
have to look at those values to implement the waiting yourself.
In passing, I notice that the manual says for pg_switch_xlog():
pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The result is the ending transaction log location within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the end location of the previous transaction log file.
That's incorrect. According to the comments in RequestXLogSwitch(), what it
actually returns is:
* The return value is either the end+1 address of the switch record,
* or the end+1 address of the prior segment if we did not need to
* write a switch record because we are already at segment start.
Note that "end+1 address of the prior segment" is the same as "first
byte of the *next* segment", which contradicts the manual. I'll
change that paragraph in the manual into:
The result is the ending transaction log location *+ 1* within the
just-completed transaction log file. If there has been no transaction
log activity since the last transaction log switch,
<function>pg_switch_xlog</> does nothing and returns the *start* location
of the transaction log file *currently in use*.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Fujii Masao wrote:
My patch doesn't change the return value of pg_stop_backup(), it's still
the same as the return value of pg_switch_xlog().
Oh, ok.
Only a part of backup
history file (the file name including stop wal location) is changed.
Currently, the file name is wrong if stop wal location indicates a boundary
byte. This would confuse the user, I think.
Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows
its current meaning would currently wait for STOP WAL filename - 1 file
to be archived. If we change the meaning, the same program would
determine that the backup is safe, even if the last xlog file hasn't yet
been archived. So I think this is not back-portable.
Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Fujii Masao wrote:
Only a part of backup
history file (the file name including stop wal location) is changed.
Currently, the file name is wrong if stop wal location indicates a boundary
byte. This would confuse the user, I think.
Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.
I agree. It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.
regards, tom lane
On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:
I agree. It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.
Agreed. It's too confusing the other way.
The manual entry wasn't changed from my original submission
unfortunately.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
The manual entry wasn't changed from my original submission
unfortunately.
OK, do you have updated wording?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Thu, 2009-01-15 at 12:43 -0500, Bruce Momjian wrote:
OK, do you have updated wording?
We are not changing the code, so Heikki's wording is appropriate since
it matches the code.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Heikki has updated the documentation to mention the meaning of this
field. Thanks for the report.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Hi,
On Fri, Jan 16, 2009 at 12:23 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows its
current meaning would currently wait for STOP WAL filename - 1 file to be
archived. If we change the meaning, the same program would determine that
the backup is safe, even if the last xlog file hasn't yet been archived. So
I think this is not back-portable.
Yes, I agree that we need to be careful about changing the meaning.
But there are two reasons why I think the current behavior confuses users.
1.
Currently, the stop WAL file name is not always exclusive. If the stop WAL
location does not fall on a segment boundary, the file name is inclusive.
I'm afraid that users cannot easily judge whether to wait for "filename - 1"
or for "filename": they have to work out whether the stop WAL location is on
a boundary before they start waiting. Should users really have to do that
calculation?
2.
I think it's odd that the return value of pg_xlogfile_name(pg_stop_backup())
differs from the stop WAL file name in the backup history file, even though
the return value of pg_stop_backup() is the same as the stop WAL location
in the backup history file. Shouldn't we make them consistent?
pg_xlogfile_name() always returns the inclusive file name, so users don't
need to care whether the return value of pg_stop_backup() is on a boundary.
This is already documented.
-----------------
http://www.postgresql.org/docs/current/static/functions-admin.html
Similarly, pg_xlogfile_name extracts just the transaction log file name.
When the given transaction log location is exactly at a transaction log file
boundary, both these functions return the name of the preceding transaction
log file. This is usually the desired behavior for managing transaction log
archiving behavior, since the preceding file is the last one that currently
needs to be archived.
-----------------
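The mismatch described in point 2 can be checked offline against a backup history file. The following is a rough imitation of the boundary rule quoted above (the name of the preceding file when the location is exactly on a boundary), not a call to the server-side function; it assumes 16 MB segments, and the regular expression and `stop_file_names` helper are invented for this sketch.

```python
# Sketch: compare the file name printed on a STOP WAL LOCATION line with the
# name obtained by the "preceding file at a boundary" rule (16 MB segments).
import re

XLOG_SEG_SIZE = 16 * 1024 * 1024

STOP_LINE = re.compile(
    r"STOP WAL LOCATION: ([0-9A-F]+)/([0-9A-F]+) \(file ([0-9A-F]{24})\)"
)


def stop_file_names(history_line):
    """Return (name printed in the history file, name per the boundary rule)."""
    match = STOP_LINE.search(history_line)
    if match is None:
        raise ValueError("not a STOP WAL LOCATION line")
    log_id = int(match.group(1), 16)
    offset = int(match.group(2), 16)
    printed = match.group(3)
    if offset == 0:
        raise ValueError("offset 0 is not handled by this sketch")
    # The timeline is the first 8 hex digits of the printed file name.
    timeline = int(printed[:8], 16)
    preceding = "%08X%08X%08X" % (timeline, log_id, (offset - 1) // XLOG_SEG_SIZE)
    return printed, preceding


if __name__ == "__main__":
    line = "STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)"
    printed, preceding = stop_file_names(line)
    print("history file says:", printed)
    print("boundary rule says:", preceding)
```

The two names differ only when the stop location is exactly on a segment boundary, as it is in the original report.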
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> writes:
Currently, the stop WAL file name is not always exclusive. If the stop WAL
location does not fall on a segment boundary, the file name is inclusive.
I'm afraid that users cannot easily judge whether to wait for "filename - 1"
or for "filename": they have to work out whether the stop WAL location is on
a boundary before they start waiting. Should users really have to do that
calculation?
No, which is why we provide functions to do it ;-)
It's really not worth changing the file contents. We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".
regards, tom lane
Hi,
On Fri, Jan 16, 2009 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's really not worth changing the file contents. We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".
OK. I understand that changing the file name would confuse users more.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:
...WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
Sorry for resurrecting an old argument.
http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org
I received a complaint about this behavior of the current pg_stop_backup()
this morning. I thought this was a bug and created a patch, but it was
rejected because the change might break existing applications, though I'm
not sure such an application really exists. Anyway, I think something like
the following should be added to the documentation.
Thoughts?
------------
Note that the WAL file name in the backup history file cannot be used
to determine which WAL files are required for the backup. Because it
indicates the subsequent WAL file of the starting or ending one for
the backup, when its location is exactly at a WAL file boundary (What
is worse, sometimes it indicates a nonexistent WAL file).
------------
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Fujii Masao <masao.fujii@gmail.com> wrote:
START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
But it was rejected because its change might break the existing app.
It might break existing applications if it returns "FE" instead of "FF",
but a file name that is never used surprises users. (IMO, the existing apps
probably crash if "FF" is returned, i.e., 1/256 of the time.)
Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000001100000000 for the above case.
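One way to read this suggestion is sketched below. It is an illustration, not the patch that was later posted: it assumes 16 MB segments and 255 segments per log id, and the function name `stop_file_name_skipping_ff` is hypothetical.

```python
# Sketch of "return the next reasonable file name instead of FF"
# (assumes 16 MB segments; each log id holds segments 00..FE).
XLOG_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOG_ID = 0xFFFFFFFF // XLOG_SEG_SIZE  # 255


def stop_file_name_skipping_ff(timeline, log_id, offset):
    """Name the segment at this location, rolling FF over to the next log id."""
    segment = offset // XLOG_SEG_SIZE
    if segment >= SEGS_PER_LOG_ID:
        # Segment FF never exists; the next real file is 00 of the next log id.
        log_id, segment = log_id + 1, 0
    return "%08X%08X%08X" % (timeline, log_id, segment)


if __name__ == "__main__":
    print(stop_file_name_skipping_ff(2, 0x10, 0xFF000000))
```

For the location 10/FF000000 on timeline 2, this yields 000000020000001100000000. Locations that are not at the end of a log id are named exactly as before.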
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Should it return the *next* reasonable log filename instead of "FF"?
I wonder whether that change would also break existing applications. But
since I've never seen an application that doesn't take that file name at
face value, I agree with changing the existing (odd, to me) behavior of
pg_stop_backup().
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Feb 4, 2010 at 4:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
Anyway, I think something like the following should be added to the
documentation.
Here is the patch that adds the above-mentioned note. I think this
should be back-patched as far back as 8.0. Thoughts?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
note_backup_history_file_0215.patch (text/x-patch; charset=US-ASCII)
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 859,864 **** SELECT pg_stop_backup();
--- 859,869 ----
If you used the label to identify the associated dump file,
then the archived history file is enough to tell you which dump file to
restore.
+ Note that the WAL file name in the backup history file cannot be used
+ to determine which WAL files are required for the backup. Because it
+ indicates the subsequent WAL file of the starting or ending one for
+ the backup, when its location is exactly at a WAL file boundary (What
+ is worse, sometimes it indicates a nonexistent WAL file).
</para>
<para>
On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Should it return the *next* reasonable log filename instead of "FF"?
Here is the patch that avoids a nonexistent file name, according to
Itagaki-san's suggestion. If we are crossing a logid boundary, the
next reasonable file name is used instead of a nonexistent one.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
stop_file_name_0216.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8057,8063 **** pg_stop_backup(PG_FUNCTION_ARGS)
*/
RequestXLogSwitch();
! XLByteToSeg(stoppoint, _logId, _logSeg);
XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
/* Use the log timezone here, not the session timezone */
--- 8057,8078 ----
*/
RequestXLogSwitch();
! if (stoppoint.xrecoff >= XLogSegSize)
! {
! XLogRecPtr recptr = stoppoint;
!
! /*
! * Since xlog segment file name is calculated by using XLByteToSeg,
! * it might indicate a nonexistent file (i.e., which ends in "FF")
! * when we are crossing a logid boundary. In this case, we use the
! * next reasonable file name instead of nonexistent one.
! */
! recptr.xlogid += 1;
! recptr.xrecoff = XLOG_BLCKSZ;
! XLByteToSeg(recptr, _logId, _logSeg);
! }
! else
! XLByteToSeg(stoppoint, _logId, _logSeg);
XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
/* Use the log timezone here, not the session timezone */
I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?
Some users actually use STOP WAL LOCATION in their backup scripts,
and they've encountered the bug with 1/256 probability in recent days.
Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:
I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?
The proposed patch seems quite ugly to me; not only the messy coding,
but the fact that it might return either the segment containing the
XLOG_BACKUP_END record or the next one.
I think an appropriate fix might just be s/XLByteToSeg/XLByteToPrevSeg/,
so that the result is always the segment containing the XLOG_BACKUP_END
record even when the record ends exactly at a segment boundary.
regards, tom lane