BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Started by Randy Isbell about 17 years ago, 22 messages
#1Randy Isbell
jisbell@cisco.com

The following bug has been logged online:

Bug reference: 4566
Logged by: Randy Isbell
Email address: jisbell@cisco.com
PostgreSQL version: 8.3.4
Operating system: FreeBSD 6.2
Description: pg_stop_backup() reports incorrect STOP WAL LOCATION
Details:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

SELECT pg_start_backup('filename');
pg_start_backup
-----------------
10/FE1E2BAC
(1 row)

Later:
SELECT pg_stop_backup();
pg_stop_backup
----------------
10/FF000000
(1 row)

The resulting *.backup file:

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST

In my 8.3.4 instance, WAL file naming occurs as:

...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...

WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.
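
A minimal sketch of where the 'FF' name comes from, assuming the default 16 MB segment size and the 8.3-era "logid/offset" location format. The function names are illustrative only, not PostgreSQL APIs.

```python
# Sketch of pre-9.3 WAL file naming (assumes the default 16 MB segment size).
XLOG_SEG_SIZE = 16 * 1024 * 1024  # bytes per WAL segment file


def parse_location(location):
    """Split a location such as '10/FE1E2BAC' into (logid, byte offset)."""
    logid, offset = location.split("/")
    return int(logid, 16), int(offset, 16)


def segment_file_name(timeline, location):
    """Name of the segment that would contain the byte at `location`."""
    logid, offset = parse_location(location)
    segment = offset // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, logid, segment)


print(segment_file_name(2, "10/FE1E2BAC"))  # start location from the report
print(segment_file_name(2, "10/FF000000"))  # stop location from the report
```

Taking the stop location at face value yields a name ending in 'FF', which matches the file name printed in the *.backup file.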

- r.

#2Fujii Masao
masao.fujii@gmail.com
In reply to: Randy Isbell (#1)
1 attachment(s)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:

WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.

It's a bug in pg_stop_backup(), which has been discussed before:
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00108.php

Attached is a patch against HEAD. I think we should
also back-port it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

stopxlogfilename_bugfix.patch (text/x-patch)
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.323
diff -c -r1.323 xlog.c
*** src/backend/access/transam/xlog.c	3 Dec 2008 08:20:11 -0000	1.323
--- src/backend/access/transam/xlog.c	6 Dec 2008 04:21:05 -0000
***************
*** 6710,6716 ****
  	 */
  	stoppoint = RequestXLogSwitch();
  
! 	XLByteToSeg(stoppoint, _logId, _logSeg);
  	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
  
  	/* Use the log timezone here, not the session timezone */
--- 6710,6716 ----
  	 */
  	stoppoint = RequestXLogSwitch();
  
! 	XLByteToPrevSeg(stoppoint, _logId, _logSeg);
  	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
  
  	/* Use the log timezone here, not the session timezone */
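
The one-line change above can be sketched as follows. This assumes, as far as I understand the 8.3-era macros, that XLByteToSeg divides the offset by the segment size while XLByteToPrevSeg does the same with offset - 1. The Python names are illustrative, and the sketch assumes the offset is never zero.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size


def byte_to_seg(logid, offset):
    """Face-value mapping: the segment that contains byte `offset`."""
    return logid, offset // XLOG_SEG_SIZE


def byte_to_prev_seg(logid, offset):
    """A boundary byte is counted as part of the previous segment.

    Assumes offset > 0.
    """
    return logid, (offset - 1) // XLOG_SEG_SIZE


def file_name(timeline, logid, segment):
    return "%08X%08X%08X" % (timeline, logid, segment)


# Stop location 10/FF000000 lies exactly on a segment boundary.
print(file_name(2, *byte_to_seg(0x10, 0xFF000000)))
print(file_name(2, *byte_to_prev_seg(0x10, 0xFF000000)))
```

For a location inside a segment, such as 10/FE1E2BAC, both mappings give the same file; they differ only on a boundary.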
#3Bruce Momjian
bruce@momjian.us
In reply to: Fujii Masao (#2)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Would someone please tell me if this should be applied?

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#4Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Bruce Momjian (#3)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch
+ 1. The proposed patch would remove the "+ 1". Seems like an
unnecessary API change, and I don't recall any reason why the new
definition would be better.

A fix for the broken waiting behavior discussed in that thread was
committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#5Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#4)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Hi,

On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

I think not
(http://archives.postgresql.org/pgsql-hackers/2008-12/msg00126.php). The
return value of pg_stop_backup() is currently the same as
pg_switch_xlog()'s: the location of the last byte before the XLOG switch +
1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
change, and I don't recall any reason why the new definition would be
better.

My patch doesn't change the return value of pg_stop_backup(); it's still
the same as the return value of pg_switch_xlog(). Only one part of the backup
history file changes: the file name printed next to the stop WAL location.
Currently, that file name is wrong if the stop WAL location falls exactly on
a segment boundary. I think this would confuse users.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Heikki Linnakangas (#4)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Looking at the original post again:

The resulting *.backup file:

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)
CHECKPOINT LOCATION: 10/FE1E2BAC
START TIME: 2008-11-09 01:15:06 CST
LABEL: /bck/db/sn200811090115.tar.gz
STOP TIME: 2008-11-09 01:15:48 CST

In my 8.3.4 instance, WAL file naming occurs as:

...
0000000100000003000000FD
0000000100000003000000FE
000000010000000400000000
000000010000000400000001
...

WAL files never end in 'FF'. This causes a problem when trying to collect
the ending WAL file for backup.

I can see the potential confusion here. START WAL LOCATION is an
inclusive value, while STOP WAL LOCATION is exclusive. You need to
archive all WAL files < STOP WAL LOCATION to have a valid backup, not
<=. Printing the filenames adds to the confusion.

Perhaps if we printed them like "files 0000000200000010000000FE <= X <
0000000200000010000000FF" the intention would be clearer, but we can't
change the format now without breaking all existing backups.
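
Read that way, the set of required files can be computed from the two locations rather than from the printed file names. A minimal sketch, assuming 16 MB segments, 255 usable segments per logid (00 to FE), a single timeline, and a stop offset that is never zero; the function names are illustrative only.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_LOGID = 0xFFFFFFFF // XLOG_SEG_SIZE  # 255: segments 00..FE


def parse_location(location):
    logid, offset = location.split("/")
    return int(logid, 16), int(offset, 16)


def required_wal_files(timeline, start_location, stop_location):
    """Files holding every byte in [start_location, stop_location)."""
    logid, start_offset = parse_location(start_location)
    stop_logid, stop_offset = parse_location(stop_location)
    segment = start_offset // XLOG_SEG_SIZE
    # The stop location is exclusive, so the last needed byte is stop - 1.
    last = (stop_logid, (stop_offset - 1) // XLOG_SEG_SIZE)
    files = []
    while (logid, segment) <= last:
        files.append("%08X%08X%08X" % (timeline, logid, segment))
        segment += 1
        if segment == SEGMENTS_PER_LOGID:  # segment FF is never used
            logid, segment = logid + 1, 0
    return files


print(required_wal_files(2, "10/FE1E2BAC", "10/FF000000"))
```

For the backup in the original report this lists only the segment ending in FE, which is the file that actually exists.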

In 8.4, this will be less of an issue, because pg_stop_backup() now
waits for the last file to be archived before returning, so you don't
have to look at those values to implement the waiting yourself.
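
For releases where the waiting still has to be done by hand, a script might poll the archive for the last required segment. This is only a sketch: it assumes archive_command copies segments into a plain directory, and `archive_dir`, `timeout` and `poll_interval` are hypothetical parameters, not PostgreSQL settings.

```python
import os
import time


def wait_for_archived(archive_dir, wal_file, timeout=60.0, poll_interval=1.0):
    """Return True once `wal_file` appears in `archive_dir`, False on timeout."""
    path = os.path.join(archive_dir, wal_file)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A caller would pass the name of the last segment that holds data before the stop location, not the name printed in the backup history file.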

In passing, I notice that the manual says for pg_switch_xlog():

pg_switch_xlog moves to the next transaction log file, allowing the current file to be archived (assuming you are using continuous archiving). The result is the ending transaction log location within the just-completed transaction log file. If there has been no transaction log activity since the last transaction log switch, pg_switch_xlog does nothing and returns the end location of the previous transaction log file.

That's incorrect. According to the comments in RequestXLogSwitch(), what it
actually returns is:

* The return value is either the end+1 address of the switch record,
* or the end+1 address of the prior segment if we did not need to
* write a switch record because we are already at segment start.

Note that "end+1 address of the prior segment" is the same as "first
byte of the *next* segment", which contradicts the manual. I'll
change that paragraph in the manual to:

The result is the ending transaction log location *+ 1* within the
just-completed transaction log file. If there has been no transaction
log activity since the last transaction log switch,
<function>pg_switch_xlog</> does nothing and returns the *start* location
of the transaction log file *currently in use*.
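
The equivalence between "end+1 of the prior segment" and "first byte of the next segment" is plain arithmetic. A small sketch, assuming 16 MB segments; `segment_bounds` is an illustrative helper, not a PostgreSQL function.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size


def segment_bounds(segment):
    """First and last byte offsets covered by a segment within its logid."""
    first = segment * XLOG_SEG_SIZE
    last = first + XLOG_SEG_SIZE - 1
    return first, last


first, last = segment_bounds(0xFE)
print("segment FE covers %08X..%08X" % (first, last))
print("end + 1 = %08X" % (last + 1))
```

For segment FE the end+1 address is FF000000, the offset shown in the stop location of the original report.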

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Fujii Masao (#5)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Fujii Masao wrote:

On Thu, Jan 15, 2009 at 9:09 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

1. The proposed patch would remove the "+ 1". Seems like an unnecessary API
change, and I don't recall any reason why the new definition would be
better.

My patch doesn't change the return value of pg_stop_backup(), it's still
the same as the return value of pg_switch_xlog().

Oh, ok.

Only a part of backup
history file (the file name including stop wal location) is changed.
Currently, the file name is wrong if stop wal location indicates a boundary
byte. This would confuse the user, I think.

Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows
its current meaning would currently wait for STOP WAL filename - 1 file
to be archived. If we change the meaning, the same program would
determine that the backup is safe, even if the last xlog file hasn't yet
been archived. So I think this is not back-portable.

Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Should we change it in HEAD? I'm leaning towards no, on the grounds that
tools/people would then have to know the version it's dealing with to
interpret the value correctly, and because pg_stop_backup() now waits
for the last xlog file to be archived before returning, there's little
need to look at that file.

I agree. It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.

regards, tom lane

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#8)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Thu, 2009-01-15 at 11:15 -0500, Tom Lane wrote:

I agree. It might have been better to define it the other way
originally, but the risks of changing it now outweigh any likely
benefit.

Agreed. It's too confusing the other way.

The manual entry wasn't changed from my original submission
unfortunately.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#10Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#9)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Simon Riggs wrote:

Agreed. It's too confusing the other way.

The manual entry wasn't changed from my original submission
unfortunately.

OK, do you have updated wording?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#11Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#10)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Thu, 2009-01-15 at 12:43 -0500, Bruce Momjian wrote:

OK, do you have updated wording?

We are not changing the code, so Heikki's wording is appropriate since
it matches the code.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#12Bruce Momjian
bruce@momjian.us
In reply to: Fujii Masao (#2)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Heikki has updated the documentation to mention the meaning of this
field. Thanks for the report.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#7)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Hi,

On Fri, Jan 16, 2009 at 12:23 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Only a part of backup
history file (the file name including stop wal location) is changed.
Currently, the file name is wrong if stop wal location indicates a
boundary
byte. This would confuse the user, I think.

Hmm, I guess that would make it less confusing. Seems quite dangerous to
change the meaning now, however :-(. A program (or person) that knows its
current meaning would currently wait for STOP WAL filename - 1 file to be
archived. If we change the meaning, the same program would determine that
the backup is safe, even if the last xlog file hasn't yet been archived. So
I think this is not back-portable.

Yes, I agree that we need to be careful about changing the meaning.
But there are two reasons why I think the current behavior confuses users.

1.
Currently, the stop WAL filename is not always exclusive. If the stop WAL
location is not on a segment boundary, the filename is inclusive. I'm afraid
users cannot easily tell whether to wait for "filename - 1" or for "filename":
they first have to work out whether the stop WAL location falls on a
boundary. Should users really have to do that calculation?

2.
I think it's odd that pg_xlogfile_name(pg_stop_backup()) returns a different
name from the stop WAL filename in the backup history file, even though
the return value of pg_stop_backup() is the same as the stop WAL location
in that file. Shouldn't we make them consistent? pg_xlogfile_name() always
returns the inclusive filename, so users don't need to care whether the
return value of pg_stop_backup() falls on a boundary.
This is already documented.

-----------------
http://www.postgresql.org/docs/current/static/functions-admin.html

Similarly, pg_xlogfile_name extracts just the transaction log file name.
When the given transaction log location is exactly at a transaction log file
boundary, both these functions return the name of the preceding transaction
log file. This is usually the desired behavior for managing transaction log
archiving behavior, since the preceding file is the last one that currently
needs to be archived.

-----------------
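
A script that reads the backup history file can apply the same boundary rule itself instead of trusting the printed file name. A minimal sketch, assuming the line format shown in the original report and 16 MB segments; `last_file_to_archive` is a hypothetical helper, and the timeline is taken from the printed name.

```python
import re

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size

STOP_LINE = re.compile(
    r"^STOP WAL LOCATION: ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) "
    r"\(file ([0-9A-Fa-f]{24})\)$"
)


def last_file_to_archive(stop_line):
    """Name of the last segment holding data before the stop location."""
    match = STOP_LINE.match(stop_line.strip())
    if match is None:
        raise ValueError("not a STOP WAL LOCATION line: %r" % stop_line)
    logid = int(match.group(1), 16)
    offset = int(match.group(2), 16)
    timeline = int(match.group(3)[:8], 16)
    # A boundary byte counts as part of the preceding segment (offset > 0).
    segment = (offset - 1) // XLOG_SEG_SIZE
    return "%08X%08X%08X" % (timeline, logid, segment)


line = "STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)"
print(last_file_to_archive(line))
```

This gives the same answer whether or not the stop location is on a boundary, so the caller never has to decide between "filename" and "filename - 1".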

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#13)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Fujii Masao <masao.fujii@gmail.com> writes:

Currently, the stop WAL filename is not always exclusive. If the stop WAL
location is not on a segment boundary, the filename is inclusive. I'm afraid
users cannot easily tell whether to wait for "filename - 1" or for "filename":
they first have to work out whether the stop WAL location falls on a
boundary. Should users really have to do that calculation?

No, which is why we provide functions to do it ;-)

It's really not worth changing the file contents. We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".

regards, tom lane

#15Fujii Masao
masao.fujii@gmail.com
In reply to: Tom Lane (#14)
Re: BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Hi,

On Fri, Jan 16, 2009 at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's really not worth changing the file contents. We're far more likely
to hear complaints like "you broke my archive script and I lost all my
data" than compliments about "the contents of this internal
implementation file are lots more sensible now".

OK. I understand that changing the filename would confuse users more.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Randy Isbell (#1)
Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

WAL files never end in 'FF'.  This causes a problem when trying to collect
the ending WAL file for backup.

Sorry for resurrecting an old argument.
http://archives.postgresql.org/message-id/200812051441.mB5EfG1M007309@wwwmaster.postgresql.org

I received a complaint about this behavior of the current pg_stop_backup()
this morning. I had thought this was a bug and wrote a patch, but it was
rejected because the change might break existing applications, though I'm
not sure any such application really exists. In any case, I think
something like the following statement should be added to the documentation.
Thoughts?

------------
Note that the WAL file name in the backup history file cannot be used
to determine which WAL files are required for the backup. Because it
indicates the subsequent WAL file of the starting or ending one for
the backup, when its location is exactly at a WAL file boundary (What
is worse, sometimes it indicates a nonexistent WAL file).
------------

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Fujii Masao (#16)
Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)

But it was rejected because its change might break the existing app.

It might break existing applications if it returned "FE" instead of "FF",
but a file name that is never used surprises users. (IMO, existing apps
probably crash when "FF" is returned, i.e., 1/256 of the time.)

Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000001100000000 for the above case.
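
That suggestion can be sketched as a mapping from a never-created name to the first segment of the next logid. This assumes pre-9.3 numbering, where segments 00 to FE exist in each logid and logid 0x10 is followed by 0x11; `existing_file_name` is an illustrative name only.

```python
SEGMENTS_PER_LOGID = 0xFF  # segments 00..FE exist; FF is never created


def existing_file_name(name):
    """Map a '...FF' name to segment 0 of the next logid; keep others as is."""
    timeline = int(name[0:8], 16)
    logid = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    if segment >= SEGMENTS_PER_LOGID:
        logid, segment = logid + 1, 0
    return "%08X%08X%08X" % (timeline, logid, segment)


print(existing_file_name("0000000200000010000000FF"))
```

With this mapping the printed name always refers to a file that can exist, although it still names the segment after the last one the backup needs.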

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#18Fujii Masao
masao.fujii@gmail.com
In reply to: Takahiro Itagaki (#17)
Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

But it was rejected because its change might break the existing app.

It might break existing applications if it returned "FE" instead of "FF",
but a file name that is never used surprises users. (IMO, existing apps
probably crash when "FF" is returned, i.e., 1/256 of the time.)

Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000001100000000 for the above case.

I wonder whether that change would also break existing apps. But since
I've never seen an app that doesn't take that filename at face
value, I agree with changing the existing (odd, to me) behavior of
pg_stop_backup().

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#16)
1 attachment(s)
Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Thu, Feb 4, 2010 at 4:28 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

------------
Note that the WAL file name in the backup history file cannot be used
to determine which WAL files are required for the backup. Because it
indicates the subsequent WAL file of the starting or ending one for
the backup, when its location is exactly at a WAL file boundary (What
is worse, sometimes it indicates a nonexistent WAL file).
------------

Here is a patch that adds the above-mentioned note. I think this
should be back-patched as far back as 8.0. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

note_backup_history_file_0215.patch (text/x-patch; charset=US-ASCII)
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 859,864 **** SELECT pg_stop_backup();
--- 859,869 ----
      If you used the label to identify the associated dump file,
      then the archived history file is enough to tell you which dump file to
      restore.
+     Note that the WAL file name in the backup history file cannot be used
+     to determine which WAL files are required for the backup. Because it
+     indicates the subsequent WAL file of the starting or ending one for
+     the backup, when its location is exactly at a WAL file boundary (What
+     is worse, sometimes it indicates a nonexistent WAL file).
     </para>
  
     <para>
#20Fujii Masao
masao.fujii@gmail.com
In reply to: Takahiro Itagaki (#17)
1 attachment(s)
Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)

But it was rejected because the change might break existing applications.

It might break existing applications if it returns "FE" instead of "FF",
but a never-used file name surprises users. (IMO, existing apps probably
crash if "FF" is returned, i.e., 1/256 of the time.)

Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000001100000000 for the above case.

Here is the patch that avoids a nonexistent file name, according to
Itagaki-san's suggestion. If we are crossing a logid boundary, the
next reasonable file name is used instead of a nonexistent one.
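
The idea can be sketched in standalone C as follows. This is only an illustration, not the PostgreSQL source: the names existing_segment and format_wal_name and the two constants are hypothetical, and the sketch assumes 16 MB segments with 255 segments per log id (00 through FE), as reported at the start of this thread.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: 16 MB segments, 255 segments (00..FE) per log id. */
#define SEG_SIZE        0x01000000u
#define SEGS_PER_LOGID  (0xFFFFFFFFu / SEG_SIZE)

/*
 * Hypothetical helper: map (logid, byte offset) to a segment that can
 * actually exist.  An offset that lands in the never-used segment FF is
 * moved to segment 00 of the next log id.
 */
static void
existing_segment(uint32_t logid, uint32_t offset,
                 uint32_t *out_logid, uint32_t *out_seg)
{
    uint32_t seg = offset / SEG_SIZE;

    if (seg >= SEGS_PER_LOGID)
    {
        logid += 1;
        seg = 0;
    }
    *out_logid = logid;
    *out_seg = seg;
}

/* Hypothetical helper: 24-character name (timeline, log id, segment). */
static void
format_wal_name(char *buf, size_t len,
                uint32_t tli, uint32_t logid, uint32_t seg)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) logid, (unsigned) seg);
}
```

For the location 10/FF000000 on timeline 2, existing_segment yields log id 0x11 and segment 0, so format_wal_name produces 000000020000001100000000. A location in the middle of a segment, such as 10/FE1E2BAC, is left in its own segment (FE).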

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

stop_file_name_0216.patch (text/x-patch; charset=US-ASCII)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8057,8063 **** pg_stop_backup(PG_FUNCTION_ARGS)
  	 */
  	RequestXLogSwitch();
  
! 	XLByteToSeg(stoppoint, _logId, _logSeg);
  	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
  
  	/* Use the log timezone here, not the session timezone */
--- 8057,8078 ----
  	 */
  	RequestXLogSwitch();
  
! 	if (stoppoint.xrecoff >= XLogFileSize)
! 	{
! 		XLogRecPtr	recptr = stoppoint;
! 
! 		/*
! 		 * Since xlog segment file name is calculated by using XLByteToSeg,
! 		 * it might indicate a nonexistent file (i.e., which ends in "FF")
! 		 * when we are crossing a logid boundary. In this case, we use the
! 		 * next reasonable file name instead of nonexistent one.
! 		 */
! 		recptr.xlogid += 1;
! 		recptr.xrecoff = XLOG_BLCKSZ;
! 		XLByteToSeg(recptr, _logId, _logSeg);
! 	}
! 	else
! 		XLByteToSeg(stoppoint, _logId, _logSeg);
  	XLogFileName(stopxlogfilename, ThisTimeLineID, _logId, _logSeg);
  
  	/* Use the log timezone here, not the session timezone */
#21Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Fujii Masao (#20)
Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?

Some users actually use STOP WAL LOCATION in their backup scripts,
and they have recently encountered the bug, which occurs with 1/256 probability.

Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Feb 5, 2010 at 9:08 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Dec 5, 2008 at 11:41 PM, Randy Isbell <jisbell@cisco.com> wrote:

An inconsistency exists between the segment name reported by
pg_stop_backup() and the actual WAL file name.

START WAL LOCATION: 10/FE1E2BAC (file 0000000200000010000000FE)
STOP WAL LOCATION: 10/FF000000 (file 0000000200000010000000FF)

But it was rejected because the change might break existing applications.

It might break existing applications if it returns "FE" instead of "FF",
but a never-used file name surprises users. (IMO, existing apps probably
crash if "FF" is returned, i.e., 1/256 of the time.)

Should it return the *next* reasonable log filename instead of "FF"?
For example, 000000020000001100000000 for the above case.

Here is the patch that avoids a nonexistent file name, according to
Itagaki-san's suggestion. If we are crossing a logid boundary, the
next reasonable file name is used instead of a nonexistent one.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Takahiro Itagaki (#21)
Re: Re: [BUGS] BUG #4566: pg_stop_backup() reports incorrect STOP WAL LOCATION

Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> writes:

I'd like to apply the patch to HEAD and previous releases because
the issue seems to be a bug in the core. Any comments or objections?

The proposed patch seems quite ugly to me; not only the messy coding,
but the fact that it might return either the segment containing the
XLOG_BACKUP_END record or the next one.

I think an appropriate fix might just be s/XLByteToSeg/XLByteToPrevSeg/,
so that the result is always the segment containing the XLOG_BACKUP_END
record even when the record ends exactly at a segment boundary.
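
To illustrate the difference, here is a minimal standalone sketch in C. It does not reproduce the PostgreSQL macros themselves: WalLocation, seg_at, seg_before and wal_file_name are hypothetical stand-ins, and a 16 MB segment size is assumed. seg_at picks the segment holding the byte at the location, while seg_before picks the segment holding the byte just before it, which is the behavior the XLByteToPrevSeg suggestion relies on.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed WAL segment size of 16 MB. */
#define SEG_SIZE 0x01000000u

/* Hypothetical stand-in for a two-part WAL location such as 10/FF000000. */
typedef struct
{
    uint32_t xlogid;   /* high part, e.g. 0x10 */
    uint32_t xrecoff;  /* byte offset within that log id */
} WalLocation;

/* Segment holding the byte AT the location. */
static uint32_t
seg_at(WalLocation loc)
{
    return loc.xrecoff / SEG_SIZE;
}

/*
 * Segment holding the byte just BEFORE the location, i.e. the segment
 * in which a record ending at this location was written.
 * Assumes xrecoff > 0.
 */
static uint32_t
seg_before(WalLocation loc)
{
    return (loc.xrecoff - 1) / SEG_SIZE;
}

/* Hypothetical helper: 24-character name (timeline, log id, segment). */
static void
wal_file_name(char *buf, size_t len,
              uint32_t tli, uint32_t logid, uint32_t seg)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) logid, (unsigned) seg);
}
```

For the stop location 10/FF000000 from the original report, seg_at gives FF (a file that never exists) while seg_before gives FE, so the name on timeline 2 becomes 0000000200000010000000FE. For a location inside a segment, such as 10/FE1E2BAC, both functions return FE.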

regards, tom lane