avoid unnecessary failure to open restored WAL files

Started by Fujii Masao over 13 years ago, 3 messages
#1 Fujii Masao
masao.fujii@gmail.com
1 attachment(s)

Hi,

In HEAD and 9.2, the following sequence occurs during archive recovery:

1. The archived WAL file is restored under the temporary file name
   "RECOVERYXLOG".
2. The restored WAL file is renamed to its correct file name, such as
   000000010000000000000002.
3. The startup process tries to open the temporary file name, even though
   the file has already been renamed and no longer exists under that name.
   This always fails.
4. The startup process then retries, opening the correctly named file as a
   WAL file in the pg_xlog directory rather than as an archived file. This
   succeeds.

The failed open in step 3 is unnecessary, so I think we can avoid it. The
attached patch changes the startup process so that, after restoring the
archived WAL file, it opens the correctly named restored file.
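
The four steps above can be reproduced outside PostgreSQL with a small sketch. This is only an illustration under stated assumptions: the function name open_after_rename_demo and the file names passed to it are hypothetical, and the code uses plain C stdio rather than anything from xlog.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical demo, not PostgreSQL code.  Returns 1 when, after a rename,
 * opening the old name fails and opening the new name succeeds; returns 0
 * if the demo could not be set up or behaved differently.
 */
int
open_after_rename_demo(const char *tmpname, const char *finalname)
{
	FILE	   *fp;
	int			old_failed;
	int			new_ok;

	assert(tmpname != NULL && finalname != NULL);

	/* Step 1: "restore" the segment under the temporary name. */
	fp = fopen(tmpname, "w");
	if (fp == NULL)
		return 0;
	fputs("wal", fp);
	fclose(fp);

	/* Step 2: rename it to the final name (clear any stale copy first). */
	remove(finalname);
	if (rename(tmpname, finalname) != 0)
	{
		remove(tmpname);
		return 0;
	}

	/* Step 3: opening the temporary name now fails; the file has moved. */
	fp = fopen(tmpname, "r");
	old_failed = (fp == NULL);
	if (fp != NULL)
		fclose(fp);

	/* Step 4: opening the final name succeeds. */
	fp = fopen(finalname, "r");
	new_ok = (fp != NULL);
	if (fp != NULL)
		fclose(fp);

	remove(finalname);			/* clean up the demo file */
	return old_failed && new_ok;
}
```

Calling open_after_rename_demo("RECOVERYXLOG.demo", "000000010000000000000002.demo") in a writable directory is expected to return 1, which mirrors the avoidable failure in step 3.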

Regards,

--
Fujii Masao

Attachments:

file_open_failure_v1.patch (application/octet-stream)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 2804,2809 **** XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
--- 2804,2810 ----
  					(errcode_for_file_access(),
  					 errmsg("could not rename file \"%s\" to \"%s\": %m",
  							path, xlogfpath)));
+ 		strncpy(xlogfpath, path, MAXPGPATH);
  
  		/*
  		 * If the existing segment was replaced, since walsenders might have
#2 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#1)
Re: avoid unnecessary failure to open restored WAL files

On 2 August 2012 17:18, Fujii Masao <masao.fujii@gmail.com> wrote:

> Hi,
>
> In HEAD and 9.2, the following sequence occurs during archive recovery:
>
> 1. The archived WAL file is restored under the temporary file name
>    "RECOVERYXLOG".
> 2. The restored WAL file is renamed to its correct file name, such as
>    000000010000000000000002.
> 3. The startup process tries to open the temporary file name, even though
>    the file has already been renamed and no longer exists under that name.
>    This always fails.
> 4. The startup process then retries, opening the correctly named file as a
>    WAL file in the pg_xlog directory rather than as an archived file. This
>    succeeds.
>
> The failed open in step 3 is unnecessary, so I think we can avoid it. The
> attached patch changes the startup process so that, after restoring the
> archived WAL file, it opens the correctly named restored file.

Looks to me that the strncpy is backwards and will still fail. Please
double check.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#2)
1 attachment(s)
Re: avoid unnecessary failure to open restored WAL files

On Wed, Aug 8, 2012 at 3:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Looks to me that the strncpy is backwards and will still fail. Please
> double check.

Oh, you're right. I passed the source and destination arguments of strncpy
in the wrong order. Attached is the updated version of the patch.
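
For reference, the argument order at issue can be shown in a minimal standalone sketch. The helper name use_renamed_path is hypothetical and is not part of xlog.c; MAXPGPATH is redefined locally so the snippet compiles on its own, and the explicit terminator is an extra precaution that the patch itself does not include.

```c
#include <assert.h>
#include <string.h>

#define MAXPGPATH 1024			/* redefined here so the sketch stands alone */

/*
 * Hypothetical helper, not part of xlog.c.  After the restored file has been
 * renamed from "path" to "xlogfpath", copy the new name into "path" so that
 * a later open uses a name that exists.  "path" must have room for
 * MAXPGPATH bytes.
 */
void
use_renamed_path(char *path, const char *xlogfpath)
{
	assert(path != NULL && xlogfpath != NULL);

	/* strncpy(destination, source, n): the destination comes first. */
	strncpy(path, xlogfpath, MAXPGPATH);

	/* strncpy leaves the result unterminated if the source fills the buffer. */
	path[MAXPGPATH - 1] = '\0';
}
```

With the arguments swapped, as in the first version, the stale temporary name would be copied over xlogfpath and path would still name a file that no longer exists.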

Regards,

--
Fujii Masao

Attachments:

file_open_failure_v2.patch (application/octet-stream)
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 2804,2809 **** XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
--- 2804,2810 ----
  					(errcode_for_file_access(),
  					 errmsg("could not rename file \"%s\" to \"%s\": %m",
  							path, xlogfpath)));
+ 		strncpy(path, xlogfpath, MAXPGPATH);
  
  		/*
  		 * If the existing segment was replaced, since walsenders might have