Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

Started by Michael Paquierover 10 years ago12 messages
#1Michael Paquier
michael.paquier@gmail.com
1 attachment(s)

Hi all,

Since commit de768844, XLogFileCopy of xlog.c returns to caller a
pstrdup'd string that can be used afterwards for other things.
XLogFileCopy is used in only one place, and it happens that the result
string is never freed at all, leaking memory.
Attached is a patch to fix the problem.
Regards,
--
Michael

Attachments:

20150528_xlog_memory_leak.patchtext/x-patch; charset=US-ASCII; name=20150528_xlog_memory_leak.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 087b6be..2bb3acc 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5242,7 +5242,11 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 */
 		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
 		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
+		{
+			pfree(tmpfname);
 			elog(ERROR, "InstallXLogFileSegment should not have failed");
+		}
+		pfree(tmpfname);
 	}
 	else
 	{
#2Michael Paquier
michael.paquier@gmail.com
In reply to: Michael Paquier (#1)
1 attachment(s)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Thu, May 28, 2015 at 9:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Since commit de768844, XLogFileCopy of xlog.c returns to caller a
pstrdup'd string that can be used afterwards for other things.
XLogFileCopy is used in only one place, and it happens that the result
string is never freed at all, leaking memory.
Attached is a patch to fix the problem.

Having a second look at that, XLogFileCopy() is called only in one
place, and dstfname is never used, hence I think that it would make
sense to return unconditionally a temporary file name to the caller.
Also the current error message in case of failure is very C-like:
elog(ERROR, "InstallXLogFileSegment should not have failed");
I thought that we to use function names in the error messages.
Wouldn't something like that be more adapted?
"could not copy partial log file" or ""could not copy partial log file
into temporary segment file"
Attached is a new patch.

Regards,
--
Michael

Attachments:

20150601_xlogcopy_fixes.patchtext/x-patch; charset=US-ASCII; name=20150601_xlogcopy_fixes.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 666fa37..9a5b350 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3041,18 +3041,15 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 /*
  * Copy a WAL segment file in pg_xlog directory.
  *
- * dstfname		destination filename
  * srcfname		source filename
  * upto			how much of the source file to copy? (the rest is filled with
  *				zeros)
  *
- * If dstfname is not given, the file is created with a temporary filename,
- * which is returned.  Both filenames are relative to the pg_xlog directory.
- *
- * NB: Any existing file with the same name will be overwritten!
+ * The file is created with a temporary filename, which is returned.  Both
+ * filenames are relative to the pg_xlog directory.
  */
 static char *
-XLogFileCopy(char *dstfname, char *srcfname, int upto)
+XLogFileCopy(char *srcfname, int upto)
 {
 	char		srcpath[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
@@ -3150,25 +3147,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
 
 	CloseTransientFile(srcfd);
 
-	/*
-	 * Now move the segment into place with its final name.  (Or just return
-	 * the path to the file we created, if the caller wants to handle the rest
-	 * on its own.)
-	 */
-	if (dstfname)
-	{
-		char		dstpath[MAXPGPATH];
-
-		snprintf(dstpath, MAXPGPATH, XLOGDIR "/%s", dstfname);
-		if (rename(tmppath, dstpath) < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not rename file \"%s\" to \"%s\": %m",
-							tmppath, dstpath)));
-		return NULL;
-	}
-	else
-		return pstrdup(tmppath);
+	/* return name to caller, to let him hamdle the rest */
+	return pstrdup(tmppath);
 }
 
 /*
@@ -5238,9 +5218,13 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 * considerations. But we should be just as tense as XLogFileInit to
 		 * avoid emplacing a bogus file.
 		 */
-		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
+		tmpfname = XLogFileCopy(xlogfname, endOfLog % XLOG_SEG_SIZE);
 		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
-			elog(ERROR, "InstallXLogFileSegment should not have failed");
+		{
+			pfree(tmpfname);
+			elog(ERROR, "could not copy partial log file");
+		}
+		pfree(tmpfname);
 	}
 	else
 	{
#3Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#2)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, May 28, 2015 at 9:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Since commit de768844, XLogFileCopy of xlog.c returns to caller a
pstrdup'd string that can be used afterwards for other things.
XLogFileCopy is used in only one place, and it happens that the result
string is never freed at all, leaking memory.

That seems to be almost harmless because the startup process will exit
just after calling XLogFileCopy(). No?

Having a second look at that, XLogFileCopy() is called only in one
place, and dstfname is never used, hence I think that it would make
sense to return unconditionally a temporary file name to the caller.

+1

Also the current error message in case of failure is very C-like:
elog(ERROR, "InstallXLogFileSegment should not have failed");
I thought that we to use function names in the error messages.
Wouldn't something like that be more adapted?
"could not copy partial log file" or ""could not copy partial log file
into temporary segment file"

Or we can extend InstallXLogFileSegment so that we can give the log level
to it. When InstallXLogFileSegment is called after XLogFileCopy, we can
give ERROR as the log level and cause an exception to occur if link() or
rename() fails in InstallXLogFileSegment.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#3)
1 attachment(s)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Thu, Jun 4, 2015 at 10:40 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, May 28, 2015 at 9:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Since commit de768844, XLogFileCopy of xlog.c returns to caller a
pstrdup'd string that can be used afterwards for other things.
XLogFileCopy is used in only one place, and it happens that the result
string is never freed at all, leaking memory.

That seems to be almost harmless because the startup process will exit
just after calling XLogFileCopy(). No?

Yes that's harmless. My point here is correctness, prevention does not hurt
particularly if this code path is used more in the future.

Also the current error message in case of failure is very C-like:

elog(ERROR, "InstallXLogFileSegment should not have failed");
I thought that we to use function names in the error messages.
Wouldn't something like that be more adapted?
"could not copy partial log file" or ""could not copy partial log file
into temporary segment file"

Or we can extend InstallXLogFileSegment so that we can give the log level
to it. When InstallXLogFileSegment is called after XLogFileCopy, we can
give ERROR as the log level and cause an exception to occur if link() or
rename() fails in InstallXLogFileSegment.

That's neat. Done this way in the attached.
--
Michael

Attachments:

20150605_xlogcopy_fixes_v2.patchtext/x-patch; charset=US-ASCII; name=20150605_xlogcopy_fixes_v2.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 666fa37..5ee68c1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -807,7 +807,7 @@ static bool XLogCheckpointNeeded(XLogSegNo new_segno);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock);
+					   bool use_lock, int elevel);
 static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
@@ -3012,7 +3012,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	max_segno = logsegno + CheckPointSegments;
 	if (!InstallXLogFileSegment(&installed_segno, tmppath,
 								*use_existent, max_segno,
-								use_lock))
+								use_lock, LOG))
 	{
 		/*
 		 * No need for any more future segments, or InstallXLogFileSegment()
@@ -3041,18 +3041,15 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 /*
  * Copy a WAL segment file in pg_xlog directory.
  *
- * dstfname		destination filename
  * srcfname		source filename
  * upto			how much of the source file to copy? (the rest is filled with
  *				zeros)
  *
- * If dstfname is not given, the file is created with a temporary filename,
- * which is returned.  Both filenames are relative to the pg_xlog directory.
- *
- * NB: Any existing file with the same name will be overwritten!
+ * The file is created with a temporary filename, which is returned.  Both
+ * filenames are relative to the pg_xlog directory.
  */
 static char *
-XLogFileCopy(char *dstfname, char *srcfname, int upto)
+XLogFileCopy(char *srcfname, int upto)
 {
 	char		srcpath[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
@@ -3150,25 +3147,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
 
 	CloseTransientFile(srcfd);
 
-	/*
-	 * Now move the segment into place with its final name.  (Or just return
-	 * the path to the file we created, if the caller wants to handle the rest
-	 * on its own.)
-	 */
-	if (dstfname)
-	{
-		char		dstpath[MAXPGPATH];
-
-		snprintf(dstpath, MAXPGPATH, XLOGDIR "/%s", dstfname);
-		if (rename(tmppath, dstpath) < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not rename file \"%s\" to \"%s\": %m",
-							tmppath, dstpath)));
-		return NULL;
-	}
-	else
-		return pstrdup(tmppath);
+	/* return name to caller, to let him hamdle the rest */
+	return pstrdup(tmppath);
 }
 
 /*
@@ -3195,6 +3175,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
+ * elevel: log level used by this routine.
+ *
  * Returns TRUE if the file was installed successfully.  FALSE indicates that
  * max_segno limit was exceeded, or an error occurred while renaming the
  * file into place.
@@ -3202,7 +3184,7 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
 static bool
 InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock)
+					   bool use_lock, int elevel)
 {
 	char		path[MAXPGPATH];
 	struct stat stat_buf;
@@ -3247,7 +3229,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(LOG,
+		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3259,7 +3241,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(LOG,
+		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not rename file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3748,7 +3730,7 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	if (endlogSegNo <= recycleSegNo &&
 		lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
 		InstallXLogFileSegment(&endlogSegNo, path,
-							   true, recycleSegNo, true))
+							   true, recycleSegNo, true, LOG))
 	{
 		ereport(DEBUG2,
 				(errmsg("recycled transaction log file \"%s\"",
@@ -5238,9 +5220,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 * considerations. But we should be just as tense as XLogFileInit to
 		 * avoid emplacing a bogus file.
 		 */
-		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
-		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
-			elog(ERROR, "InstallXLogFileSegment should not have failed");
+		tmpfname = XLogFileCopy(xlogfname, endOfLog % XLOG_SEG_SIZE);
+		(void) InstallXLogFileSegment(&endLogSegNo, tmpfname, false,
+									  0, false, ERROR);
+		pfree(tmpfname);
 	}
 	else
 	{
#5Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#4)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Fri, Jun 5, 2015 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Jun 4, 2015 at 10:40 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Mon, Jun 1, 2015 at 4:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, May 28, 2015 at 9:09 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Since commit de768844, XLogFileCopy of xlog.c returns to caller a
pstrdup'd string that can be used afterwards for other things.
XLogFileCopy is used in only one place, and it happens that the result
string is never freed at all, leaking memory.

That seems to be almost harmless because the startup process will exit
just after calling XLogFileCopy(). No?

Yes that's harmless. My point here is correctness, prevention does not hurt
particularly if this code path is used more in the future.

Why don't we call InstallXLogFileSegment() at the end of XLogFileCopy()?
If we do that, the risk of memory leak you're worried will disappear at all.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#5)
1 attachment(s)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Fri, Jun 5, 2015 at 10:45 PM, Fujii Masao wrote:

Why don't we call InstallXLogFileSegment() at the end of XLogFileCopy()?
If we do that, the risk of memory leak you're worried will disappear at all.

Yes, that looks fine, XLogFileCopy() would copy to a temporary file,
then install it definitely. Updated patch attached.
--
Michael

Attachments:

20150608_xlogcopy_fixes_v3.patchtext/x-patch; charset=US-ASCII; name=20150608_xlogcopy_fixes_v3.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 666fa37..c03f70e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -807,7 +807,7 @@ static bool XLogCheckpointNeeded(XLogSegNo new_segno);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock);
+					   bool use_lock, int elevel);
 static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
@@ -3012,7 +3012,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	max_segno = logsegno + CheckPointSegments;
 	if (!InstallXLogFileSegment(&installed_segno, tmppath,
 								*use_existent, max_segno,
-								use_lock))
+								use_lock, LOG))
 	{
 		/*
 		 * No need for any more future segments, or InstallXLogFileSegment()
@@ -3041,18 +3041,16 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 /*
  * Copy a WAL segment file in pg_xlog directory.
  *
- * dstfname		destination filename
  * srcfname		source filename
  * upto			how much of the source file to copy? (the rest is filled with
  *				zeros)
+ * segno		identify segment to install.
  *
- * If dstfname is not given, the file is created with a temporary filename,
- * which is returned.  Both filenames are relative to the pg_xlog directory.
- *
- * NB: Any existing file with the same name will be overwritten!
+ * The file is first copied with a temporary filename, and then installed as
+ * a newly-created segment.
  */
-static char *
-XLogFileCopy(char *dstfname, char *srcfname, int upto)
+static void
+XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
 {
 	char		srcpath[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
@@ -3150,25 +3148,9 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
 
 	CloseTransientFile(srcfd);
 
-	/*
-	 * Now move the segment into place with its final name.  (Or just return
-	 * the path to the file we created, if the caller wants to handle the rest
-	 * on its own.)
-	 */
-	if (dstfname)
-	{
-		char		dstpath[MAXPGPATH];
-
-		snprintf(dstpath, MAXPGPATH, XLOGDIR "/%s", dstfname);
-		if (rename(tmppath, dstpath) < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not rename file \"%s\" to \"%s\": %m",
-							tmppath, dstpath)));
-		return NULL;
-	}
-	else
-		return pstrdup(tmppath);
+	/* install the new file */
+	(void) InstallXLogFileSegment(&segno, tmppath, false,
+								  0, false, ERROR);
 }
 
 /*
@@ -3195,6 +3177,8 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
+ * elevel: log level used by this routine.
+ *
  * Returns TRUE if the file was installed successfully.  FALSE indicates that
  * max_segno limit was exceeded, or an error occurred while renaming the
  * file into place.
@@ -3202,7 +3186,7 @@ XLogFileCopy(char *dstfname, char *srcfname, int upto)
 static bool
 InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock)
+					   bool use_lock, int elevel)
 {
 	char		path[MAXPGPATH];
 	struct stat stat_buf;
@@ -3247,7 +3231,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(LOG,
+		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3259,7 +3243,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(LOG,
+		ereport(elevel,
 				(errcode_for_file_access(),
 				 errmsg("could not rename file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3748,7 +3732,7 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	if (endlogSegNo <= recycleSegNo &&
 		lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
 		InstallXLogFileSegment(&endlogSegNo, path,
-							   true, recycleSegNo, true))
+							   true, recycleSegNo, true, LOG))
 	{
 		ereport(DEBUG2,
 				(errmsg("recycled transaction log file \"%s\"",
@@ -5227,8 +5211,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	 */
 	if (endLogSegNo == startLogSegNo)
 	{
-		char	   *tmpfname;
-
 		XLogFileName(xlogfname, endTLI, endLogSegNo);
 
 		/*
@@ -5238,9 +5220,7 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 * considerations. But we should be just as tense as XLogFileInit to
 		 * avoid emplacing a bogus file.
 		 */
-		tmpfname = XLogFileCopy(NULL, xlogfname, endOfLog % XLOG_SEG_SIZE);
-		if (!InstallXLogFileSegment(&endLogSegNo, tmpfname, false, 0, false))
-			elog(ERROR, "InstallXLogFileSegment should not have failed");
+		XLogFileCopy(xlogfname, endOfLog % XLOG_SEG_SIZE, endLogSegNo);
 	}
 	else
 	{
#7Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#6)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Mon, Jun 8, 2015 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jun 5, 2015 at 10:45 PM, Fujii Masao wrote:

Why don't we call InstallXLogFileSegment() at the end of XLogFileCopy()?
If we do that, the risk of memory leak you're worried will disappear at all.

Yes, that looks fine, XLogFileCopy() would copy to a temporary file,
then install it definitely. Updated patch attached.

Thanks for updating the patch! Looks good to me. Applied.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Fujii Masao (#7)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On 06/08/2015 09:04 PM, Fujii Masao wrote:

On Mon, Jun 8, 2015 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jun 5, 2015 at 10:45 PM, Fujii Masao wrote:

Why don't we call InstallXLogFileSegment() at the end of XLogFileCopy()?
If we do that, the risk of memory leak you're worried will disappear at all.

Yes, that looks fine, XLogFileCopy() would copy to a temporary file,
then install it definitely. Updated patch attached.

Thanks for updating the patch! Looks good to me. Applied.

XLogFileCopy() used to call InstallXLogFileSegment(), until I refactored
that in commit de7688442f5aaa03da60416a6aa3474738718803. That commit
added another caller of XLogFileCopy(), which didn't want to install the
segment. However, I later partially reverted that patch in commit
7cbee7c0a1db668c60c020a3fd1e3234daa562a9, and those changes to
XLogFileCopy() were not really needed anymore. I decided not to revert
those changes at that point because that refactoring seemed sensible
anyway. See my email at
/messages/by-id/555DD101.7080209@iki.fi about that.

I'm still not sure if I should've just reverted that refactoring, to
make XLogFileCopy() look the same in master and back-branches, which
makes back-patching easier, or keep the refactoring, because it makes
the code slightly nicer. But the current situation is the worst of both
worlds: the interface of XLogFileCopy() is no better than it used to be,
but it's different enough to cause merge conflicts. At this point, it's
probably best to revert the code to look the same as in 9.4.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Heikki Linnakangas (#8)
1 attachment(s)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Tue, Jun 9, 2015 at 6:25 AM, Heikki Linnakangas wrote:

I'm still not sure if I should've just reverted that refactoring, to make
XLogFileCopy() look the same in master and back-branches, which makes
back-patching easier, or keep the refactoring, because it makes the code
slightly nicer. But the current situation is the worst of both worlds: the
interface of XLogFileCopy() is no better than it used to be, but it's
different enough to cause merge conflicts. At this point, it's probably best
to revert the code to look the same as in 9.4.

That's a valid concern. What about the attached then? I think that it
is still good to keep upto to copy only data up to the switch point at
recovery exit. InstallXLogFileSegment() changes a bit as well because
of its modifications of arguments.
--
Michael

Attachments:

20150609_xlogcopy_fixes_v4.patchtext/x-diff; charset=US-ASCII; name=20150609_xlogcopy_fixes_v4.patchDownload
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 150d56a..e8d3524 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -807,7 +807,7 @@ static bool XLogCheckpointNeeded(XLogSegNo new_segno);
 static void XLogWrite(XLogwrtRqst WriteRqst, bool flexible);
 static bool InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock, int elevel);
+					   bool use_lock);
 static int XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 			 int source, bool notexistOk);
 static int	XLogFileReadAnyTLI(XLogSegNo segno, int emode, int source);
@@ -3012,7 +3012,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 	max_segno = logsegno + CheckPointSegments;
 	if (!InstallXLogFileSegment(&installed_segno, tmppath,
 								*use_existent, max_segno,
-								use_lock, LOG))
+								use_lock))
 	{
 		/*
 		 * No need for any more future segments, or InstallXLogFileSegment()
@@ -3039,20 +3039,25 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 }
 
 /*
- * Copy a WAL segment file in pg_xlog directory.
+ * Create a new XLOG file segment by copying a pre-existing one.
  *
- * srcfname		source filename
- * upto			how much of the source file to copy? (the rest is filled with
- *				zeros)
- * segno		identify segment to install.
+ * destsegno: identify segment to be created.
  *
- * The file is first copied with a temporary filename, and then installed as
- * a newly-created segment.
+ * srcTLI, srclog, srcseg: identify segment to be copied (could be from
+ *		a different timeline)
+ *
+ * upto: how much of the source file to copy (the rest is filled with
+ *		zeros)
+ *
+ * Currently this is only used during recovery, and so there are no locking
+ * considerations.  But we should be just as tense as XLogFileInit to avoid
+ * emplacing a bogus file.
  */
 static void
-XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
+XLogFileCopy(XLogSegNo destsegno, TimeLineID srcTLI, XLogSegNo srcsegno,
+			 int upto)
 {
-	char		srcpath[MAXPGPATH];
+	char		path[MAXPGPATH];
 	char		tmppath[MAXPGPATH];
 	char		buffer[XLOG_BLCKSZ];
 	int			srcfd;
@@ -3062,12 +3067,12 @@ XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
 	/*
 	 * Open the source file
 	 */
-	snprintf(srcpath, MAXPGPATH, XLOGDIR "/%s", srcfname);
-	srcfd = OpenTransientFile(srcpath, O_RDONLY | PG_BINARY, 0);
+	XLogFilePath(path, srcTLI, srcsegno);
+	srcfd = OpenTransientFile(path, O_RDONLY | PG_BINARY, 0);
 	if (srcfd < 0)
 		ereport(ERROR,
 				(errcode_for_file_access(),
-				 errmsg("could not open file \"%s\": %m", srcpath)));
+				 errmsg("could not open file \"%s\": %m", path)));
 
 	/*
 	 * Copy into a temp file name.
@@ -3111,11 +3116,11 @@ XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
 					ereport(ERROR,
 							(errcode_for_file_access(),
 							 errmsg("could not read file \"%s\": %m",
-									srcpath)));
+									path)));
 				else
 					ereport(ERROR,
 							(errmsg("not enough data in file \"%s\"",
-									srcpath)));
+									path)));
 			}
 		}
 		errno = 0;
@@ -3148,9 +3153,11 @@ XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
 
 	CloseTransientFile(srcfd);
 
-	/* install the new file */
-	(void) InstallXLogFileSegment(&segno, tmppath, false,
-								  0, false, ERROR);
+	/*
+	 * Now move the segment into place with its final name.
+	 */
+	if (!InstallXLogFileSegment(&destsegno, tmppath, false, 0, false))
+		elog(ERROR, "InstallXLogFileSegment should not have failed");
 }
 
 /*
@@ -3177,8 +3184,6 @@ XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
  * place.  This should be TRUE except during bootstrap log creation.  The
  * caller must *not* hold the lock at call.
  *
- * elevel: log level used by this routine.
- *
  * Returns TRUE if the file was installed successfully.  FALSE indicates that
  * max_segno limit was exceeded, or an error occurred while renaming the
  * file into place.
@@ -3186,7 +3191,7 @@ XLogFileCopy(char *srcfname, int upto, XLogSegNo segno)
 static bool
 InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 					   bool find_free, XLogSegNo max_segno,
-					   bool use_lock, int elevel)
+					   bool use_lock)
 {
 	char		path[MAXPGPATH];
 	struct stat stat_buf;
@@ -3231,7 +3236,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(elevel,
+		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3243,7 +3248,7 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	{
 		if (use_lock)
 			LWLockRelease(ControlFileLock);
-		ereport(elevel,
+		ereport(LOG,
 				(errcode_for_file_access(),
 				 errmsg("could not rename file \"%s\" to \"%s\" (initialization of log file): %m",
 						tmppath, path)));
@@ -3732,7 +3737,7 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 	if (endlogSegNo <= recycleSegNo &&
 		lstat(path, &statbuf) == 0 && S_ISREG(statbuf.st_mode) &&
 		InstallXLogFileSegment(&endlogSegNo, path,
-							   true, recycleSegNo, true, LOG))
+							   true, recycleSegNo, true))
 	{
 		ereport(DEBUG2,
 				(errmsg("recycled transaction log file \"%s\"",
@@ -5211,8 +5216,6 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 	 */
 	if (endLogSegNo == startLogSegNo)
 	{
-		XLogFileName(xlogfname, endTLI, endLogSegNo);
-
 		/*
 		 * Make a copy of the file on the new timeline.
 		 *
@@ -5220,7 +5223,8 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 		 * considerations. But we should be just as tense as XLogFileInit to
 		 * avoid emplacing a bogus file.
 		 */
-		XLogFileCopy(xlogfname, endOfLog % XLOG_SEG_SIZE, endLogSegNo);
+		XLogFileCopy(endLogSegNo, endTLI, endLogSegNo,
+					 endOfLog % XLOG_SEG_SIZE);
 	}
 	else
 	{
#10Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#8)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Tue, Jun 9, 2015 at 6:25 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 06/08/2015 09:04 PM, Fujii Masao wrote:

On Mon, Jun 8, 2015 at 11:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jun 5, 2015 at 10:45 PM, Fujii Masao wrote:

Why don't we call InstallXLogFileSegment() at the end of XLogFileCopy()?
If we do that, the risk of memory leak you're worried will disappear at
all.

Yes, that looks fine, XLogFileCopy() would copy to a temporary file,
then install it definitely. Updated patch attached.

Thanks for updating the patch! Looks good to me. Applied.

XLogFileCopy() used to call InstallXLogFileSegment(), until I refactored
that in commit de7688442f5aaa03da60416a6aa3474738718803. That commit added
another caller of XLogFileCopy(), which didn't want to install the segment.
However, I later partially reverted that patch in commit
7cbee7c0a1db668c60c020a3fd1e3234daa562a9, and those changes to
XLogFileCopy() were not really needed anymore. I decided not to revert those
changes at that point because that refactoring seemed sensible anyway. See
my email at /messages/by-id/555DD101.7080209@iki.fi
about that.

I'm still not sure if I should've just reverted that refactoring, to make
XLogFileCopy() look the same in master and back-branches, which makes
back-patching easier, or keep the refactoring, because it makes the code
slightly nicer. But the current situation is the worst of both worlds: the
interface of XLogFileCopy() is no better than it used to be, but it's
different enough to cause merge conflicts. At this point, it's probably best
to revert the code to look the same as in 9.4.

I'm not sure how much it's really nicer to keep the refactoring,
but I agree that it's better to make the code look same to make
the back-patching easier.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Michael Paquier (#9)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Tue, Jun 9, 2015 at 10:09 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Jun 9, 2015 at 6:25 AM, Heikki Linnakangas wrote:

I'm still not sure if I should've just reverted that refactoring, to make
XLogFileCopy() look the same in master and back-branches, which makes
back-patching easier, or keep the refactoring, because it makes the code
slightly nicer. But the current situation is the worst of both worlds: the
interface of XLogFileCopy() is no better than it used to be, but it's
different enough to cause merge conflicts. At this point, it's probably best
to revert the code to look the same as in 9.4.

That's a valid concern. What about the attached then? I think that it
is still good to keep upto to copy only data up to the switch point at
recovery exit. InstallXLogFileSegment() changes a bit as well because
of its modifications of arguments.

Applied. Thanks!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael.paquier@gmail.com
In reply to: Fujii Masao (#11)
Re: Memory leak with XLogFileCopy since de768844 (WAL file with .partial)

On Wed, Jul 1, 2015 at 10:58 AM, Fujii Masao wrote:

On Tue, Jun 9, 2015 at 10:09 AM, Michael Paquier wrote:

That's a valid concern. What about the attached then? I think that it
is still good to keep upto to copy only data up to the switch point at
recovery exit. InstallXLogFileSegment() changes a bit as well because
of its modifications of arguments.

Applied. Thanks!

Thanks for showing up.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers