pgsql: On Windows, when a file is deleted and another process still has

Started by Nonameover 16 years ago11 messages
#1Noname
heikki@postgresql.org

Log Message:
-----------
On Windows, when a file is deleted and another process still has an open
file handle on it, the file goes into "pending deletion" state where it
still shows up in directory listing, but isn't accessible otherwise. That
confuses RemoveOldXLogFiles(), making it think that the file hasn't been
archived yet, while it actually was, and it was deleted along with the .done
file.

Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.

Backpatch to 8.2, which is the oldest version supported on Windows.

Modified Files:
--------------
pgsql/src/backend/access/transam:
xlog.c (r1.351 -> r1.352)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/access/transam/xlog.c?r1=1.351&r2=1.352)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: pgsql: On Windows, when a file is deleted and another process still has

heikki@postgresql.org (Heikki Linnakangas) writes:

Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.

This patch seems to me to be a seriously bad idea. It means that some
random process holding a file open will be able to cause checkpoints to
fail indefinitely. Is it really necessary or advisable to have the
error cases be elog(ERROR)? Couldn't we just elog(LOG) and leave the
file alone?

regards, tom lane

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#2)
Re: pgsql: On Windows, when a file is deleted and another process still has

Tom Lane wrote:

heikki@postgresql.org (Heikki Linnakangas) writes:

Fix that by renaming the file with ".deleted" extension before deleting it.
Also check the return value of rename() and unlink(), so that if the removal
fails for any reason (e.g another process is holding the file locked), we
don't delete the .done file until the WAL file is really gone.

This patch seems to me to be a seriously bad idea. It means that some
random process holding a file open will be able to cause checkpoints to
fail indefinitely. Is it really necessary or advisable to have the
error cases be elog(ERROR)? Couldn't we just elog(LOG) and leave the
file alone?

I considered that, but didn't bother since InstallXLogFileSegment will
throw an error anyway if the file is locked, and that's a more common
codepath.

On closer look, it looks like the code in InstallXLogFileSegment is
trying to check the return code from rename, and not throw an error if
it fails because the file is locked, but it doesn't seem to be working
properly. When I open a WAL file with a separate program, and then cycle
through enough segments to have the file recycled, I get this:

ERROR: could not rename file "pg_xlog/0000000100000001000000A0" to
"pg_xlog/0000000100000001000000A9" (initialization of log file 1,
segment 169): No such file or directory

I noted that behavior in my previous mail on the thread on pgsql-bugs,
but it didn't strike me as serious then.

FWIW, the checkpoint is mostly complete at that point, all that's left
is non-critical cleanup. I admit it's still annoying.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#3)
Re: pgsql: On Windows, when a file is deleted and another process still has

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

FWIW, the checkpoint is mostly complete at that point, all that's left
is non-critical cleanup. I admit it's still annoying.

"Mostly complete" isn't good enough, and it's much more than just an
annoyance --- failure to complete checkpoints will lead to disk full,
and other nasty things IIRC. We really need to deal with this.

I wouldn't be too surprised if that rat's nest in InstallXLogFileSegment
isn't right :-(. But we have to treat "file can't be renamed" as a
nonfatal condition on Windows.

regards, tom lane

#5Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#4)
Re: pgsql: On Windows, when a file is deleted and another process still has

Tom Lane wrote:

I wouldn't be too surprised if that rat's nest in InstallXLogFileSegment
isn't right :-(. But we have to treat "file can't be renamed" as a
nonfatal condition on Windows.

I added some debugging code, and I'm getting an ERROR_SHARING_VIOLATION
error when another program keeps the file open, while
InstallXLogFileSegment is checking for ERROR_ACCESS_DENIED. It seems
that check is flat-out wrong, as is the one in pgrename(). A bit of
googling suggests that Windows 9x might've returned ERROR_ACCESS_DENIED
in that case, so that's probably where that originated. pgwin32_open()
correctly checks for ERROR_SHARING_VIOLATION, but also for
ERROR_LOCK_VIOLATION.

I also note that since pgrename() doesn't set errno, the error message
printed in InstallXLogFileSegment if it fails is bogus. pgrename()
should set errno, using _dosmaperr().

A completely different approach would be to treat any failure on all
platforms as non-fatal. We shouldn't really cut the checkpoint short if
recycling a WAL file fails, whatever the reason. That seems like a more
robust approach than trying to guess which error codes are OK to ignore.

When called from XLogFileInit(), it's not expected that the target file
doesn't exist after InstallXLogFileSegment(), but it would just fail to
open it and throw an error anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#5)
Re: pgsql: On Windows, when a file is deleted and another process still has

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

A completely different approach would be to treat any failure on all
platforms as non-fatal. We shouldn't really cut the checkpoint short if
recycling a WAL file fails, whatever the reason. That seems like a more
robust approach than trying to guess which error codes are OK to ignore.

I could live with that, as long as it gets logged.

regards, tom lane

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#6)
1 attachment(s)
Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has

(moving to pgsql-hackers)

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

A completely different approach would be to treat any failure on all
platforms as non-fatal. We shouldn't really cut the checkpoint short if
recycling a WAL file fails, whatever the reason. That seems like a more
robust approach than trying to guess which error codes are OK to ignore.

I could live with that, as long as it gets logged.

Here's a patch implementing that, and changing pgrename() to check for
ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION like pgwin32_open()
does, instead of ERROR_ACCESS_DENIED.

I wonder if we should reduce the timeout in pgrename(). It's 30 s at the
moment, but apparently it hasn't been working correctly, failing
immediately instead if the file is locked. And no-one has complained
about that. But if we sleep in InstallXLogFileSegment(), we're holding
ControlFileLock, which can force other backends to wait, and that might
cause more harm than just failing outright. Something like 5 seconds
might be more appropriate, giving anti-virus and similar software some
time to give up the lock, but not too much to cause long delays. 5
seconds should be enough for anti-virus or backup software to process a
file under normal circumstances.

OTOH, pgwin32_open() uses 30 s, with the same potential for lockups, and
no-one has complained about that either. The bottom line is that if
another program keeps a file locked for any extended period of time,
you're going to have trouble one way or another.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

dont-error-in-InstallXLogFileSegment-1.patchtext/x-diff; name=dont-error-in-InstallXLogFileSegment-1.patchDownload
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 2262,2273 **** XLogFileInit(uint32 log, uint32 seg,
  								*use_existent, &max_advance,
  								use_lock))
  	{
! 		/* No need for any more future segments... */
  		unlink(tmppath);
  	}
  
- 	elog(DEBUG2, "done creating and filling new WAL file");
- 
  	/* Set flag to tell caller there was no existent file */
  	*use_existent = false;
  
--- 2262,2275 ----
  								*use_existent, &max_advance,
  								use_lock))
  	{
! 		/*
! 		 * No need for any more future segments, or InstallXLogFileSegment()
! 		 * failed to rename the file into place. If the rename failed, opening
! 		 * the file below will fail.
! 		 */
  		unlink(tmppath);
  	}
  
  	/* Set flag to tell caller there was no existent file */
  	*use_existent = false;
  
***************
*** 2280,2285 **** XLogFileInit(uint32 log, uint32 seg,
--- 2282,2289 ----
  		   errmsg("could not open file \"%s\" (log file %u, segment %u): %m",
  				  path, log, seg)));
  
+ 	elog(DEBUG2, "done creating and filling new WAL file");
+ 
  	return fd;
  }
  
***************
*** 2409,2418 **** XLogFileCopy(uint32 log, uint32 seg,
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
!  * Returns TRUE if file installed, FALSE if not installed because of
!  * exceeding max_advance limit.  On Windows, we also return FALSE if we
!  * can't rename the file into place because someone's got it open.
!  * (Any other kind of failure causes ereport().)
   */
  static bool
  InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
--- 2413,2421 ----
   * place.  This should be TRUE except during bootstrap log creation.  The
   * caller must *not* hold the lock at call.
   *
!  * Returns TRUE if the file was installed successfully. FALSE indicates that
!  * max_advance limit was exceeded, or an error occurred while renaming the
!  * file into place.
   */
  static bool
  InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
***************
*** 2460,2490 **** InstallXLogFileSegment(uint32 *log, uint32 *seg, char *tmppath,
  	 */
  #if HAVE_WORKING_LINK
  	if (link(tmppath, path) < 0)
! 		ereport(ERROR,
  				(errcode_for_file_access(),
  				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
  						tmppath, path, *log, *seg)));
  	unlink(tmppath);
  #else
  	if (rename(tmppath, path) < 0)
  	{
! #ifdef WIN32
! #if !defined(__CYGWIN__)
! 		if (GetLastError() == ERROR_ACCESS_DENIED)
! #else
! 		if (errno == EACCES)
! #endif
! 		{
! 			if (use_lock)
! 				LWLockRelease(ControlFileLock);
! 			return false;
! 		}
! #endif   /* WIN32 */
! 
! 		ereport(ERROR,
  				(errcode_for_file_access(),
  				 errmsg("could not rename file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
  						tmppath, path, *log, *seg)));
  	}
  #endif
  
--- 2463,2488 ----
  	 */
  #if HAVE_WORKING_LINK
  	if (link(tmppath, path) < 0)
! 	{
! 		if (use_lock)
! 			LWLockRelease(ControlFileLock);
! 		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not link file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
  						tmppath, path, *log, *seg)));
+ 		return false;
+ 	}
  	unlink(tmppath);
  #else
  	if (rename(tmppath, path) < 0)
  	{
! 		if (use_lock)
! 			LWLockRelease(ControlFileLock);
! 		ereport(LOG,
  				(errcode_for_file_access(),
  				 errmsg("could not rename file \"%s\" to \"%s\" (initialization of log file %u, segment %u): %m",
  						tmppath, path, *log, *seg)));
+ 		return false;
  	}
  #endif
  
***************
*** 3128,3146 **** RemoveOldXlogFiles(uint32 log, uint32 seg, XLogRecPtr endptr)
  					 */
  					snprintf(newpath, MAXPGPATH, "%s.deleted", path);
  					if (rename(path, newpath) != 0)
! 						ereport(ERROR,
  								(errcode_for_file_access(),
! 								 errmsg("could not rename old transaction log file \"%s\"",
  										path)));
  					rc = unlink(newpath);
  #else
  					rc = unlink(path);
  #endif
  					if (rc != 0)
! 						ereport(ERROR,
  								(errcode_for_file_access(),
  								 errmsg("could not remove old transaction log file \"%s\": %m",
  										path)));
  					CheckpointStats.ckpt_segs_removed++;
  				}
  
--- 3126,3150 ----
  					 */
  					snprintf(newpath, MAXPGPATH, "%s.deleted", path);
  					if (rename(path, newpath) != 0)
! 					{
! 						ereport(LOG,
  								(errcode_for_file_access(),
! 								 errmsg("could not rename old transaction log file \"%s\": %m",
  										path)));
+ 						continue;
+ 					}
  					rc = unlink(newpath);
  #else
  					rc = unlink(path);
  #endif
  					if (rc != 0)
! 					{
! 						ereport(LOG,
  								(errcode_for_file_access(),
  								 errmsg("could not remove old transaction log file \"%s\": %m",
  										path)));
+ 						continue;
+ 					}
  					CheckpointStats.ckpt_segs_removed++;
  				}
  
*** a/src/port/dirmod.c
--- b/src/port/dirmod.c
***************
*** 129,139 **** pgrename(const char *from, const char *to)
  #endif
  	{
  #if defined(WIN32) && !defined(__CYGWIN__)
! 		if (GetLastError() != ERROR_ACCESS_DENIED)
  #else
  		if (errno != EACCES)
  #endif
- 			/* set errno? */
  			return -1;
  		if (++loops > 300)		/* time out after 30 sec */
  			return -1;
--- 129,143 ----
  #endif
  	{
  #if defined(WIN32) && !defined(__CYGWIN__)
! 		DWORD		err = GetLastError();
! 
! 		_dosmaperr(err);
! 
! 		if (err != ERROR_SHARING_VIOLATION &&
! 			err != ERROR_LOCK_VIOLATION)
  #else
  		if (errno != EACCES)
  #endif
  			return -1;
  		if (++loops > 300)		/* time out after 30 sec */
  			return -1;
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Here's a patch implementing that, and changing pgrename() to check for
ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION like pgwin32_open()
does, instead of ERROR_ACCESS_DENIED.

This looks sane in a quick once-over, though I haven't tested it.
One tiny stylistic suggestion:

if (err != ERROR_SHARING_VIOLATION &&
err != ERROR_LOCK_VIOLATION)
#else
if (errno != EACCES)
#endif
return -1;
if (++loops > 300) /* time out after 30 sec */
return -1;

This is overly cute and will probably confuse both pgindent and ordinary
editors. It's worth one extra line to keep each part of the #if
syntactically independent, ie

if (err != ERROR_SHARING_VIOLATION &&
err != ERROR_LOCK_VIOLATION)
return -1;
#else
if (errno != EACCES)
return -1;
#endif
if (++loops > 300) /* time out after 30 sec */
return -1;

I wonder if we should reduce the timeout in pgrename(). It's 30 s at the
moment, but apparently it hasn't been working correctly, failing
immediately instead if the file is locked.

I have a vague recollection that there was a specific reason for having
such a long timeout --- you might want to check the archives to see the
discussion before that code got committed. However, if nothing turns
up, I wouldn't object to reducing it to 5 or 10 sec.

regards, tom lane

#9Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#7)
Re: Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has

On Fri, Sep 11, 2009 at 10:44, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

(moving to pgsql-hackers)

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

A completely different approach would be to treat any failure on all
platforms as non-fatal. We shouldn't really cut the checkpoint short if
recycling a WAL file fails, whatever the reason. That seems like a more
robust approach than trying to guess which error codes are OK to ignore.

I could live with that, as long as it gets logged.

Here's a patch implementing that, and changing pgrename() to check for
ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION like pgwin32_open()
does, instead of ERROR_ACCESS_DENIED.

I have definitely seen AV programs return access deniderather than
sharing violation more than once for temporary errors. How about we
keep the access denied one as well? If we actually don't have
permissions in pg_xlog, we most likely never even got here...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#9)
Re: Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has

Magnus Hagander <magnus@hagander.net> writes:

On Fri, Sep 11, 2009 at 10:44, Heikki Linnakangas

Here's a patch implementing that, and changing pgrename() to check for
ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION like pgwin32_open()
does, instead of ERROR_ACCESS_DENIED.

I have definitely seen AV programs return access deniderather than
sharing violation more than once for temporary errors. How about we
keep the access denied one as well?

+1 ... presumably the original coding was tested in *some* environment.

regards, tom lane

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#10)
Re: Re: [COMMITTERS] pgsql: On Windows, when a file is deleted and another process still has

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Fri, Sep 11, 2009 at 10:44, Heikki Linnakangas

Here's a patch implementing that, and changing pgrename() to check for
ERROR_SHARING_VIOLATION and ERROR_LOCK_VIOLATION like pgwin32_open()
does, instead of ERROR_ACCESS_DENIED.

I have definitely seen AV programs return access deniderather than
sharing violation more than once for temporary errors. How about we
keep the access denied one as well?

+1 ... presumably the original coding was tested in *some* environment.

Ok, I've committed that. Per quick discussion with Magnus, I also
lowered the timeout to 10s.

Luke, although your immediate problem was solved by the previous patch
already, this touched the same pieces of code, so you might want to
fetch the latest sources and retest if you want to be sure. (I did test
it myself..)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com