[PATCH] Atomic pgrename on Windows
Hi!
It's assumed in PostgreSQL codebase that pgrename atomically replaces
target file with source file even if target file is open and being read by
another process. And this assumption is true on Linux, but it's false on
Windows. MoveFileEx() triggers an error when target file is open (and
accordingly locked). Some our customers has been faced such errors while
operating heavily loaded PostgreSQL instance on Windows.
LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to
"pg_stat_tmp/global.stat": Permission denied
I've managed to reproduce this situation. Reliable reproducing of this
issue required patch to PostgreSQL core. I've written
slow-read-statfiles.patch for artificial slowdown
of pgstat_read_statsfiles() – sleep 100 ms after each statfile entry. If
you run make-100-dbs.sql on patched version, and then few times execute
select pg_stat_get_tuples_inserted('t1'::regclass);
in psql, then you would likely get the error above on Windows.
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –
ReplaceFile() does that. ReplaceFiles() requires target file to exist,
this is why we still need to call MoveFileEx() when it doesn't exist.
This patch is based on work of Victor Spirin who was asked by Postgres Pro
to research this problem.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
slow-read-statfiles.patchapplication/octet-stream; name=slow-read-statfiles.patchDownload
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index 5c256ff..400057a
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_read_statsfiles(Oid onlydb, bool
*** 4998,5003 ****
--- 4998,5004 ----
*/
for (;;)
{
+ pg_usleep(100000);
switch (fgetc(fpin))
{
/*
atomic-pgrename-windows-1.patchapplication/octet-stream; name=atomic-pgrename-windows-1.patchDownload
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
new file mode 100644
index eac59bd..4dba35a
*** a/src/port/dirmod.c
--- b/src/port/dirmod.c
*************** pgrename(const char *from, const char *t
*** 57,69 ****
* and blocking other backends.
*/
#if defined(WIN32) && !defined(__CYGWIN__)
! while (!MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
#else
while (rename(from, to) < 0)
- #endif
{
#if defined(WIN32) && !defined(__CYGWIN__)
! DWORD err = GetLastError();
_dosmaperr(err);
--- 57,95 ----
* and blocking other backends.
*/
#if defined(WIN32) && !defined(__CYGWIN__)
! DWORD err;
!
! /*
! * On Windows we use ReplaceFile() to rename while concurrent processes
! * have file open. However, ReplaceFile() is to be used only when target
! * file is already exists. Thus, we check for file existence and then
! * choose between MoveFileEx() and ReplaceFile() functions.
! */
! while (true)
! {
! DWORD dwAttrib;
! bool filePresent;
!
! dwAttrib = GetFileAttributes(to);
! filePresent = (dwAttrib != INVALID_FILE_ATTRIBUTES) &&
! !(dwAttrib & FILE_ATTRIBUTE_DIRECTORY);
!
! if (filePresent)
! {
! if (ReplaceFile(to, from, NULL, REPLACEFILE_IGNORE_MERGE_ERRORS, 0, 0))
! break;
! }
! else
! {
! if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
! break;
! }
#else
while (rename(from, to) < 0)
{
+ #endif
#if defined(WIN32) && !defined(__CYGWIN__)
! err = GetLastError();
_dosmaperr(err);
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows – ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why we
still need to call MoveFileEx() when it doesn't exist.
Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.
This patch is based on work of Victor Spirin who was asked by Postgres Pro
to research this problem.
Victor has no community account so his name cannot be registered as a
co-author of the patch. I have added your name though.
--
Michael
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows – ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why we
still need to call MoveFileEx() when it doesn't exist.Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.
That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?
I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx
"Note Delete access allows both delete and rename operations."
Is there an external process active that doesn't set that flag? Are we
missing setting it somewhere?
Greetings,
Andres Freund
On 27 November 2017 at 14:28, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:
Hi!
It's assumed in PostgreSQL codebase that pgrename atomically replaces
target file with source file even if target file is open and being read by
another process. And this assumption is true on Linux, but it's false on
Windows. MoveFileEx() triggers an error when target file is open (and
accordingly locked). Some our customers has been faced such errors while
operating heavily loaded PostgreSQL instance on Windows.LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp" to
"pg_stat_tmp/global.stat": Permission denied
That would explain a number of intermittent reports Iv'e seen floating
around.
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –
ReplaceFile() does that. ReplaceFiles() requires target file to exist,
this is why we still need to call MoveFileEx() when it doesn't exist.
Look at the error codes for ReplaceFile:
https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
The docs don't say it's atomic and the error codes suggest it may not be.
But there's a Microsoft Research paper claiming it's atomic -
http://research.microsoft.com/pubs/64525/tr-2006-45.pdf .
It appears that MoveFileTransacted would be what we really want, when we're
on NTFS (it won't work on FAT32 or network shares). See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365241(v=vs.85).aspx
. But the docs have a preface warning that it's not recommended and may not
be available in future Windows versions, so that's not an option.
This Go language bug (https://github.com/golang/go/issues/8914) and this
cPython discussion (http://bugs.python.org/issue8828)) have discussion.
I found this comment particularly illuminating
https://bugs.python.org/msg146307 as it quotes what Java does. It uses
MoveFileEx.
See also:
*
https://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows
*
https://blogs.msdn.microsoft.com/adioltean/2005/12/28/how-to-do-atomic-writes-in-a-file/
* https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
*
https://msdn.microsoft.com/en-us/library/windows/desktop/hh802690(v=vs.85).aspx
(I sincerely hope that that blog post about atomic writes, which is 12
years old, is obsoleted by some new functionality. Because seriously, WTF.)
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why
we
still need to call MoveFileEx() when it doesn't exist.
Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/
aa363858(v=vs.85).aspx
"Note Delete access allows both delete and rename operations."Is there an external process active that doesn't set that flag?
I'm quite sure there was no such processed during my experimentation. No
antivirus or other disturbers. Moreover, error reproduces only with
artificial delay in pgstat_read_statsfiles(). So, it's clearly related to
lock placed by this function.
Are we missing setting it somewhere?
That's curious, but at least pgstat_read_statsfiles() seems to open file
with that flag.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Nov 28, 2017 at 5:52 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
On Tue, Nov 28, 2017 at 3:59 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-28 09:47:45 +0900, Michael Paquier wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem.
It
appears to be possible to atomically replace file on Windows –
ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why
we
still need to call MoveFileEx() when it doesn't exist.
Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.That seems like a *seriously* bad idea. What if we crash inbetween the
unlink and the rename?I'm confused about the need for this. Shouldn't normally
opening all files FILE_SHARE_DELETE take care of this? See
https://msdn.microsoft.com/en-us/library/windows/desktop/aa3
63858(v=vs.85).aspx
"Note Delete access allows both delete and rename operations."Is there an external process active that doesn't set that flag?
I'm quite sure there was no such processed during my experimentation. No
antivirus or other disturbers. Moreover, error reproduces only with
artificial delay in pgstat_read_statsfiles(). So, it's clearly related to
lock placed by this function.Are we missing setting it somewhere?
That's curious, but at least pgstat_read_statsfiles() seems to open file
with that flag.
I wrote same console program to verify that windows API behaves so, and
it's not something odd inside PostgreSQL. Source code is attached.
So, with FILE_SHARE_DELETE flag you really can delete opened file
concurrently.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe Delete 1.txt
DeleteFile success
Closed
And you can rename it concurrently.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe MoveFileEx 1.txt 2.txt
MoveFileEx successClosed
Closed
But you can't replace it concurrently with another file. So as msdn
states, you can either delete or rename opened file concurrently. But you
can't replace it...
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe MoveFileEx 2.txt 1.txt
MoveFileEx error: 5
Closed
But ReplaceFile works OK. Temporary file lives until session #1 close the
file.
Session #1
rename_test.exe Open 1.txt
Opened
Press any key to continue . . .
Session #2
rename_test.exe ReplaceFile 2.txt 1.txt
ReplaceFile success
Closed
I didn't try MoveFileTransacted, because deprecated function doesn't seem
like an option.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
On Tue, Nov 28, 2017 at 5:02 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
On 27 November 2017 at 14:28, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:It's assumed in PostgreSQL codebase that pgrename atomically replaces
target file with source file even if target file is open and being read by
another process. And this assumption is true on Linux, but it's false on
Windows. MoveFileEx() triggers an error when target file is open (and
accordingly locked). Some our customers has been faced such errors while
operating heavily loaded PostgreSQL instance on Windows.LOG could not rename temporary statistics file "pg_stat_tmp/global.tmp"
to "pg_stat_tmp/global.stat": Permission deniedThat would explain a number of intermittent reports Iv'e seen floating
around.
Yeah.
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –
ReplaceFile() does that. ReplaceFiles() requires target file to exist,
this is why we still need to call MoveFileEx() when it doesn't exist.Look at the error codes for ReplaceFile:
https://msdn.microsoft.com/en-us/library/aa365512(VS.85).aspx
The docs don't say it's atomic and the error codes suggest it may not be.
But there's a Microsoft Research paper claiming it's atomic -
http://research.microsoft.com/pubs/64525/tr-2006-45.pdf .It appears that MoveFileTransacted would be what we really want, when
we're on NTFS (it won't work on FAT32 or network shares). See
https://msdn.microsoft.com/en-us/library/windows/
desktop/aa365241(v=vs.85).aspx . But the docs have a preface warning that
it's not recommended and may not be available in future Windows versions,
so that's not an option.This Go language bug (https://github.com/golang/go/issues/8914) and this
cPython discussion (http://bugs.python.org/issue8828)) have discussion.I found this comment particularly illuminating https://bugs.python.org/
msg146307 as it quotes what Java does. It uses MoveFileEx.
That's look bad. MoveFileTransacted() looks like best option, but it's not
an option since it's deprecated.
For now, ReplaceFile() function looks like best choice for renaming
statfiles. But it doesn't look good for critical datafiles whose are also
renamed using pgrename, because system can crash between rename of
destination file and rename of source file. Thus, MoveFileEx() seems still
best solution for critical datafiles where safety is more important than
concurrency. After reading provided links, I'm very suspicious about its
safety too. But it's seems like best available solution and it have
already passed some test of time...
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 29 November 2017 at 00:16, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:
For now, ReplaceFile() function looks like best choice for renaming
statfiles. But it doesn't look good for critical datafiles whose are also
renamed using pgrename, because system can crash between rename of
destination file and rename of source file. Thus, MoveFileEx() seems still
best solution for critical datafiles where safety is more important than
concurrency. After reading provided links, I'm very suspicious about its
safety too. But it's seems like best available solution and it have
already passed some test of time...
Yeah.
Or alternately, we might need to extend WAL to include any temp file names,
etc, needed to allow us to recover from an interrupted operation during
redo. Frankly that may be the safest option; on Windows presume that there
is no atomic rename we can trust for critical files, and do it in multiple
steps.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2017-11-29 10:13:32 +0800, Craig Ringer wrote:
On 29 November 2017 at 00:16, Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:For now, ReplaceFile() function looks like best choice for renaming
statfiles. But it doesn't look good for critical datafiles whose are also
renamed using pgrename, because system can crash between rename of
destination file and rename of source file. Thus, MoveFileEx() seems still
best solution for critical datafiles where safety is more important than
concurrency. After reading provided links, I'm very suspicious about its
safety too. But it's seems like best available solution and it have
already passed some test of time...
Or we can just add some retry logic like we have for opening files...
Or alternately, we might need to extend WAL to include any temp file names,
etc, needed to allow us to recover from an interrupted operation during
redo. Frankly that may be the safest option; on Windows presume that there
is no atomic rename we can trust for critical files, and do it in multiple
steps.
I'd rather remove windows support than go there.
Greetings,
Andres Freund
Greetings Alexander, all,
* Alexander Korotkov (a.korotkov@postgrespro.ru) wrote:
Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –
ReplaceFile() does that. ReplaceFiles() requires target file to exist,
this is why we still need to call MoveFileEx() when it doesn't exist.
There's been some discussion on this patch and it seems to, at least,
improve the situation on Windows even if it might not completely address
the issues. Does anyone else want to take a look and comment,
particularly those familiar with the Windows side of things? Otherwise,
I'm afraid this patch may end up just sitting in Needs review state fo
the entire CF and not getting moved forward, even though it seems like a
good improvement for our Windows users.
Thanks!
Stephen
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows –ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why we
still need to call MoveFileEx() when it doesn't exist.Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.
Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target
doesn't exist", then call MoveFileEx().
Or the other way around -- try MoveFileEx() first since that seems to work
most of the time today (if it mostly broke we'd be in trouble already), and
if it fails with a sharing violation, try ReplaceFile()? And perhaps end up
doing it something similar to what we do with shared memory which is just
to loop over it and try each a couple of time, before giving up and
failing?
--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Hi Alexander,
On 1/20/18 10:13 AM, Magnus Hagander wrote:
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier
<michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows – ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why we
still need to call MoveFileEx() when it doesn't exist.Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target
doesn't exist", then call MoveFileEx().Or the other way around -- try MoveFileEx() first since that seems to
work most of the time today (if it mostly broke we'd be in trouble
already), and if it fails with a sharing violation, try ReplaceFile()?
And perhaps end up doing it something similar to what we do with shared
memory which is just to loop over it and try each a couple of time,
before giving up and failing?
This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.
This certainly sounds like a non-trivial change. Perhaps it should be
submitted for PG12?
Thanks,
--
-David
david@pgmasters.net
Hi, David!
On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net> wrote:
On 1/20/18 10:13 AM, Magnus Hagander wrote:
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier
<michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote:On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru <mailto:a.korotkov@postgrespro.ru>>wrote:
Attached patch atomic-pgrename-windows-1.patch fixes this
problem. It
appears to be possible to atomically replace file on Windows –
ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is
why we
still need to call MoveFileEx() when it doesn't exist.
Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target
doesn't exist", then call MoveFileEx().Or the other way around -- try MoveFileEx() first since that seems to
work most of the time today (if it mostly broke we'd be in trouble
already), and if it fails with a sharing violation, try ReplaceFile()?
And perhaps end up doing it something similar to what we do with shared
memory which is just to loop over it and try each a couple of time,
before giving up and failing?This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.
OK! No objections against marking this patch RWF.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:On 1/20/18 10:13 AM, Magnus Hagander wrote:
Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target
doesn't exist", then call MoveFileEx().Or the other way around -- try MoveFileEx() first since that seems to
work most of the time today (if it mostly broke we'd be in trouble
already), and if it fails with a sharing violation, try ReplaceFile()?
And perhaps end up doing it something similar to what we do with shared
memory which is just to loop over it and try each a couple of time,
before giving up and failing?This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.OK! No objections against marking this patch RWF.
Hmmm, I just noticed this categorized as a bug. I thought it was a
refactor.
Even so, it looks like the approach needs a rethink so better to wait
for that.
Marked Returned with Feedback.
Thanks,
--
-David
david@pgmasters.net
On Tue, Mar 6, 2018 at 5:11 PM, David Steele <david@pgmasters.net> wrote:
On 3/6/18 9:06 AM, Alexander Korotkov wrote:
On Tue, Mar 6, 2018 at 5:04 PM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:On 1/20/18 10:13 AM, Magnus Hagander wrote:
Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with
"target
doesn't exist", then call MoveFileEx().
Or the other way around -- try MoveFileEx() first since that seems
to
work most of the time today (if it mostly broke we'd be in trouble
already), and if it fails with a sharing violation, tryReplaceFile()?
And perhaps end up doing it something similar to what we do with
shared
memory which is just to loop over it and try each a couple of
time,
before giving up and failing?
This patch was mistakenly left as Needs Review during the last
commitfest but it's pretty clear that a new patch is required.OK! No objections against marking this patch RWF.
Hmmm, I just noticed this categorized as a bug. I thought it was a
refactor.
Yes, that's naturally a bug. Not very critical though.
Even so, it looks like the approach needs a rethink so better to wait
for that.
In this thread we've found at least two possible approaches to fix this
bug. But both of them need to be implemented and tested.
Marked Returned with Feedback.
OK!
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Hi!
I'd like to resume the discussion on this subject. Sorry for so long delay.
On Sat, Jan 20, 2018 at 6:13 PM Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Nov 28, 2017 at 2:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Nov 27, 2017 at 3:28 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:Attached patch atomic-pgrename-windows-1.patch fixes this problem. It
appears to be possible to atomically replace file on Windows – ReplaceFile()
does that. ReplaceFiles() requires target file to exist, this is why we
still need to call MoveFileEx() when it doesn't exist.Do you think that it could be safer to unlink the target file first
with pgunlink()? This way you make sure that the target file is
removed and not locked. This change makes me worrying about the
introduction of more race conditions.Unlinking it first seems dangerous, as pointed out by Andres.
What about first trying ReplaceFile() and then if it fails with "target doesn't exist", then call MoveFileEx().
Or the other way around -- try MoveFileEx() first since that seems to work most of the time today (if it mostly broke we'd be in trouble already), and if it fails with a sharing violation, try ReplaceFile()? And perhaps end up doing it something similar to what we do with shared memory which is just to loop over it and try each a couple of time, before giving up and failing?
Unfortunately, it seems that none of such strategies would fit all the cases.
Basically, we have two option for renaming a file.
* MoveFileEx() – safest possible option, less likely loose files, but
takes a lock on target.
* ReplaceFile() – "lockless" option, but may loose files on OS crash.
Also we have two different cases of files renaming:
1) Renaming durable files. When target already exists, on OS crash we
expect finding either old or new file in target filename. We don't
expect to find nothing in target filename.
2) Renaming temporary files. In this case we don't much care on
loosing files on OS crash. But locking appears to be an issue in some
cases.
So, in 1st case it doesn't seem like an option to try ReplaceFile()
either in first or second place. In both ways it causes a risk to
loose a target file, which seems unacceptable.
In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa
might work. But error codes of these functions doesn't tell
explicitly whether target file exists. So, I prefer checking it
explicitly with GetFileAttributes().
Attached patch implements idea described in [1]. It implements
separate pgrename_temp() function, which is better for concurrent
operation but less safe.
Links
1. /messages/by-id/CAPpHfds9trA6ipezK3BsuuOSQwEmESiqj8pkOxACFJpoLpcoNw@mail.gmail.com
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
pgrename_temp-1.patchapplication/octet-stream; name=pgrename_temp-1.patchDownload
commit 80b3a109b5af25190343d3307eeeb6584b77b8b3
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: Tue Aug 6 20:49:03 2019 +0300
Introduce pgrename_temp()
Reported-by:
Bug:
Discussion:
Author:
Reviewed-by:
Tested-by:
Backpatch-through:
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 2bb14cdd026..b4404f23805 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4952,7 +4952,7 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
tmpfile)));
unlink(tmpfile);
}
- else if (rename(tmpfile, statfile) < 0)
+ else if (rename_temp(tmpfile, statfile) < 0)
{
ereport(LOG,
(errcode_for_file_access(),
@@ -5087,7 +5087,7 @@ pgstat_write_db_statsfile(PgStat_StatDBEntry *dbentry, bool permanent)
tmpfile)));
unlink(tmpfile);
}
- else if (rename(tmpfile, statfile) < 0)
+ else if (rename_temp(tmpfile, statfile) < 0)
{
ereport(LOG,
(errcode_for_file_access(),
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index bafd31d22bb..734cefebf02 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -1518,7 +1518,7 @@ update_metainfo_datafile(void)
}
fclose(fh);
- if (rename(LOG_METAINFO_DATAFILE_TMP, LOG_METAINFO_DATAFILE) != 0)
+ if (rename_temp(LOG_METAINFO_DATAFILE_TMP, LOG_METAINFO_DATAFILE) != 0)
ereport(LOG,
(errcode_for_file_access(),
errmsg("could not rename file \"%s\" to \"%s\": %m",
diff --git a/src/include/port.h b/src/include/port.h
index b5c03d912b0..fb6446ade5e 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -234,14 +234,28 @@ extern int pclose_check(FILE *stream);
extern int pgrename(const char *from, const char *to);
extern int pgunlink(const char *path);
+#ifdef WIN32
+extern int pgrename_temp(const char *from, const char *to);
+#endif
+
/* Include this first so later includes don't see these defines */
#ifdef _MSC_VER
#include <io.h>
#endif
+#ifdef WIN32
+#define rename_temp(from, to) pgrename_temp(from, to)
+#else
+#define rename_temp(from, to) pgrename(from, to)
+#endif
+
#define rename(from, to) pgrename(from, to)
#define unlink(path) pgunlink(path)
-#endif /* defined(WIN32) || defined(__CYGWIN__) */
+#else /* defined(WIN32) || defined(__CYGWIN__) */
+
+#define rename_temp(from, to) rename(from, to)
+
+#endif
/*
* Win32 also doesn't have symlinks, but we can emulate them with
diff --git a/src/port/dirmod.c b/src/port/dirmod.c
index d7932401ef0..fa7286b9720 100644
--- a/src/port/dirmod.c
+++ b/src/port/dirmod.c
@@ -92,6 +92,87 @@ pgrename(const char *from, const char *to)
}
+#ifdef WIN32
+
+/*
+ * pgrename_temp
+ *
+ * Renames file concurrently to operations on target. Not safe version, if OS
+ * crashes during replacing a file, then afterwards target filename might not
+ * exists. Suitable for temporary files.
+ */
+int
+pgrename_temp(const char *from, const char *to)
+{
+ int loops = 0;
+
+ /*
+ * We need to loop because even though PostgreSQL uses flags that allow
+ * rename while the file is open, other applications might have the file
+ * open without those flags. However, we won't wait indefinitely for
+ * someone else to close the file, as the caller might be holding locks
+ * and blocking other backends.
+ */
+ DWORD err;
+
+ /*
+ * On Windows we use ReplaceFile() to rename while concurrent processes
+ * have file open. However, ReplaceFile() is to be used only when target
+ * file is already exists. Thus, we check for file existence and then
+ * choose between MoveFileEx() and ReplaceFile() functions.
+ */
+ while (true)
+ {
+ DWORD dwAttrib;
+ bool filePresent;
+
+ dwAttrib = GetFileAttributes(to);
+ filePresent = (dwAttrib != INVALID_FILE_ATTRIBUTES) &&
+ !(dwAttrib & FILE_ATTRIBUTE_DIRECTORY);
+
+ if (filePresent)
+ {
+ if (ReplaceFile(to, from, NULL, REPLACEFILE_IGNORE_MERGE_ERRORS, 0, 0))
+ break;
+ }
+ else
+ {
+ if (MoveFileEx(from, to, MOVEFILE_REPLACE_EXISTING))
+ break;
+ }
+ err = GetLastError();
+
+ _dosmaperr(err);
+
+ /*
+ * Modern NT-based Windows versions return ERROR_SHARING_VIOLATION if
+ * another process has the file open without FILE_SHARE_DELETE.
+ * ERROR_LOCK_VIOLATION has also been seen with some anti-virus
+ * software. This used to check for just ERROR_ACCESS_DENIED, so
+ * presumably you can get that too with some OS versions. We don't
+ * expect real permission errors where we currently use rename().
+ *
+ * Due to cuncurrent operation ReplaceFile() might return either
+ * ERROR_UNABLE_TO_MOVE_REPLACEMENT or
+ * ERROR_UNABLE_TO_REMOVE_REPLACED. In both cases it worth retrying.
+ */
+ if (err != ERROR_ACCESS_DENIED &&
+ err != ERROR_SHARING_VIOLATION &&
+ err != ERROR_LOCK_VIOLATION &&
+ err != ERROR_UNABLE_TO_MOVE_REPLACEMENT &&
+ err != ERROR_UNABLE_TO_REMOVE_REPLACED)
+ return -1;
+
+ if (++loops > 100) /* time out after 10 sec */
+ return -1;
+ pg_usleep(100000); /* us */
+ }
+ return 0;
+}
+
+#endif
+
+
/*
* pgunlink
*/
On Tue, Aug 06, 2019 at 09:03:08PM +0300, Alexander Korotkov wrote:
...
Unfortunately, it seems that none of such strategies would fit all the
cases.Basically, we have two option for renaming a file. * MoveFileEx() –
safest possible option, less likely loose files, but takes a lock on
target. * ReplaceFile() – "lockless" option, but may loose files on
OS crash.Also we have two different cases of files renaming: 1) Renaming
durable files. When target already exists, on OS crash we expect
finding either old or new file in target filename. We don't expect to
find nothing in target filename. 2) Renaming temporary files. In
this case we don't much care on loosing files on OS crash. But
locking appears to be an issue in some cases.So, in 1st case it doesn't seem like an option to try ReplaceFile()
either in first or second place. In both ways it causes a risk to
loose a target file, which seems unacceptable.In the 2nd case, trying MoveFileEx() then ReplaceFile() or vise versa
might work. But error codes of these functions doesn't tell explicitly
whether target file exists. So, I prefer checking it explicitly with
GetFileAttributes().Attached patch implements idea described in [1]. It implements
separate pgrename_temp() function, which is better for concurrent
operation but less safe.
I don't have access to a Windows machine and my developer experience
with that platform is pretty much nil, but I think this patch makes
sense. It's not an ideal solution, but it's not clear such solution
exists, and an improvement is better than nothing.
I have two minor comments about rename_temp:
1) The name might seem to be hinting it's about files opened using
OpenTemporaryFile, but in practice it's about files that are not
critical. But maybe it's true.
2) I think the rename_temp comment should mention it can only be used in
cases when the renames happen in a single process (non-concurrently). If
we could call rename_temp() concurrently from two different processes,
it won't work as expected. Of course, we only call rename_temp from stat
collector and syslogger, where it obviously works.
Anyway, I'm really nitpicking here ...
Are there any objections to get the current patch committed as is, so
that it does not get pushed yet again to the next commitfest.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi!
On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I don't have access to a Windows machine and my developer experience
with that platform is pretty much nil, but I think this patch makes
sense. It's not an ideal solution, but it's not clear such solution
exists, and an improvement is better than nothing.
Thank you for your attention to this patch!
I have two minor comments about rename_temp:
1) The name might seem to be hinting it's about files opened using
OpenTemporaryFile, but in practice it's about files that are not
critical. But maybe it's true.
We may invent another name. What about rename_fragile()?
2) I think the rename_temp comment should mention it can only be used in
cases when the renames happen in a single process (non-concurrently). If
we could call rename_temp() concurrently from two different processes,
it won't work as expected. Of course, we only call rename_temp from stat
collector and syslogger, where it obviously works.
Good point, this should be reflected in comments.
Anyway, I'm really nitpicking here ...
Are there any objections to get the current patch committed as is, so
that it does not get pushed yet again to the next commitfest.
It would be good to commit. Let's get agreement on function name first.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I don't have access to a Windows machine and my developer experience
with that platform is pretty much nil, but I think this patch makes
sense. It's not an ideal solution, but it's not clear such solution
exists, and an improvement is better than nothing.
Thank you for your attention to this patch!
FWIW, I don't like this patch much at all. I too know nothing about
Windows, but I do *not* like inventing a distinction between "rename"
and "rename_temp" and expecting all call sites to have to decide
which one to use. That's allowing a single platform's implementation
bugs to dictate our APIs globally; plus it's not clear that every
call site can know which is more appropriate.
regards, tom lane
On Tue, Jan 7, 2020 at 9:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
On Tue, Jan 7, 2020 at 7:16 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:I don't have access to a Windows machine and my developer experience
with that platform is pretty much nil, but I think this patch makes
sense. It's not an ideal solution, but it's not clear such solution
exists, and an improvement is better than nothing.Thank you for your attention to this patch!
FWIW, I don't like this patch much at all. I too know nothing about
Windows, but I do *not* like inventing a distinction between "rename"
and "rename_temp" and expecting all call sites to have to decide
which one to use. That's allowing a single platform's implementation
bugs to dictate our APIs globally; plus it's not clear that every
call site can know which is more appropriate.
I'm not sure issue we faced is really about single platform. TBH, the
assumptions we place to rename function is very strict. We assume
rename works atomically on system crash. And we indirectly assume it
can work concurrently as least with file readers. The both are
probably true for Linux with most popular filesystems. But we do
support pretty many platforms. I think the issue we didn't
investigate rename properties well on all of them. But if we do, it
might happen some assumptions are wrong on multiple platforms.
Windows is just used busy enough to spot the problem.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
I'm not sure issue we faced is really about single platform. TBH, the
assumptions we place to rename function is very strict. We assume
rename works atomically on system crash. And we indirectly assume it
can work concurrently as least with file readers. The both are
probably true for Linux with most popular filesystems. But we do
support pretty many platforms. I think the issue we didn't
investigate rename properties well on all of them. But if we do, it
might happen some assumptions are wrong on multiple platforms.
Windows is just used busy enough to spot the problem.
Well, atomic rename is required by POSIX. True, we have found bugs
related to that in one or two Unix-ish platforms. But nobody
is going to deny that those are OS bugs that the OS ought to fix,
rather than accepted behaviors that applications have to find
some way to work around. I'm not pleased with the idea that
Windows' deficiencies in this area should result in kluges all over
our code. I think we should stick to the autoconf recommendation:
Autoconf follows a philosophy that was formed over the years by
those who have struggled for portability: isolate the portability
issues in specific files, and then program as if you were in a
Posix environment.
regards, tom lane
On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
I'm not sure issue we faced is really about single platform. TBH, the
assumptions we place to rename function is very strict. We assume
rename works atomically on system crash. And we indirectly assume it
can work concurrently as least with file readers. The both are
probably true for Linux with most popular filesystems. But we do
support pretty many platforms. I think the issue we didn't
investigate rename properties well on all of them. But if we do, it
might happen some assumptions are wrong on multiple platforms.
Windows is just used busy enough to spot the problem.Well, atomic rename is required by POSIX. True, we have found bugs
related to that in one or two Unix-ish platforms. But nobody
is going to deny that those are OS bugs that the OS ought to fix,
rather than accepted behaviors that applications have to find
some way to work around. I'm not pleased with the idea that
Windows' deficiencies in this area should result in kluges all over
our code. I think we should stick to the autoconf recommendation:Autoconf follows a philosophy that was formed over the years by
those who have struggled for portability: isolate the portability
issues in specific files, and then program as if you were in a
Posix environment.
I get this point: we want to program postgres like it's working in
perfect POSIX environment.
And POSIX standard really requires rename() to work concurrently with
file accesses.
"If the link named by the new argument exists and the file's link
count becomes 0 when it is removed and no process has the file open,
the space occupied by the file shall be freed and the file shall no
longer be accessible. If one or more processes have the file open when
the last link is removed, the link shall be removed before rename()
returns, but the removal of the file contents shall be postponed until
all references to the file are closed."
But issue is that on Windows POSIX rename() is kind of impossible to
implement. And I suspect other platforms may have issues too.
Regarding "pg_stat_tmp/global.stat", which is a problem in particular
case, we may evade file renaming altogether. Instead, we may
implement shared-memory counter for filename. So, instead of
renaming, new reads will just come to new file.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On 1/11/20 5:13 PM, Alexander Korotkov wrote:
On Tue, Jan 7, 2020 at 11:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"If the link named by the new argument exists and the file's link
count becomes 0 when it is removed and no process has the file open,
the space occupied by the file shall be freed and the file shall no
longer be accessible. If one or more processes have the file open when
the last link is removed, the link shall be removed before rename()
returns, but the removal of the file contents shall be postponed until
all references to the file are closed."But issue is that on Windows POSIX rename() is kind of impossible to
implement. And I suspect other platforms may have issues too.Regarding "pg_stat_tmp/global.stat", which is a problem in particular
case, we may evade file renaming altogether. Instead, we may
implement shared-memory counter for filename. So, instead of
renaming, new reads will just come to new file.
I tend to agree with Tom on the question of portability. But it seems
upthread we have determined that this can't be sensibly isolated into a
Windows-specific rename() function.
Does anyone have any further ideas? If not I feel like this patch is
going to need to be RWF again.
Regards,
--
-David
david@pgmasters.net
David Steele <david@pgmasters.net> writes:
On 1/11/20 5:13 PM, Alexander Korotkov wrote:
Regarding "pg_stat_tmp/global.stat", which is a problem in particular
case, we may evade file renaming altogether. Instead, we may
implement shared-memory counter for filename. So, instead of
renaming, new reads will just come to new file.
I tend to agree with Tom on the question of portability. But it seems
upthread we have determined that this can't be sensibly isolated into a
Windows-specific rename() function.
Does anyone have any further ideas? If not I feel like this patch is
going to need to be RWF again.
So far as pg_stat_tmp is concerned, I think there is reasonable hope
that that problem is just going to go away in the near future.
I've not been paying attention to the shared-memory stats collector
thread so I'm not sure if that's anywhere near committable, but
I think that's clearly something we'll want once it's ready.
So if that's the main argument why we need this, it's a pretty
thin argument ...
regards, tom lane
Hi,
On March 30, 2020 9:17:00 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
So far as pg_stat_tmp is concerned, I think there is reasonable hope
that that problem is just going to go away in the near future.
I've not been paying attention to the shared-memory stats collector
thread so I'm not sure if that's anywhere near committable, but
I think that's clearly something we'll want once it's ready.
I'd give it a ~30-40% chance for 13 at this point. The patch improved a lot after the review cycles in the last ~10 days, but still needs a good bit more polish.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.