BUG #14243: pg_basebackup failes by a STATUS_DELETE_PENDING file
The following bug has been logged on the website:
Bug reference: 14243
Logged by: TAKATSUKA Haruka
Email address: harukat@sraoss.co.jp
PostgreSQL version: 9.6beta2
Operating system: Windows
Description:
pg_basebackup sometimes failed on Windows by "Permission denied".
This occured in PostgreSQL 9.1.x and I reproduced it in 9.6beta2.
C:\>pg_basebackup.exe -h localhost -U postgres
-D C:\dat\96datpg_basebackup: could not get backup header:
ERROR: could not stat file or directory "./base/16393/16444":
Permission denied
C:\Program Files\PostgreSQL\9.6\data\base\16393>dir /q 16444
C:\Program Files\PostgreSQL\9.6\data\base\16393 directory
2016/07/12 15:56 0 ... 16444
Dir command says the file 16444's owner is '...'.
It means file 16444 is in STATUS_DELETE_PENDING.
The other processes cannot open it, so it can
cause pg_basebackup to fail.
I think pg_basebackup should ignore files which is in
STATUS_DELETE_PENDING. But I cannot yet find an easy
method to realize it.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
The following bug has been logged on the website:
Bug reference: 14243
Logged by: TAKATSUKA Haruka
Email address: harukat@sraoss.co.jp
PostgreSQL version: 9.6beta2
Operating system: Windows
Description:pg_basebackup sometimes failed on Windows by "Permission denied".
This occured in PostgreSQL 9.1.x and I reproduced it in 9.6beta2.C:\>pg_basebackup.exe -h localhost -U postgres
-D C:\dat\96datpg_basebackup: could not get backup header:
ERROR: could not stat file or directory "./base/16393/16444":
Permission deniedC:\Program Files\PostgreSQL\9.6\data\base\16393>dir /q 16444
C:\Program Files\PostgreSQL\9.6\data\base\16393 directory
2016/07/12 15:56 0 ... 16444
Dir command says the file 16444's owner is '...'.
It means file 16444 is in STATUS_DELETE_PENDING.
The other processes cannot open it, so it can
cause pg_basebackup to fail.I think pg_basebackup should ignore files which is in
STATUS_DELETE_PENDING. But I cannot yet find an easy
method to realize it.
pg_basebackup tries to backup all the files in data directory and
tablespaces. Refer Notes section of pg_basebackup documentation. I
think if it decides to skip a file with some bad status, it might also
lead to a corrupt backup. I wonder how the file has such a status?
From the name, it looks to be a valid database file.
[1]: https://www.postgresql.org/docs/devel/static/app-pgbasebackup.html
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Amit Kapila wrote:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
C:\Program Files\PostgreSQL\9.6\data\base\16393>dir /q 16444
C:\Program Files\PostgreSQL\9.6\data\base\16393 directory
2016/07/12 15:56 0 ... 16444
Dir command says the file 16444's owner is '...'.
It means file 16444 is in STATUS_DELETE_PENDING.
pg_basebackup tries to backup all the files in data directory and
tablespaces. Refer Notes section of pg_basebackup documentation. I
think if it decides to skip a file with some bad status, it might also
lead to a corrupt backup. I wonder how the file has such a status?
From the name, it looks to be a valid database file.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.
Yeah. The question is how do we distinguish that from cases that
are less benign.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Jul 13, 2016 at 8:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.Yeah. The question is how do we distinguish that from cases that
are less benign.
Indeed, recovery will be able to handle that case correctly, and the
file should be ignored in the base backup. It seems that
STATUS_DELETE_PENDING has always mapped to ERROR_ACCESS_DENIED in
NTSTATUS [1]http://stackoverflow.com/questions/6680491/why-does-windows-return-error-access-denied-when-i-try-to-open-a-delete-pended-f -- Michael, which is definitely not the brightest idea ever because
this does not allow us to check if the access to a path is really
denied because of permissions.
Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.
[1]: http://stackoverflow.com/questions/6680491/why-does-windows-return-error-access-denied-when-i-try-to-open-a-delete-pended-f -- Michael
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.
Actually we may want to tweak stat() to not complain for a file with
such a state.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I study this a little more, but I cannot find the way to distinguish
STATUS_DELETE_PENDING state from the other state that causes permission
denied error.
Cmd.exe's "dir /q" may distinguish these, but I cannot know which API
is used.
-----------------
Haruka Takatsuka
harukat@sraoss.co.jp
On Wed, 13 Jul 2016 10:10:37 +0900
Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.Actually we may want to tweak stat() to not complain for a file with
such a state.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
[Sorry for hanging to wrong message but I cannot get any email from
the list since this morning and I have to create a email from the
archive (posting is ok using the same email account. See my From:).]
Since no one came up with any useful solution for this soon, I think
I need to add this to the TODO list unless there's an objection.
Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
I study this a little more, but I cannot find the way to distinguish
STATUS_DELETE_PENDING state from the other state that causes permission
denied error.Cmd.exe's "dir /q" may distinguish these, but I cannot know which API
is used.-----------------
Haruka Takatsuka
harukat(at)sraoss(dot)co(dot)jpOn Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.Actually we may want to tweak stat() to not complain for a file with
such a state.
--
Michael--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Aug 3, 2016 at 2:33 PM, Tatsuo Ishii <ishii@postgresql.org> wrote:
[Sorry for hanging to wrong message but I cannot get any email from
the list since this morning and I have to create a email from the
archive (posting is ok using the same email account. See my From:).]Since no one came up with any useful solution for this soon, I think
I need to add this to the TODO list unless there's an objection.
The last time I looked for for a way to make the difference between a
permission error and a pending deletion status, I bumped into the
following tool PendMove:
https://technet.microsoft.com/en-us/sysinternals/bb897556.aspx
Well, it is not useful in itself, but, as my colleague Nikhil
Deshpande has pointed out, it becomes more interesting if combined
with ProcessExplorer to see what system calls it does:
https://technet.microsoft.com/en-us/sysinternals/processexplorer.aspx
This may allow us to find a way what are the system APIs called to
make the difference and then use them.
I don't make a priority of this item, but, Takatsuka-san, that's an
idea to dig for I think.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.windows-return-error-access-denied-when-i-try-to-open-a-delete-pended-f
Actually we may want to tweak stat() to not complain for a file with
such a state.
That will still leave is with a race condition won't it? It'll just be much
smaller? The file could go into STATUS_DELETE_PENDING between the call to
stat() and the call to sendFile()/allocateFile().
Maybe that's an acceptable difference? If not, we also need to teach
AllocateFile() about the error.
If we're happy with just stat, it looks like we will have to add it to
pgwin32_safestat().
I don' t have a working window sbox for testing ATM (yeah yeah, it's on my
magic todo), but can someone confirm if calling stat() on such a deleted
flie sets the errorcode so it can be checked with GetLastError() *as well*
as setting errno? IIRC those functions do, but it needs to be checked. If
so, the attached might work? If not, then we need an extra call to
GetFileAttributesEx(), or rearrange those calls in a way to trap errors
there.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
win32_stat_deleted.patchtext/x-patch; charset=US-ASCII; name=win32_stat_deleted.patchDownload+19-1
On 2016-07-12 19:05:16 -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.Yeah. The question is how do we distinguish that from cases that
are less benign.
One approach would be to rename the file into something indicating it's
being deleted, before actually deleting it.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Tue, Aug 16, 2016 at 9:54 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-07-12 19:05:16 -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.Yeah. The question is how do we distinguish that from cases that
are less benign.One approach would be to rename the file into something indicating it's
being deleted, before actually deleting it.
That would be an option (IIRC if you open with FILE_SHARE_DELETE, it can
also be renamed). But if we can track the delete when we try to open it and
just treat it as "file does not exist", that seems cleaner.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 2016-08-17 12:19:31 +0200, Magnus Hagander wrote:
On Tue, Aug 16, 2016 at 9:54 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-07-12 19:05:16 -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say that the
file is not needed for the backup after all and pg_basebackup should
just ignore it.Yeah. The question is how do we distinguish that from cases that
are less benign.One approach would be to rename the file into something indicating it's
being deleted, before actually deleting it.That would be an option (IIRC if you open with FILE_SHARE_DELETE, it can
also be renamed). But if we can track the delete when we try to open it and
just treat it as "file does not exist", that seems cleaner.
I'm not sure what exactly you're suggesting. But isn't there the issue
that such an approach will not interoperate with external tools?
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Aug 17, 2016 at 5:41 PM, Andres Freund <andres@anarazel.de> wrote:
On 2016-08-17 12:19:31 +0200, Magnus Hagander wrote:
On Tue, Aug 16, 2016 at 9:54 PM, Andres Freund <andres@anarazel.de>
wrote:
On 2016-07-12 19:05:16 -0400, Tom Lane wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On Tue, Jul 12, 2016 at 2:02 PM, <harukat@sraoss.co.jp> wrote:
It means file 16444 is in STATUS_DELETE_PENDING.
This file is probably being deleted by a checkpoint after some user
transaction marked it for removal (either because the relation was
dropped, or because it got a new relfilenode). I would say thatthe
file is not needed for the backup after all and pg_basebackup
should
just ignore it.
Yeah. The question is how do we distinguish that from cases that
are less benign.One approach would be to rename the file into something indicating it's
being deleted, before actually deleting it.That would be an option (IIRC if you open with FILE_SHARE_DELETE, it can
also be renamed). But if we can track the delete when we try to open itand
just treat it as "file does not exist", that seems cleaner.
I'm not sure what exactly you're suggesting. But isn't there the issue
that such an approach will not interoperate with external tools?
I'll have to admit I wasn't thinking of external tools. The external tools
would have to learn about pending-delete files themselves. So yeah, if we
care about those then renaming them to something predictable and then
instruct the third party tools to exclude such files would be a more
complete fix.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Tue, Aug 16, 2016 at 4:56 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:On Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.Actually we may want to tweak stat() to not complain for a file with
such a state.That will still leave is with a race condition won't it? It'll just be much
smaller? The file could go into STATUS_DELETE_PENDING between the call to
stat() and the call to sendFile()/allocateFile().
That would only reduce the window. If the file is marked as ready for
deletion after checking with it for stat() and scanned afterwards
we're out, we'd still get a similar error afterwards.
Maybe that's an acceptable difference? If not, we also need to teach
AllocateFile() about the error.If we're happy with just stat, it looks like we will have to add it to
pgwin32_safestat().
lstat() needs a similar treatment, see sendTablespace that calls it.
port.h needs also to have its comments updated.
I don' t have a working window sbox for testing ATM (yeah yeah, it's on my
magic todo), but can someone confirm if calling stat() on such a deleted
flie sets the errorcode so it can be checked with GetLastError() *as well*
as setting errno? IIRC those functions do, but it needs to be checked. If
so, the attached might work? If not, then we need an extra call to
GetFileAttributesEx(), or rearrange those calls in a way to trap errors
there.
I think that your patch is definitely an improvement, and that we may
have better backpatch it: any extension relying on those calls is
going to fail similarly, so this gives an escape path. To be honest, I
have not thought of GetLastError() and check that with
STATUS_DELETE_PENDING, but that's definitely the right call to ignore
a file that has this status instead of failing with EACCESS.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Aug 18, 2016 at 2:25 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Tue, Aug 16, 2016 at 4:56 PM, Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier <
michael.paquier@gmail.com>
wrote:
On Wed, Jul 13, 2016 at 10:06 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:Checking directly for STATUS_DELETE_PENDING would be the way to go,
likely with tweaks in basebackup.c. but like Takatsuka-san, I cannot
find anything around that would allow us to check for that.windows-return-error-access-denied-when-i-try-to-open-a-delete-pended-f
Actually we may want to tweak stat() to not complain for a file with
such a state.That will still leave is with a race condition won't it? It'll just be
much
smaller? The file could go into STATUS_DELETE_PENDING between the call to
stat() and the call to sendFile()/allocateFile().That would only reduce the window. If the file is marked as ready for
deletion after checking with it for stat() and scanned afterwards
we're out, we'd still get a similar error afterwards.
That's pretty much exactly what I said, isn't it? :)
Maybe that's an acceptable difference? If not, we also need to teach
AllocateFile() about the error.If we're happy with just stat, it looks like we will have to add it to
pgwin32_safestat().lstat() needs a similar treatment, see sendTablespace that calls it.
port.h needs also to have its comments updated.
port/win32.h has:
/*
* Supplement to <sys/stat.h>.
*/
#define lstat(path, sb) stat((path), (sb))
So I don't think we should need that, no?
I don' t have a working window sbox for testing ATM (yeah yeah, it's on
my
magic todo), but can someone confirm if calling stat() on such a deleted
flie sets the errorcode so it can be checked with GetLastError() *aswell*
as setting errno? IIRC those functions do, but it needs to be checked. If
so, the attached might work? If not, then we need an extra call to
GetFileAttributesEx(), or rearrange those calls in a way to trap errors
there.I think that your patch is definitely an improvement, and that we may
have better backpatch it: any extension relying on those calls is
going to fail similarly, so this gives an escape path. To be honest, I
have not thought of GetLastError() and check that with
STATUS_DELETE_PENDING, but that's definitely the right call to ignore
a file that has this status instead of failing with EACCESS.
So, thinking more about that.
AllocateFile() calls fopen(). That one is only documented to set errno, and
might not always set the value used by GetLastError().
Perhaps we should make AllocateFile() on win32 specifically check for
EACCESS, and in case we get that error then we do a GetFileAttributesEx()
on the same file and see if we get STATUS_DELETE_PENDING or not. If we get
STATUS_DELETE_PENDING we rewrite EACCESS to ENOENT and return, if we get
anything else we just let it through.
The race then is if the file "expires", is deleted *and* we create a new
file in between. In that case, we would still return EACCESS and not open
the new file. But that seems like a very narrow case.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Aug 18, 2016 at 7:07 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Aug 18, 2016 at 2:25 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:On Tue, Aug 16, 2016 at 4:56 PM, Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier
<michael.paquier@gmail.com>
wrote:That's pretty much exactly what I said, isn't it? :)
:p
lstat() needs a similar treatment, see sendTablespace that calls it.
port.h needs also to have its comments updated.port/win32.h has:
/*
* Supplement to <sys/stat.h>.
*/
#define lstat(path, sb) stat((path), (sb))So I don't think we should need that, no?
Right, this mapping may be caused by the fact that WIN32 uses junction
points. Is that right?
I think that your patch is definitely an improvement, and that we may
have better backpatch it: any extension relying on those calls is
going to fail similarly, so this gives an escape path. To be honest, I
have not thought of GetLastError() and check that with
STATUS_DELETE_PENDING, but that's definitely the right call to ignore
a file that has this status instead of failing with EACCESS.So, thinking more about that.
AllocateFile() calls fopen(). That one is only documented to set errno, and
might not always set the value used by GetLastError().Perhaps we should make AllocateFile() on win32 specifically check for
EACCESS, and in case we get that error then we do a GetFileAttributesEx() on
the same file and see if we get STATUS_DELETE_PENDING or not. If we get
STATUS_DELETE_PENDING we rewrite EACCESS to ENOENT and return, if we get
anything else we just let it through.
That's a good idea.
The race then is if the file "expires", is deleted *and* we create a new
file in between. In that case, we would still return EACCESS and not open
the new file. But that seems like a very narrow case.
You mean a file with the same name that gets re-created? Nah, it's not
worth worrying about that. The original complaint here was about a
relfilenode that a checkpoint has removed after it got likely
truncated by a backend and those are unique.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Fri, Aug 19, 2016 at 4:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Aug 18, 2016 at 7:07 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Aug 18, 2016 at 2:25 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:On Tue, Aug 16, 2016 at 4:56 PM, Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier
<michael.paquier@gmail.com>
wrote:That's pretty much exactly what I said, isn't it? :)
:p
lstat() needs a similar treatment, see sendTablespace that calls it.
port.h needs also to have its comments updated.port/win32.h has:
/*
* Supplement to <sys/stat.h>.
*/
#define lstat(path, sb) stat((path), (sb))So I don't think we should need that, no?
Right, this mapping may be caused by the fact that WIN32 uses junction
points. Is that right?I think that your patch is definitely an improvement, and that we may
have better backpatch it: any extension relying on those calls is
going to fail similarly, so this gives an escape path. To be honest, I
have not thought of GetLastError() and check that with
STATUS_DELETE_PENDING, but that's definitely the right call to ignore
a file that has this status instead of failing with EACCESS.So, thinking more about that.
AllocateFile() calls fopen(). That one is only documented to set errno, and
might not always set the value used by GetLastError().Perhaps we should make AllocateFile() on win32 specifically check for
EACCESS, and in case we get that error then we do a GetFileAttributesEx() on
the same file and see if we get STATUS_DELETE_PENDING or not. If we get
STATUS_DELETE_PENDING we rewrite EACCESS to ENOENT and return, if we get
anything else we just let it through.That's a good idea.
So, working more on it I have not been able to reproduce the failure
reported even after running pg_basebackup in loop with a pgbench
running on a single client with this script to create a bunch of
relfilenodes:
insert into hoge values (1);
truncate hoge;
max_file_size and checkpoint_timeout were also set to minimum to
increase checkpoint frequency.
Anyway, I have spent some time eye-balling the patch proposed by
Magnus and I think that it is over-complicated. First I have noticed
here something:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx
There is an error code ERROR_DELETE_PENDING that we can use to perform
the checks we want, and I am noticing that this is not getting listed
in win32error.c, so we could just map that with ENOENT and we are
basically done: AllocateFile() calls pgwin32_fopen() and stat() calls
win32_safestats, both of them finishing with _dosmaperr() to be sure
that errno is correctly set. This results in the simple patch
attached.
Takatsuka-san, could you try the patch attached and see if the failure
goes away?
--
Michael
Attachments:
win32_pending_delete.patchtext/x-patch; charset=US-ASCII; name=win32_pending_delete.patchDownload+3-0
On Mon, Aug 22, 2016 at 3:13 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Fri, Aug 19, 2016 at 4:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:On Thu, Aug 18, 2016 at 7:07 PM, Magnus Hagander <magnus@hagander.net>
wrote:
On Thu, Aug 18, 2016 at 2:25 AM, Michael Paquier <
michael.paquier@gmail.com>
wrote:
On Tue, Aug 16, 2016 at 4:56 PM, Magnus Hagander <magnus@hagander.net>
wrote:On Wed, Jul 13, 2016 at 3:10 AM, Michael Paquier
<michael.paquier@gmail.com>
wrote:That's pretty much exactly what I said, isn't it? :)
:p
lstat() needs a similar treatment, see sendTablespace that calls it.
port.h needs also to have its comments updated.port/win32.h has:
/*
* Supplement to <sys/stat.h>.
*/
#define lstat(path, sb) stat((path), (sb))So I don't think we should need that, no?
Right, this mapping may be caused by the fact that WIN32 uses junction
points. Is that right?I think that your patch is definitely an improvement, and that we may
have better backpatch it: any extension relying on those calls is
going to fail similarly, so this gives an escape path. To be honest, I
have not thought of GetLastError() and check that with
STATUS_DELETE_PENDING, but that's definitely the right call to ignore
a file that has this status instead of failing with EACCESS.So, thinking more about that.
AllocateFile() calls fopen(). That one is only documented to set errno,
and
might not always set the value used by GetLastError().
Perhaps we should make AllocateFile() on win32 specifically check for
EACCESS, and in case we get that error then we do aGetFileAttributesEx() on
the same file and see if we get STATUS_DELETE_PENDING or not. If we get
STATUS_DELETE_PENDING we rewrite EACCESS to ENOENT and return, if we get
anything else we just let it through.That's a good idea.
So, working more on it I have not been able to reproduce the failure
reported even after running pg_basebackup in loop with a pgbench
running on a single client with this script to create a bunch of
relfilenodes:
insert into hoge values (1);
truncate hoge;
max_file_size and checkpoint_timeout were also set to minimum to
increase checkpoint frequency.Anyway, I have spent some time eye-balling the patch proposed by
Magnus and I think that it is over-complicated. First I have noticed
here something:
https://msdn.microsoft.com/en-us/library/windows/desktop/
ms681382(v=vs.85).aspx
There is an error code ERROR_DELETE_PENDING that we can use to perform
the checks we want, and I am noticing that this is not getting listed
in win32error.c, so we could just map that with ENOENT and we are
basically done: AllocateFile() calls pgwin32_fopen() and stat() calls
win32_safestats, both of them finishing with _dosmaperr() to be sure
that errno is correctly set. This results in the simple patch
attached.
Not having looked in detail, but in pgwin32_safestat(), if the stat() call
fails, we return immediately without calling _dosmaperr(), don't we? So
we're still going to error out there with whatever the default mapping is,
and that's access denied.
It's only if the the stat() call succeeds but the getting of extended
attributes fail that we actually call _dosmaperr().
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Mon, Aug 22, 2016 at 10:17 PM, Magnus Hagander <magnus@hagander.net> wrote:
Not having looked in detail, but in pgwin32_safestat(), if the stat() call
fails, we return immediately without calling _dosmaperr(), don't we? So
we're still going to error out there with whatever the default mapping is,
and that's access denied.It's only if the the stat() call succeeds but the getting of extended
attributes fail that we actually call _dosmaperr().
Meh, you're right. How stupid I am here. So we could just reuse the
first block of your patch when checking for (r < 0), but drop the
second part that complicates GetFileAttributesEx if win32error.c gets
completed for _dosmaperr as my last patch does, right?
By the way, in your patch you really need to
s/STATUS_DELETE_PENDING/ERROR_DELETE_PENDING or compilation just
fails.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs