More time spending with "delete pending"

Started by Alexander Lakhinabout 5 years ago27 messages
#1Alexander Lakhin
exclusion@gmail.com
2 attachment(s)

Hello hackers,

After fixing bug #16161 (pg_ctl inability to open just deleted
postmaster.pid) there are still some errors related to the same
Windows-only issue.
Namely, pg_upgradeCheck failures seen on
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=fairywren&br=REL_13_STABLE
Here pg_ls_dir_files couldn't handle the "delete pending" state too.

I've made a simple test to reproduce this behavior with pg_ls_waldir,
that fails with:
error running SQL: 'psql:<stdin>:1: ERROR:  could not stat file
"pg_wal/dummy": Permission denied'

As noted in [1]/messages/by-id/396837.1605298573@sss.pgh.pa.us, a sensible solution would be putting the same "retry on
ERROR_ACCESS_DENIED" action in a wrapper for stat().
And bed90759f brought in master the _pgstat64() function, where such
error handling should be placed.
So please look at the patch (based on the previous one made to fix
bug#16161), that makes the attached test pass.

And I have a few questions.
For now genfile.c needs to include "port.h" explicitly to override
definitions in "sys/stat.h", that override "stat" redefinition for
Windows included implicitly above via "postgres.h". Without it the
pg_ls_dir_files() function just uses system' stat().
So I wonder, is there a way to make all stat-calling code use the
improved stat()?
And if so, maybe the stat() call in pgwin32_open should be replaced with
microsoft_native_stat().

And regarding released versions, are there any reasons to not backport
bed90759f (with the proposed fix or alike)?

[1]: /messages/by-id/396837.1605298573@sss.pgh.pa.us

Best regards,
Alexander

Attachments:

001_delete_pending.plapplication/x-perl; name=001_delete_pending.plDownload
fix_pg_ls_dir.patchtext/x-patch; charset=UTF-8; name=fix_pg_ls_dir.patchDownload
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index d34182a7b0..922df49125 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -28,6 +28,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 4351aa4d08..c2b55c7fa6 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -170,6 +170,7 @@ _pgstat64(const char *name, struct stat *buf)
 	SECURITY_ATTRIBUTES sa;
 	HANDLE		hFile;
 	int			ret;
+	int			loops = 0;
 #if _WIN32_WINNT < 0x0600
 	IO_STATUS_BLOCK ioStatus;
 	FILE_STANDARD_INFORMATION standardInfo;
@@ -182,31 +183,60 @@ _pgstat64(const char *name, struct stat *buf)
 		errno = EINVAL;
 		return -1;
 	}
-
 	/* fast not-exists check */
 	if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
 	{
-		_dosmaperr(GetLastError());
-		return -1;
+		DWORD		err = GetLastError();
+
+		if (err != ERROR_ACCESS_DENIED) {
+			_dosmaperr(err);
+			return -1;
+		}
 	}
 
 	/* get a file handle as lightweight as we can */
 	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
 	sa.bInheritHandle = TRUE;
 	sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+							   FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
+		/*
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
+		 */
+		if (err == ERROR_ACCESS_DENIED)
+		{
+			struct microsoft_native_stat st;
+
+			if (microsoft_native_stat(name, &st) != 0)
+			{
+				pg_usleep(100000);
+				loops++;
+				continue;
+			}
+		}
+
 		_dosmaperr(err);
 		return -1;
 	}
#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Alexander Lakhin (#1)
Re: More time spending with "delete pending"

On Sat, Nov 14, 2020 at 01:00:00PM +0300, Alexander Lakhin wrote:

As noted in [1], a sensible solution would be putting the same "retry on
ERROR_ACCESS_DENIED" action in a wrapper for stat().
And bed90759f brought in master the _pgstat64() function, where such
error handling should be placed.
So please look at the patch (based on the previous one made to fix
bug#16161), that makes the attached test pass.

Your patch introduces a "loops", but doesn't use it to escape the loop.

Show quoted text
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index d34182a7b0..922df49125 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -28,6 +28,7 @@
#include "funcapi.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "port.h"
#include "postmaster/syslogger.h"
#include "storage/fd.h"
#include "utils/acl.h"
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 4351aa4d08..c2b55c7fa6 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -170,6 +170,7 @@ _pgstat64(const char *name, struct stat *buf)
SECURITY_ATTRIBUTES sa;
HANDLE		hFile;
int			ret;
+	int			loops = 0;
#if _WIN32_WINNT < 0x0600
IO_STATUS_BLOCK ioStatus;
FILE_STANDARD_INFORMATION standardInfo;
@@ -182,31 +183,60 @@ _pgstat64(const char *name, struct stat *buf)
errno = EINVAL;
return -1;
}
-
/* fast not-exists check */
if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
{
-		_dosmaperr(GetLastError());
-		return -1;
+		DWORD		err = GetLastError();
+
+		if (err != ERROR_ACCESS_DENIED) {
+			_dosmaperr(err);
+			return -1;
+		}
}
/* get a file handle as lightweight as we can */
sa.nLength = sizeof(SECURITY_ATTRIBUTES);
sa.bInheritHandle = TRUE;
sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+							   FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
{
DWORD		err = GetLastError();
-		CloseHandle(hFile);
+		/*
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
+		 */
+		if (err == ERROR_ACCESS_DENIED)
+		{
+			struct microsoft_native_stat st;
+
+			if (microsoft_native_stat(name, &st) != 0)
+			{
+				pg_usleep(100000);
+				loops++;
+				continue;
+			}
+		}
+
_dosmaperr(err);
return -1;
}
#3Alexander Lakhin
exclusion@gmail.com
In reply to: Justin Pryzby (#2)
1 attachment(s)
Re: More time spending with "delete pending"

15.11.2020 04:11, Justin Pryzby wrote:

On Sat, Nov 14, 2020 at 01:00:00PM +0300, Alexander Lakhin wrote:

As noted in [1], a sensible solution would be putting the same "retry on
ERROR_ACCESS_DENIED" action in a wrapper for stat().
And bed90759f brought in master the _pgstat64() function, where such
error handling should be placed.
So please look at the patch (based on the previous one made to fix
bug#16161), that makes the attached test pass.

Your patch introduces a "loops", but doesn't use it to escape the loop.

Indeed, this is my mistake. Please look at the corrected patch (now that
code corresponds to the pgwin32_open() as intended).

And it rises another question, what if pg_ls_dir_files() is called for a
directory where hundreds or thousands of files are really inaccessible
due to restrictions?
I mean that using CreateFile() in the modified stat() implementation can
be rather expensive for an arbitrary file (and worse yet, for many files).
On the positive side, for now pg_ls_dir_files() is called only from
pg_ls_logdir, pg_ls_waldir, pg_ls_tmpdir, pg_ls_archive_statusdir, where
having a bunch of files that are inaccessible for the postgres user is
not expected anyway.

But probably getting directory contents with correct file sizes (>4GB)
in pg_ls_dir_files() can be implemented without calling
CreateFile()/stat() at all (as ProcMon shows, the "dir" command doesn't
call CreateFile() (or any other system function) for each file in the
target directory).

Best regards,
Alexander

Attachments:

fix_pg_ls_dir.patchtext/x-patch; charset=UTF-8; name=fix_pg_ls_dir.patchDownload
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index d34182a7b0..922df49125 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -28,6 +28,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 4351aa4d08..2f9e111627 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -170,6 +170,7 @@ _pgstat64(const char *name, struct stat *buf)
 	SECURITY_ATTRIBUTES sa;
 	HANDLE		hFile;
 	int			ret;
+	int			loops = 0;
 #if _WIN32_WINNT < 0x0600
 	IO_STATUS_BLOCK ioStatus;
 	FILE_STANDARD_INFORMATION standardInfo;
@@ -182,31 +183,63 @@ _pgstat64(const char *name, struct stat *buf)
 		errno = EINVAL;
 		return -1;
 	}
-
 	/* fast not-exists check */
 	if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
 	{
-		_dosmaperr(GetLastError());
-		return -1;
+		DWORD		err = GetLastError();
+
+		if (err != ERROR_ACCESS_DENIED) {
+			_dosmaperr(err);
+			return -1;
+		}
 	}
 
 	/* get a file handle as lightweight as we can */
 	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
 	sa.bInheritHandle = TRUE;
 	sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+							   FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
+		/*
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
+		 */
+		if (err == ERROR_ACCESS_DENIED)
+		{
+			if (loops < 10)
+			{
+				struct microsoft_native_stat st;
+
+				if (microsoft_native_stat(name, &st) != 0)
+				{
+					pg_usleep(100000);
+					loops++;
+					continue;
+				}
+			}
+		}
+
 		_dosmaperr(err);
 		return -1;
 	}
#4Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#3)
Re: More time spending with "delete pending"

15.11.2020 08:00, Alexander Lakhin wrote:

And it rises another question, what if pg_ls_dir_files() is called for a
directory where hundreds or thousands of files are really inaccessible
due to restrictions?
I mean that using CreateFile() in the modified stat() implementation can
be rather expensive for an arbitrary file (and worse yet, for many files).
On the positive side, for now pg_ls_dir_files() is called only from
pg_ls_logdir, pg_ls_waldir, pg_ls_tmpdir, pg_ls_archive_statusdir, where
having a bunch of files that are inaccessible for the postgres user is
not expected anyway.

I've missed the fact that pg_ls_dir_files() just fails on first error,
so the existing coding would not cause performance issues. But such
behavior is somewhat dubious, because having an inaccessible file in a
directory of interest would make any of those calls fail

But probably getting directory contents with correct file sizes (>4GB)
in pg_ls_dir_files() can be implemented without calling
CreateFile()/stat() at all (as ProcMon shows, the "dir" command doesn't
call CreateFile() (or any other system function) for each file in the
target directory).

As I've found out, readdir() replacement for Windows in fact gets all
the needed information (correct file size, attributes...) in
WIN32_FIND_DATA, but it just leaves it inside and returns only fields of
the dirent structure. So pg_ls_dir_files() (that calls
ReadDir()/readdir()) needs to get this information again, that leads to
opening a file on Windows.
I think it can be done more effectively by adding a new function
ReadDirExtendedInfo(), that will additionally accept a pointer to
"struct dirent_extra" with fields {valid (indicating that the structure
is filled), attributes, file size, mtime}. Maybe the advanced function
could also invoke stat() inside (not on Windows).

As to patch I proposed before, I think it's still needed to fully
support the following usage pattern:
stat();
if (no_error) {
    do_something();
} else if (file_not_found) {
    do_something_else();
} else {
    error();
}

Best regards,
Alexander

#5Juan José Santamaría Flecha
juanjo.santamaria@gmail.com
In reply to: Alexander Lakhin (#4)
Re: More time spending with "delete pending"

On Sun, Nov 15, 2020 at 4:00 PM Alexander Lakhin <exclusion@gmail.com>
wrote:

As I've found out, readdir() replacement for Windows in fact gets all
the needed information (correct file size, attributes...) in
WIN32_FIND_DATA, but it just leaves it inside and returns only fields of
the dirent structure. So pg_ls_dir_files() (that calls
ReadDir()/readdir()) needs to get this information again, that leads to
opening a file on Windows.
I think it can be done more effectively by adding a new function
ReadDirExtendedInfo(), that will additionally accept a pointer to
"struct dirent_extra" with fields {valid (indicating that the structure
is filled), attributes, file size, mtime}. Maybe the advanced function
could also invoke stat() inside (not on Windows).

As to patch I proposed before, I think it's still needed to fully
support the following usage pattern:
stat();
if (no_error) {
do_something();
} else if (file_not_found) {
do_something_else();
} else {
error();
}

We are currently missing a WIN32 lstat() port. I was thinking about
proposing a patch to implement it using GetFileAttributesEx(). That might
work as fall back to the CreateFile() if the file attribute is not a
FILE_ATTRIBUTE_REPARSE_POINT.

Anyhow, I do not think any retry logic should be in the stat() function,
but in the caller.

Regards,

Juan José Santamaría Flecha

Show quoted text
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Lakhin (#3)
Re: More time spending with "delete pending"

Alexander Lakhin <exclusion@gmail.com> writes:

15.11.2020 04:11, Justin Pryzby wrote:

Your patch introduces a "loops", but doesn't use it to escape the loop.

Indeed, this is my mistake. Please look at the corrected patch (now that
code corresponds to the pgwin32_open() as intended).

So what you're saying (and what the buildfarm seems to confirm, since
we are still seeing "Permission denied" failures from time to time,
eg [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-11-18%2018%3A02%3A04) is that all that code that bed90759f added to deal with
delete-pending is just junk, because we never get that far if the
file is delete-pending. So shouldn't we remove all of it? (That's
lines 214-276 in win32stat.c, plus some earlier declarations that
would now be unneeded.)

I'd be kind of inclined to change pgwin32_open() to use
microsoft_native_stat, as well, because these two functions
really ought to be handling this exactly alike.

regards, tom lane

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-11-18%2018%3A02%3A04

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: More time spending with "delete pending"

BTW ... scraping the buildfarm logs for "could not open ... Permission
denied" failures suggests that pgwin32_open() isn't the pinnacle of
perfection either. In the last three months I found these instances:

dory | REL_11_STABLE | 2020-08-21 22:15:14 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/REL_11_STABLE/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
fairywren | REL_11_STABLE | 2020-09-01 01:04:55 | Check | pg_ctl: could not open PID file "C:/tools/msys64/home/pgrunner/bf/root/REL_11_STABLE/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
dory | HEAD | 2020-09-16 02:40:17 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
walleye | HEAD | 2020-09-29 18:35:46 | pg_upgradeCheck | 2020-09-29 15:20:04.298 EDT [896:31] pg_regress/gist WARNING: could not open statistics file "pg_stat_tmp/global.stat": Permission denied
frogmouth | REL_10_STABLE | 2020-10-01 00:30:15 | StopDb-C:5 | waiting for server to shut down....pg_ctl: could not open PID file "data-C/postmaster.pid": Permission denied\r
dory | REL_11_STABLE | 2020-10-05 02:15:06 | pg_upgradeCheck | waiting for server to shut down..............pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/REL_11_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data.old/postmaster.pid": Permission denied\r
bowerbird | REL_10_STABLE | 2020-10-21 23:19:22 | scriptsCheck | waiting for server to shut down....pg_ctl: could not open PID file "H:/prog/bf/root/REL_10_STABLE/pgsql.build/src/bin/scripts/tmp_check/data_main_cH68/pgdata/postmaster.pid": Permission denied\r
jacana | REL_10_STABLE | 2020-10-22 12:42:51 | pg_upgradeCheck | Oct 22 09:08:20 waiting for server to shut down....pg_ctl: could not open PID file "c:/mingw/msys/1.0/home/pgrunner/bf/root/REL_10_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data.old/postmaster.pid": Permission denied\r
dory | HEAD | 2020-10-22 23:30:16 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
dory | REL_11_STABLE | 2020-10-27 18:15:08 | StopDb-C:2 | waiting for server to shut down....pg_ctl: could not open PID file "data-C/postmaster.pid": Permission denied\r
fairywren | REL_11_STABLE | 2020-10-28 04:01:17 | recoveryCheck | waiting for server to shut down....pg_ctl: could not open PID file "C:/tools/msys64/home/pgrunner/bf/root/REL_11_STABLE/pgsql.build/src/test/recovery/tmp_check/t_004_timeline_switch_standby_2_data/pgdata/postmaster.pid": Permission denied\r
hamerkop | REL_11_STABLE | 2020-11-01 11:31:54 | pg_upgradeCheck | waiting for server to shut down....pg_ctl: could not open PID file "c:/build-farm-local/buildroot/REL_11_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data/postmaster.pid": Permission denied\r
dory | HEAD | 2020-11-04 05:50:17 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
walleye | HEAD | 2020-11-09 09:45:41 | Check | 2020-11-09 05:06:19.672 EST [4408:31] pg_regress/gist WARNING: could not open statistics file "pg_stat_tmp/global.stat": Permission denied
dory | HEAD | 2020-11-10 23:35:18 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r

Now, the ones on the 10 and 11 branches are all from pg_ctl, which
does not use pgwin32_open() in those branches, only native open().
So those aren't fair to count against it. But we have nearly as
many similar failures in HEAD, which surely is going through
pgwin32_open(). So either we don't really have adequate protection
against delete-pending files, or there is some other effect we haven't
explained yet.

A simple theory about that is that the 1-second wait is not always
enough on heavily loaded buildfarm machines. Is it sensible to
increase the timeout?

regards, tom lane

#8Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#7)
Re: More time spending with "delete pending"

Hello Tom,
18.11.2020 23:39, Tom Lane wrote:

BTW ... scraping the buildfarm logs for "could not open ... Permission
denied" failures suggests that pgwin32_open() isn't the pinnacle of
perfection either. In the last three months I found these instances:

dory | REL_11_STABLE | 2020-08-21 22:15:14 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/REL_11_STABLE/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
fairywren | REL_11_STABLE | 2020-09-01 01:04:55 | Check | pg_ctl: could not open PID file "C:/tools/msys64/home/pgrunner/bf/root/REL_11_STABLE/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
dory | HEAD | 2020-09-16 02:40:17 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
walleye | HEAD | 2020-09-29 18:35:46 | pg_upgradeCheck | 2020-09-29 15:20:04.298 EDT [896:31] pg_regress/gist WARNING: could not open statistics file "pg_stat_tmp/global.stat": Permission denied
frogmouth | REL_10_STABLE | 2020-10-01 00:30:15 | StopDb-C:5 | waiting for server to shut down....pg_ctl: could not open PID file "data-C/postmaster.pid": Permission denied\r
dory | REL_11_STABLE | 2020-10-05 02:15:06 | pg_upgradeCheck | waiting for server to shut down..............pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/REL_11_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data.old/postmaster.pid": Permission denied\r
bowerbird | REL_10_STABLE | 2020-10-21 23:19:22 | scriptsCheck | waiting for server to shut down....pg_ctl: could not open PID file "H:/prog/bf/root/REL_10_STABLE/pgsql.build/src/bin/scripts/tmp_check/data_main_cH68/pgdata/postmaster.pid": Permission denied\r
jacana | REL_10_STABLE | 2020-10-22 12:42:51 | pg_upgradeCheck | Oct 22 09:08:20 waiting for server to shut down....pg_ctl: could not open PID file "c:/mingw/msys/1.0/home/pgrunner/bf/root/REL_10_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data.old/postmaster.pid": Permission denied\r
dory | HEAD | 2020-10-22 23:30:16 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
dory | REL_11_STABLE | 2020-10-27 18:15:08 | StopDb-C:2 | waiting for server to shut down....pg_ctl: could not open PID file "data-C/postmaster.pid": Permission denied\r
fairywren | REL_11_STABLE | 2020-10-28 04:01:17 | recoveryCheck | waiting for server to shut down....pg_ctl: could not open PID file "C:/tools/msys64/home/pgrunner/bf/root/REL_11_STABLE/pgsql.build/src/test/recovery/tmp_check/t_004_timeline_switch_standby_2_data/pgdata/postmaster.pid": Permission denied\r
hamerkop | REL_11_STABLE | 2020-11-01 11:31:54 | pg_upgradeCheck | waiting for server to shut down....pg_ctl: could not open PID file "c:/build-farm-local/buildroot/REL_11_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/data/postmaster.pid": Permission denied\r
dory | HEAD | 2020-11-04 05:50:17 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r
walleye | HEAD | 2020-11-09 09:45:41 | Check | 2020-11-09 05:06:19.672 EST [4408:31] pg_regress/gist WARNING: could not open statistics file "pg_stat_tmp/global.stat": Permission denied
dory | HEAD | 2020-11-10 23:35:18 | Check | pg_ctl: could not open PID file "C:/pgbuildfarm/pgbuildroot/HEAD/pgsql.build/src/test/regress/./tmp_check/data/postmaster.pid": Permission denied\r

Now, the ones on the 10 and 11 branches are all from pg_ctl, which
does not use pgwin32_open() in those branches, only native open().
So those aren't fair to count against it. But we have nearly as
many similar failures in HEAD, which surely is going through
pgwin32_open(). So either we don't really have adequate protection
against delete-pending files, or there is some other effect we haven't
explained yet.

Can you confirm that there are no such failures on REL_12_STABLE and
REL_13_STABLE for last three (or maybe six) months? Maybe something
changed in HEAD then?

A simple theory about that is that the 1-second wait is not always
enough on heavily loaded buildfarm machines. Is it sensible to
increase the timeout?

I'm going to explore the issue this weekend, and probably comparing
duration of the tests before failures can say something about load of
the machines...
It's very doubtful that that state could last for so 1 second...

Best regards,
Alexander

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Lakhin (#8)
1 attachment(s)
Re: More time spending with "delete pending"

Alexander Lakhin <exclusion@gmail.com> writes:

18.11.2020 23:39, Tom Lane wrote:

Now, the ones on the 10 and 11 branches are all from pg_ctl, which
does not use pgwin32_open() in those branches, only native open().
So those aren't fair to count against it. But we have nearly as
many similar failures in HEAD, which surely is going through
pgwin32_open(). So either we don't really have adequate protection
against delete-pending files, or there is some other effect we haven't
explained yet.

Can you confirm that there are no such failures on REL_12_STABLE and
REL_13_STABLE for last three (or maybe six) months? Maybe something
changed in HEAD then?

Hmm, that is an interesting question isn't it. Here's a search going
back a full year. There are a few in v12 --- interestingly, all on
the statistics file, none from pg_ctl --- and none in v13. Of course,
v13 has only existed as a separate branch since 2020-06-07.

There's also a buildfarm test-coverage artifact involved. The bulk
of the HEAD reports are from dory and walleye, neither of which are
building v13 as yet, because of unclear problems [1]/messages/by-id/CA+BEBhvHhM-Bn628pf-LsjqRh3Ang7qCSBG0Ga+7KwhGqrNUPw@mail.gmail.com. I think those
two animals build much more frequently than our other Windows animals,
too, so the fact that they have more may be just because of that and
not because they're somehow more susceptible. Because of that, I'm not
sure that we have enough evidence to say that v13 is better than HEAD.
If there is some new bug, it's post-v12, but maybe not post-v13. But
v12 is evidently not perfect either.

regards, tom lane

[1]: /messages/by-id/CA+BEBhvHhM-Bn628pf-LsjqRh3Ang7qCSBG0Ga+7KwhGqrNUPw@mail.gmail.com

Attachments:

permissions-failures.txttext/plain; charset=us-ascii; name=permissions-failures.txtDownload
#10Alexander Lakhin
exclusion@gmail.com
In reply to: Tom Lane (#9)
Re: More time spending with "delete pending"

19.11.2020 01:28, Tom Lane wrote:

Alexander Lakhin <exclusion@gmail.com> writes:

18.11.2020 23:39, Tom Lane wrote:

Now, the ones on the 10 and 11 branches are all from pg_ctl, which
does not use pgwin32_open() in those branches, only native open().
So those aren't fair to count against it. But we have nearly as
many similar failures in HEAD, which surely is going through
pgwin32_open(). So either we don't really have adequate protection
against delete-pending files, or there is some other effect we haven't
explained yet.

Can you confirm that there are no such failures on REL_12_STABLE and
REL_13_STABLE for last three (or maybe six) months? Maybe something
changed in HEAD then?

Hmm, that is an interesting question isn't it. Here's a search going
back a full year. There are a few in v12 --- interestingly, all on
the statistics file, none from pg_ctl --- and none in v13. Of course,
v13 has only existed as a separate branch since 2020-06-07.

There's also a buildfarm test-coverage artifact involved. The bulk
of the HEAD reports are from dory and walleye, neither of which are
building v13 as yet, because of unclear problems [1]. I think those
two animals build much more frequently than our other Windows animals,
too, so the fact that they have more may be just because of that and
not because they're somehow more susceptible. Because of that, I'm not
sure that we have enough evidence to say that v13 is better than HEAD.
If there is some new bug, it's post-v12, but maybe not post-v13. But
v12 is evidently not perfect either.

Thanks for the complete list! As I can see from it, only lorikeet fails
on REL_12_STABLE. And it has a simple explanation.
lorikeet uses cygwin, not win32, but improved open() and fopen()
functions are not defined on cygwin:
#if defined(WIN32) && !defined(__CYGWIN__)

/*
�* open() and fopen() replacements to allow deletion of open files and
�* passing of other special options.
�*/
#define��� ��� O_DIRECT��� 0x80000000
extern int��� pgwin32_open(const char *, int,...);
extern FILE *pgwin32_fopen(const char *, const char *);
#define��� ��� open(a,b,c) pgwin32_open(a,b,c)
#define��� ��� fopen(a,b) pgwin32_fopen(a,b)
...

And aside from the "Permission denied" failures (not necessarily related
to the "delete pending" state), we can see the "Device or resource busy"
failures on the same global.stat file (e. g., [1]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-07%2020%3A31%3A16&amp;stg=install-check-C, [2]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&amp;dt=2019-10-27%2009%3A36%3A07, [3]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-02%2008%3A28%3A53&amp;stg=install-check-C). All these
failures occur when VACUUM (or REINDEX [3]https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-02%2008%3A28%3A53&amp;stg=install-check-C) tries to access statistics
simultaneously with the statistics collector.

I've tried to make open()/fopen() replacements for cygwin, but haven't
succeeded yet (I need more time to find out how to get original
GetLastError() [4]https://github.com/openunix/cygwin/blob/master/winsup/cygwin/errno.cc, as knowing errno is not sufficient to implement
pgwin32_open()' logic). (And I'm inclined to file a separate bug related
to this cygwin behavior when I'll get more info.)

The failures on HEAD (on dory and walleye) need another investigation
(maybe they are caused by modified stat()...).

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-07%2020%3A31%3A16&amp;stg=install-check-C
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-07%2020%3A31%3A16&amp;stg=install-check-C
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&amp;dt=2019-10-27%2009%3A36%3A07
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&amp;dt=2019-10-27%2009%3A36%3A07
[3]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-02%2008%3A28%3A53&amp;stg=install-check-C
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lorikeet&amp;dt=2020-09-02%2008%3A28%3A53&amp;stg=install-check-C
[4]: https://github.com/openunix/cygwin/blob/master/winsup/cygwin/errno.cc

Best regards,
Alexander

#11Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#10)
1 attachment(s)
Re: More time spending with "delete pending"

Hello hackers,

22.11.2020 18:59, Alexander Lakhin wrote:

19.11.2020 01:28, Tom Lane wrote:

Hmm, that is an interesting question isn't it. Here's a search going
back a full year. There are a few in v12 --- interestingly, all on
the statistics file, none from pg_ctl --- and none in v13. Of course,
v13 has only existed as a separate branch since 2020-06-07.

There's also a buildfarm test-coverage artifact involved. The bulk
of the HEAD reports are from dory and walleye, neither of which are
building v13 as yet, because of unclear problems [1]. I think those
two animals build much more frequently than our other Windows animals,
too, so the fact that they have more may be just because of that and
not because they're somehow more susceptible. Because of that, I'm not
sure that we have enough evidence to say that v13 is better than HEAD.
If there is some new bug, it's post-v12, but maybe not post-v13. But
v12 is evidently not perfect either.

The failures on HEAD (on dory and walleye) need another investigation
(maybe they are caused by modified stat()...).

To revive this topic and make sure that the issue still persists, I've
searched through the buildfarm logs (for HEAD and REL_13_STABLE) and
collected the following failures of the misc_functions test with the
"Permission denied" error:

bowerbird
HEAD (13 runs last week, 500 total, 18 failed)
-

REL_13_STABLE (4 runs last week, 237 total, 8 failed)
-

drongo
HEAD (25 runs last week, 500 total, 24 failed)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2020-12-27%2019%3A01%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&amp;dt=2021-01-18%2001%3A01%3A55

REL_13_STABLE (5 runs last week, 255 total, 2 failed)
-

hamerkop
HEAD - (8 runs last week, 501 total, 20 failed)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&amp;dt=2021-02-22%2010%3A20%3A00

fairywren
HEAD - (27 runs last week, 500 total, 28 failed)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-02-22%2012%3A03%3A35
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-01-28%2000%3A04%3A50
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-01-19%2023%3A29%3A38
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-01-17%2018%3A08%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-11-30%2009%3A57%3A15
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-11-19%2006%3A02%3A22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-11-18%2018%3A02%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-09-16%2019%3A48%3A21

REL_13_STABLE (5 runs last week, 302 total, 5 failed)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-02-26%2003%3A03%3A42
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-10-20%2007%3A50%3A44
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-09-10%2006%3A02%3A12
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2020-09-04%2019%3A58%3A39

woodlouse
HEAD (33 runs last week, 500 total, 5 failed)
-

REL_13_STABLE (4 runs last week, 325 total, 1 failed)
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=woodlouse&amp;dt=2021-01-20%2011%3A42%3A30

dory
HEAD (52 runs last week, 500 total, 25 failed)
-

whelk
HEAD (31 runs last week, 500 total, 11 failed)
-

REL_13_STABLE (7 runs last week, 224 total, 1 failed)
-

(I've not considered lorikeet, jacana, and walleye as they are using
alternative compilers.)
So on some machines there wasn't a single failure (probably it depends
on hardware), but where it fails, such failures account for a
significant share of all failures.
I believe that the patch attached to [1] should fix this issue. The
patch still applies to master and makes the demotest (attached to [2])
pass. Also I've prepared a trivial patch that makes pgwin32_open() use
the original stat() function (as in the proposed change for _pgstat64()).

/messages/by-id/979f4ee6-9960-e37f-f3ee-f458e87777b0@gmail.com
/messages/by-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com

Best regards,
Alexander

Attachments:

use-microsoft_native_stat-in-pgwin32_open.patchtext/x-patch; charset=UTF-8; name=use-microsoft_native_stat-in-pgwin32_open.patchDownload
diff --git a/src/port/open.c b/src/port/open.c
index 14c6debba9..4bd11ca9d9 100644
--- a/src/port/open.c
+++ b/src/port/open.c
@@ -157,9 +157,9 @@ pgwin32_open(const char *fileName, int fileFlags,...)
 		{
 			if (loops < 10)
 			{
-				struct stat st;
+				struct microsoft_native_stat st;
 
-				if (stat(fileName, &st) != 0)
+				if (microsoft_native_stat(fileName, &st) != 0)
 				{
 					pg_usleep(100000);
 					loops++;
#12Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#11)
Re: More time spending with "delete pending"

On Sun, Mar 14, 2021 at 06:00:00PM +0300, Alexander Lakhin wrote:

I believe that the patch attached to [1] should fix this issue. The
patch still applies to master and makes the demotest (attached to [2])
pass. Also I've prepared a trivial patch that makes pgwin32_open() use
the original stat() function (as in the proposed change for _pgstat64()).

Hmm. Knowing that _pgfstat64() has some special handling related to
files pending for deletion, do we really need that on HEAD?

-				struct stat st;
+				struct microsoft_native_stat st;
-				if (stat(fileName, &st) != 0)
+				if (microsoft_native_stat(fileName, &st) != 0)
{
pg_usleep(100000);
loops++;

This change looks like a good idea for the WIN32 emulation of open(),
taken independently.
--
Michael

#13Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#12)
Re: More time spending with "delete pending"

Hello Michael,

06.07.2021 11:33, Michael Paquier wrote:

On Sun, Mar 14, 2021 at 06:00:00PM +0300, Alexander Lakhin wrote:

I believe that the patch attached to [1] should fix this issue. The
patch still applies to master and makes the demotest (attached to [2])
pass. Also I've prepared a trivial patch that makes pgwin32_open() use
the original stat() function (as in the proposed change for _pgstat64()).

Hmm. Knowing that _pgfstat64() has some special handling related to
files pending for deletion, do we really need that on HEAD?

Yes, this fix is needed for HEAD (b9734c13) as the simple test attached
to [1]/messages/by-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com shows:
(You can put that script in src\test\modules\test_misc\t and perform
`vcregress taptest src\test\modules\test_misc`.)
t/001_delete_pending.pl ......... # Looks like your test exited with 2
before it could output anything.

and
src\test\modules\test_misc\tmp_check\log\regress_log_001_delete_pending
contains:
# Postmaster PID for node "node" is 1616
error running SQL: 'psql:<stdin>:1: ERROR:� could not stat file
"pg_wal/dummy": Permission denied'
while running 'psql -XAtq -d port=64889 host=127.0.0.1 dbname='postgres'
-f - -v ON_ERROR_STOP=1' with sql 'select count(*) > 0 as ok from
pg_ls_waldir();' at C:/src/postgresql/src/test/perl/PostgresNode.pm line
1771.

As Tom Lane noted above, the code added with bed90759f is dubious
(_NtQueryInformationFile() can not be used to handle the "delete
pending" state as CreateFile() returns INVALID_HANDLE_VALUE in this case.)
Probably that change should be reverted. Should I do it along with the
proposed fix?

[1]: /messages/by-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com
/messages/by-id/c3427edf-d7c0-ff57-90f6-b5de3bb62709@gmail.com

Best regards,
Alexander

#14Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#13)
Re: More time spending with "delete pending"

On Thu, Jul 08, 2021 at 07:00:01AM +0300, Alexander Lakhin wrote:

As Tom Lane noted above, the code added with bed90759f is dubious
(_NtQueryInformationFile() can not be used to handle the "delete
pending" state as CreateFile() returns INVALID_HANDLE_VALUE in this case.)
Probably that change should be reverted. Should I do it along with the
proposed fix?

Ah, I see. I have managed to miss your point. If
_NtQueryInformationFile() cannot be used, then we'd actually miss the
contents for standardInfo and the pending deletion. If you could send
everything you have, that would be helpful! I'd like to test that
stuff by myself, with all the contents discussed at disposal for a
proper evaluation.
--
Michael

#15Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#14)
1 attachment(s)
Re: More time spending with "delete pending"

08.07.2021 10:47, Michael Paquier wrote:

On Thu, Jul 08, 2021 at 07:00:01AM +0300, Alexander Lakhin wrote:

As Tom Lane noted above, the code added with bed90759f is dubious
(_NtQueryInformationFile() can not be used to handle the "delete
pending" state as CreateFile() returns INVALID_HANDLE_VALUE in this case.)
Probably that change should be reverted. Should I do it along with the
proposed fix?

Ah, I see. I have managed to miss your point. If
_NtQueryInformationFile() cannot be used, then we'd actually miss the
contents for standardInfo and the pending deletion. If you could send
everything you have, that would be helpful! I'd like to test that
stuff by myself, with all the contents discussed at disposal for a
proper evaluation.
--

Beside the aforementioned test I can only propose the extended patch,
that incorporates the undo of the changes brought by bed90759f.
With this patch that test is passed.

Best regards,
Alexander

Attachments:

fix_pg_ls_dir_v2.patchtext/x-patch; charset=UTF-8; name=fix_pg_ls_dir_v2.patchDownload
diff --git a/src/backend/utils/adt/genfile.c b/src/backend/utils/adt/genfile.c
index c436d9318b..a8c22a81d0 100644
--- a/src/backend/utils/adt/genfile.c
+++ b/src/backend/utils/adt/genfile.c
@@ -28,6 +28,7 @@
 #include "funcapi.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "port.h"
 #include "postmaster/syslogger.h"
 #include "storage/fd.h"
 #include "utils/acl.h"
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 2ad8ee1359..6028ad809e 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -18,53 +18,6 @@
 #include "c.h"
 #include <windows.h>
 
-/*
- * In order to support MinGW and MSVC2013 we use NtQueryInformationFile as an
- * alternative for GetFileInformationByHandleEx. It is loaded from the ntdll
- * library.
- */
-#if _WIN32_WINNT < 0x0600
-#include <winternl.h>
-
-#if !defined(__MINGW32__) && !defined(__MINGW64__)
-/* MinGW includes this in <winternl.h>, but it is missing in MSVC */
-typedef struct _FILE_STANDARD_INFORMATION
-{
-	LARGE_INTEGER AllocationSize;
-	LARGE_INTEGER EndOfFile;
-	ULONG		NumberOfLinks;
-	BOOLEAN		DeletePending;
-	BOOLEAN		Directory;
-} FILE_STANDARD_INFORMATION;
-#define FileStandardInformation 5
-#endif							/* !defined(__MINGW32__) &&
-								 * !defined(__MINGW64__) */
-
-typedef NTSTATUS (NTAPI * PFN_NTQUERYINFORMATIONFILE)
-			(IN HANDLE FileHandle,
-			 OUT PIO_STATUS_BLOCK IoStatusBlock,
-			 OUT PVOID FileInformation,
-			 IN ULONG Length,
-			 IN FILE_INFORMATION_CLASS FileInformationClass);
-
-static PFN_NTQUERYINFORMATIONFILE _NtQueryInformationFile = NULL;
-
-static HMODULE ntdll = NULL;
-
-/*
- * Load DLL file just once regardless of how many functions we load/call in it.
- */
-static void
-LoadNtdll(void)
-{
-	if (ntdll != NULL)
-		return;
-	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
-}
-
-#endif							/* _WIN32_WINNT < 0x0600 */
-
-
 /*
  * Convert a FILETIME struct into a 64 bit time_t.
  */
@@ -170,110 +123,73 @@ _pgstat64(const char *name, struct stat *buf)
 	SECURITY_ATTRIBUTES sa;
 	HANDLE		hFile;
 	int			ret;
-#if _WIN32_WINNT < 0x0600
-	IO_STATUS_BLOCK ioStatus;
-	FILE_STANDARD_INFORMATION standardInfo;
-#else
-	FILE_STANDARD_INFO standardInfo;
-#endif
+	int			loops = 0;
 
 	if (name == NULL || buf == NULL)
 	{
 		errno = EINVAL;
 		return -1;
 	}
-
 	/* fast not-exists check */
 	if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
 	{
-		_dosmaperr(GetLastError());
-		return -1;
+		DWORD		err = GetLastError();
+
+		if (err != ERROR_ACCESS_DENIED) {
+			_dosmaperr(err);
+			return -1;
+		}
 	}
 
 	/* get a file handle as lightweight as we can */
 	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
 	sa.bInheritHandle = TRUE;
 	sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+							   FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-
-	memset(&standardInfo, 0, sizeof(standardInfo));
-
-#if _WIN32_WINNT < 0x0600
-	if (_NtQueryInformationFile == NULL)
-	{
-		/* First time through: load ntdll.dll and find NtQueryInformationFile */
-		LoadNtdll();
-		if (ntdll == NULL)
-		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
-			_dosmaperr(err);
-			return -1;
-		}
-
-		_NtQueryInformationFile = (PFN_NTQUERYINFORMATIONFILE) (pg_funcptr_t)
-			GetProcAddress(ntdll, "NtQueryInformationFile");
-		if (_NtQueryInformationFile == NULL)
+		/*
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
+		 */
+		if (err == ERROR_ACCESS_DENIED)
 		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
-			_dosmaperr(err);
-			return -1;
+			if (loops < 10)
+			{
+				struct microsoft_native_stat st;
+
+				if (microsoft_native_stat(name, &st) != 0)
+				{
+					pg_usleep(100000);
+					loops++;
+					continue;
+				}
+			}
 		}
-	}
-
-	if (!NT_SUCCESS(_NtQueryInformationFile(hFile, &ioStatus, &standardInfo,
-											sizeof(standardInfo),
-											FileStandardInformation)))
-	{
-		DWORD		err = GetLastError();
-
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-#else
-	if (!GetFileInformationByHandleEx(hFile, FileStandardInfo, &standardInfo,
-									  sizeof(standardInfo)))
-	{
-		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
 		_dosmaperr(err);
 		return -1;
 	}
-#endif							/* _WIN32_WINNT < 0x0600 */
-
-	if (standardInfo.DeletePending)
-	{
-		/*
-		 * File has been deleted, but is not gone from the filesystem yet.
-		 * This can happen when some process with FILE_SHARE_DELETE has it
-		 * open, and it will be fully removed once that handle is closed.
-		 * Meanwhile, we can't open it, so indicate that the file just doesn't
-		 * exist.
-		 */
-		CloseHandle(hFile);
-		errno = ENOENT;
-		return -1;
-	}
 
 	/* At last we can invoke fileinfo_to_stat */
 	ret = fileinfo_to_stat(hFile, buf);
#16Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#15)
1 attachment(s)
Re: More time spending with "delete pending"

On Thu, Jul 08, 2021 at 11:00:00PM +0300, Alexander Lakhin wrote:

Beside the aforementioned test I can only propose the extended patch,
that incorporates the undo of the changes brought by bed90759f.
With this patch that test is passed.

Checked and confirmed. It is a nice test with IPC::Run you have here.
Making things in win32stat.c more consistent with open.c surely is
appealing. One thing that I'd like to introduce in this patch, and
also mentioned upthread, is to change the stat() call in open.c to use
microsoft_native_stat().

I have let pgbench run for a couple of hours with some concurrent
activity using genfile.c, without noticing problems. My environment
is not representative of everything we can find out there on Windows,
but it brings some confidence.
--
Michael

Attachments:

win32-stat-deletion.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/port/open.c b/src/port/open.c
index 14c6debba9..4bd11ca9d9 100644
--- a/src/port/open.c
+++ b/src/port/open.c
@@ -157,9 +157,9 @@ pgwin32_open(const char *fileName, int fileFlags,...)
 		{
 			if (loops < 10)
 			{
-				struct stat st;
+				struct microsoft_native_stat st;
 
-				if (stat(fileName, &st) != 0)
+				if (microsoft_native_stat(fileName, &st) != 0)
 				{
 					pg_usleep(100000);
 					loops++;
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 2ad8ee1359..6028ad809e 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -18,53 +18,6 @@
 #include "c.h"
 #include <windows.h>
 
-/*
- * In order to support MinGW and MSVC2013 we use NtQueryInformationFile as an
- * alternative for GetFileInformationByHandleEx. It is loaded from the ntdll
- * library.
- */
-#if _WIN32_WINNT < 0x0600
-#include <winternl.h>
-
-#if !defined(__MINGW32__) && !defined(__MINGW64__)
-/* MinGW includes this in <winternl.h>, but it is missing in MSVC */
-typedef struct _FILE_STANDARD_INFORMATION
-{
-	LARGE_INTEGER AllocationSize;
-	LARGE_INTEGER EndOfFile;
-	ULONG		NumberOfLinks;
-	BOOLEAN		DeletePending;
-	BOOLEAN		Directory;
-} FILE_STANDARD_INFORMATION;
-#define FileStandardInformation 5
-#endif							/* !defined(__MINGW32__) &&
-								 * !defined(__MINGW64__) */
-
-typedef NTSTATUS (NTAPI * PFN_NTQUERYINFORMATIONFILE)
-			(IN HANDLE FileHandle,
-			 OUT PIO_STATUS_BLOCK IoStatusBlock,
-			 OUT PVOID FileInformation,
-			 IN ULONG Length,
-			 IN FILE_INFORMATION_CLASS FileInformationClass);
-
-static PFN_NTQUERYINFORMATIONFILE _NtQueryInformationFile = NULL;
-
-static HMODULE ntdll = NULL;
-
-/*
- * Load DLL file just once regardless of how many functions we load/call in it.
- */
-static void
-LoadNtdll(void)
-{
-	if (ntdll != NULL)
-		return;
-	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
-}
-
-#endif							/* _WIN32_WINNT < 0x0600 */
-
-
 /*
  * Convert a FILETIME struct into a 64 bit time_t.
  */
@@ -170,108 +123,71 @@ _pgstat64(const char *name, struct stat *buf)
 	SECURITY_ATTRIBUTES sa;
 	HANDLE		hFile;
 	int			ret;
-#if _WIN32_WINNT < 0x0600
-	IO_STATUS_BLOCK ioStatus;
-	FILE_STANDARD_INFORMATION standardInfo;
-#else
-	FILE_STANDARD_INFO standardInfo;
-#endif
+	int			loops = 0;
 
 	if (name == NULL || buf == NULL)
 	{
 		errno = EINVAL;
 		return -1;
 	}
-
 	/* fast not-exists check */
 	if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
 	{
-		_dosmaperr(GetLastError());
-		return -1;
+		DWORD		err = GetLastError();
+
+		if (err != ERROR_ACCESS_DENIED) {
+			_dosmaperr(err);
+			return -1;
+		}
 	}
 
 	/* get a file handle as lightweight as we can */
 	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
 	sa.bInheritHandle = TRUE;
 	sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+							   FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-
-	memset(&standardInfo, 0, sizeof(standardInfo));
-
-#if _WIN32_WINNT < 0x0600
-	if (_NtQueryInformationFile == NULL)
-	{
-		/* First time through: load ntdll.dll and find NtQueryInformationFile */
-		LoadNtdll();
-		if (ntdll == NULL)
-		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
-			_dosmaperr(err);
-			return -1;
-		}
-
-		_NtQueryInformationFile = (PFN_NTQUERYINFORMATIONFILE) (pg_funcptr_t)
-			GetProcAddress(ntdll, "NtQueryInformationFile");
-		if (_NtQueryInformationFile == NULL)
-		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
-			_dosmaperr(err);
-			return -1;
-		}
-	}
-
-	if (!NT_SUCCESS(_NtQueryInformationFile(hFile, &ioStatus, &standardInfo,
-											sizeof(standardInfo),
-											FileStandardInformation)))
-	{
-		DWORD		err = GetLastError();
-
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-#else
-	if (!GetFileInformationByHandleEx(hFile, FileStandardInfo, &standardInfo,
-									  sizeof(standardInfo)))
-	{
-		DWORD		err = GetLastError();
-
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-#endif							/* _WIN32_WINNT < 0x0600 */
-
-	if (standardInfo.DeletePending)
-	{
 		/*
-		 * File has been deleted, but is not gone from the filesystem yet.
-		 * This can happen when some process with FILE_SHARE_DELETE has it
-		 * open, and it will be fully removed once that handle is closed.
-		 * Meanwhile, we can't open it, so indicate that the file just doesn't
-		 * exist.
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
 		 */
-		CloseHandle(hFile);
-		errno = ENOENT;
+		if (err == ERROR_ACCESS_DENIED)
+		{
+			if (loops < 10)
+			{
+				struct microsoft_native_stat st;
+
+				if (microsoft_native_stat(name, &st) != 0)
+				{
+					pg_usleep(100000);
+					loops++;
+					continue;
+				}
+			}
+		}
+
+		_dosmaperr(err);
 		return -1;
 	}
 
#17Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#16)
Re: More time spending with "delete pending"

Hello Michael,
09.07.2021 08:52, Michael Paquier wrote:

On Thu, Jul 08, 2021 at 11:00:00PM +0300, Alexander Lakhin wrote:

Beside the aforementioned test I can only propose the extended patch,
that incorporates the undo of the changes brought by bed90759f.
With this patch that test is passed.

Checked and confirmed. It is a nice test with IPC::Run you have here.
Making things in win32stat.c more consistent with open.c surely is
appealing. One thing that I'd like to introduce in this patch, and
also mentioned upthread, is to change the stat() call in open.c to use
microsoft_native_stat().

I have let pgbench run for a couple of hours with some concurrent
activity using genfile.c, without noticing problems. My environment
is not representative of everything we can find out there on Windows,
but it brings some confidence.

Thank you! I agree with your improvement. I can't remember why did I
inject 'include "port.h"' in genfile.c.
Today I've rechecked all the chain of includes and I see that stat() is
redefined as _pgstat64() in win32_port.h, that includes <sys/stat.h>.
genfile.c includes "postgres.h" (that includes win32_port.h indirectly)
and then includes <sys/stat.h> again, but the later include should be
ignored due "#pragma once" in stat.h.
So I have no objection to the removal of that include.

Best regards,
Alexander

#18Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#17)
Re: More time spending with "delete pending"

On Fri, Jul 09, 2021 at 09:00:00PM +0300, Alexander Lakhin wrote:

Thank you! I agree with your improvement. I can't remember why did I
inject 'include "port.h"' in genfile.c.
Today I've rechecked all the chain of includes and I see that stat() is
redefined as _pgstat64() in win32_port.h, that includes <sys/stat.h>.
genfile.c includes "postgres.h" (that includes win32_port.h indirectly)
and then includes <sys/stat.h> again, but the later include should be
ignored due "#pragma once" in stat.h.
So I have no objection to the removal of that include.

Thanks, that matches my impression. There was one comment at the top
of _pgstat64() that was still incorrect. I have spent more time
checking and tested this stuff today, and that looks fine. One
question pending is if we should increase the timeout used by WIN32's
stat() and open(), but the buildfarm should be able to tell us more
here.
--
Michael

#19Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#18)
Re: More time spending with "delete pending"

On Mon, Jul 12, 2021 at 01:07:17PM +0900, Michael Paquier wrote:

Thanks, that matches my impression. There was one comment at the top
of _pgstat64() that was still incorrect. I have spent more time
checking and tested this stuff today, and that looks fine. One
question pending is if we should increase the timeout used by WIN32's
stat() and open(), but the buildfarm should be able to tell us more
here.

And fairywren, that uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-07-12%2004%3A47%3A13
Looking at it now.
--
Michael

#20Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#19)
Re: More time spending with "delete pending"

On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:

And fairywren, that uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-07-12%2004%3A47%3A13
Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has
been reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check if is it possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.
--
Michael

#21Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#20)
Re: More time spending with "delete pending"

Hello Michael,
12.07.2021 08:52, Michael Paquier wrote:

On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:

And fairywren, that uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-07-12%2004%3A47%3A13
Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has
been reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check if is it possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.

I'll take care of it and will prepare mingw-compatible patch tomorrow.

Best regards,
Alexander

#22Michael Paquier
michael@paquier.xyz
In reply to: Alexander Lakhin (#21)
Re: More time spending with "delete pending"

On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:

I'll take care of it and will prepare mingw-compatible patch tomorrow.

Thanks. Do you have an environment with 32b or 64b MinGW?
--
Michael

#23Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#22)
Re: More time spending with "delete pending"

12.07.2021 09:23, Michael Paquier wrote:

On Mon, Jul 12, 2021 at 09:06:01AM +0300, Alexander Lakhin wrote:

I'll take care of it and will prepare mingw-compatible patch tomorrow.

Thanks. Do you have an environment with 32b or 64b MinGW?

Yes, I have VMs with 32- and 64-bit mingw environments.

Best regards,
Alexander

#24Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#20)
2 attachment(s)
Re: More time spending with "delete pending"

Hello Michael,
12.07.2021 08:52, Michael Paquier wrote:

On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:

And fairywren, that uses MinGW, is unhappy:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-07-12%2004%3A47%3A13
Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has
been reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check if is it possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.

I've managed to adapt open/win32stat for MinGW by preventing stat
overriding at the implementation level. Thus, the code that calls stat()
will use the overridden struct/function(s), but inside of
open/win32_stat we could access original stat and reference to
__pgstat64 where we want to use our.
Tested on MSVC, MinGW64 and MinGW32 � the fixed code compiles everywhere.

But when using perl from msys (not ActivePerl) while testing I've got
the same test failure due to another error condition:

In this case CreateFile() fails with ERROR_SHARING_VIOLATION (not
ERROR_ACCESS_DENIED) and it leads to the same "Permission denied" error.
This condition is handled in the pgwin32_open() but not in _pgstat64() yet.

I think I should develop another test for MSVC environment that will
demonstrate a such failure too.
But as it's not related to the DELETE_PENDING state, I wonder whether
this should be fixed in a separate patch.

Best regards,
Alexander

Attachments:

ghffjapimnefdgac.pngimage/png; name=ghffjapimnefdgac.pngDownload
win32-stat-deletion-v2.patchtext/x-patch; charset=UTF-8; name=win32-stat-deletion-v2.patchDownload
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 05c5a53442..d72950880a 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -53,11 +53,13 @@
 #undef near
 
 /* needed before sys/stat hacking below: */
+#if !defined(NO_STAT_OVERRIDE)
 #define fstat microsoft_native_fstat
 #define stat microsoft_native_stat
 #include <sys/stat.h>
 #undef fstat
 #undef stat
+#endif
 
 /* Must be here to avoid conflicting with prototype in windows.h */
 #define mkdir(a,b)	mkdir(a)
@@ -253,7 +255,7 @@ typedef int pid_t;
  * The struct stat is 32 bit in MSVC, so we redefine it as a copy of
  * struct __stat64.  This also fixes the struct size for MINGW builds.
  */
-struct stat						/* This should match struct __stat64 */
+struct _pgstat64					/* This should match struct __stat64 */
 {
 	_dev_t		st_dev;
 	_ino_t		st_ino;
@@ -268,12 +270,14 @@ struct stat						/* This should match struct __stat64 */
 	__time64_t	st_ctime;
 };
 
-extern int	_pgfstat64(int fileno, struct stat *buf);
-extern int	_pgstat64(const char *name, struct stat *buf);
+extern int	_pgfstat64(int fileno, struct _pgstat64 *buf);
+extern int	_pgstat64(const char *name, struct _pgstat64 *buf);
 
+#if !defined(NO_STAT_OVERRIDE)
 #define fstat(fileno, sb)	_pgfstat64(fileno, sb)
-#define stat(path, sb)		_pgstat64(path, sb)
+#define stat				_pgstat64
 #define lstat(path, sb)		_pgstat64(path, sb)
+#endif
 
 /* These macros are not provided by older MinGW, nor by MSVC */
 #ifndef S_IRUSR
diff --git a/src/port/open.c b/src/port/open.c
index 14c6debba9..09144351a4 100644
--- a/src/port/open.c
+++ b/src/port/open.c
@@ -13,6 +13,7 @@
 
 #ifdef WIN32
 
+#define NO_STAT_OVERRIDE
 #ifndef FRONTEND
 #include "postgres.h"
 #else
diff --git a/src/port/win32stat.c b/src/port/win32stat.c
index 2ad8ee1359..1c9c01cc98 100644
--- a/src/port/win32stat.c
+++ b/src/port/win32stat.c
@@ -15,54 +15,10 @@
 
 #ifdef WIN32
 
+#define NO_STAT_OVERRIDE
 #include "c.h"
 #include <windows.h>
-
-/*
- * In order to support MinGW and MSVC2013 we use NtQueryInformationFile as an
- * alternative for GetFileInformationByHandleEx. It is loaded from the ntdll
- * library.
- */
-#if _WIN32_WINNT < 0x0600
-#include <winternl.h>
-
-#if !defined(__MINGW32__) && !defined(__MINGW64__)
-/* MinGW includes this in <winternl.h>, but it is missing in MSVC */
-typedef struct _FILE_STANDARD_INFORMATION
-{
-	LARGE_INTEGER AllocationSize;
-	LARGE_INTEGER EndOfFile;
-	ULONG		NumberOfLinks;
-	BOOLEAN		DeletePending;
-	BOOLEAN		Directory;
-} FILE_STANDARD_INFORMATION;
-#define FileStandardInformation 5
-#endif							/* !defined(__MINGW32__) &&
-								 * !defined(__MINGW64__) */
-
-typedef NTSTATUS (NTAPI * PFN_NTQUERYINFORMATIONFILE)
-			(IN HANDLE FileHandle,
-			 OUT PIO_STATUS_BLOCK IoStatusBlock,
-			 OUT PVOID FileInformation,
-			 IN ULONG Length,
-			 IN FILE_INFORMATION_CLASS FileInformationClass);
-
-static PFN_NTQUERYINFORMATIONFILE _NtQueryInformationFile = NULL;
-
-static HMODULE ntdll = NULL;
-
-/*
- * Load DLL file just once regardless of how many functions we load/call in it.
- */
-static void
-LoadNtdll(void)
-{
-	if (ntdll != NULL)
-		return;
-	ntdll = LoadLibraryEx("ntdll.dll", NULL, 0);
-}
-
-#endif							/* _WIN32_WINNT < 0x0600 */
+#include <sys/stat.h>
 
 
 /*
@@ -112,7 +68,7 @@ fileattr_to_unixmode(int attr)
  * Convert WIN32 file information (from a HANDLE) to a struct stat.
  */
 static int
-fileinfo_to_stat(HANDLE hFile, struct stat *buf)
+fileinfo_to_stat(HANDLE hFile, struct _pgstat64 *buf)
 {
 	BY_HANDLE_FILE_INFORMATION fiData;
 
@@ -159,119 +115,83 @@ fileinfo_to_stat(HANDLE hFile, struct stat *buf)
  * This currently also implements lstat(), though perhaps that should change.
  */
 int
-_pgstat64(const char *name, struct stat *buf)
+_pgstat64(const char *name, struct _pgstat64 *buf)
 {
 	/*
 	 * We must use a handle so lstat() returns the information of the target
-	 * file.  To have a reliable test for ERROR_DELETE_PENDING, we use
-	 * NtQueryInformationFile from Windows 2000 or
-	 * GetFileInformationByHandleEx from Server 2008 / Vista.
+	 * file.  To have a reliable test for ERROR_DELETE_PENDING, this uses a
+	 * method similar to open() with a loop using stat() and some waits when
+	 * facing ERROR_ACCESS_DENIED.
 	 */
 	SECURITY_ATTRIBUTES sa;
 	HANDLE		hFile;
 	int			ret;
-#if _WIN32_WINNT < 0x0600
-	IO_STATUS_BLOCK ioStatus;
-	FILE_STANDARD_INFORMATION standardInfo;
-#else
-	FILE_STANDARD_INFO standardInfo;
-#endif
+	int			loops = 0;
 
 	if (name == NULL || buf == NULL)
 	{
 		errno = EINVAL;
 		return -1;
 	}
-
 	/* fast not-exists check */
 	if (GetFileAttributes(name) == INVALID_FILE_ATTRIBUTES)
-	{
-		_dosmaperr(GetLastError());
-		return -1;
-	}
-
-	/* get a file handle as lightweight as we can */
-	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
-	sa.bInheritHandle = TRUE;
-	sa.lpSecurityDescriptor = NULL;
-	hFile = CreateFile(name,
-					   GENERIC_READ,
-					   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
-					   &sa,
-					   OPEN_EXISTING,
-					   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
-						FILE_FLAG_OVERLAPPED),
-					   NULL);
-	if (hFile == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-
-	memset(&standardInfo, 0, sizeof(standardInfo));
-
-#if _WIN32_WINNT < 0x0600
-	if (_NtQueryInformationFile == NULL)
-	{
-		/* First time through: load ntdll.dll and find NtQueryInformationFile */
-		LoadNtdll();
-		if (ntdll == NULL)
+		if (err != ERROR_ACCESS_DENIED)
 		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
-			_dosmaperr(err);
-			return -1;
-		}
-
-		_NtQueryInformationFile = (PFN_NTQUERYINFORMATIONFILE) (pg_funcptr_t)
-			GetProcAddress(ntdll, "NtQueryInformationFile");
-		if (_NtQueryInformationFile == NULL)
-		{
-			DWORD		err = GetLastError();
-
-			CloseHandle(hFile);
 			_dosmaperr(err);
 			return -1;
 		}
 	}
 
-	if (!NT_SUCCESS(_NtQueryInformationFile(hFile, &ioStatus, &standardInfo,
-											sizeof(standardInfo),
-											FileStandardInformation)))
-	{
-		DWORD		err = GetLastError();
-
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-#else
-	if (!GetFileInformationByHandleEx(hFile, FileStandardInfo, &standardInfo,
-									  sizeof(standardInfo)))
+	/* get a file handle as lightweight as we can */
+	sa.nLength = sizeof(SECURITY_ATTRIBUTES);
+	sa.bInheritHandle = TRUE;
+	sa.lpSecurityDescriptor = NULL;
+	while ((hFile = CreateFile(name,
+							   GENERIC_READ,
+							   (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
+							   &sa,
+							   OPEN_EXISTING,
+							   (FILE_FLAG_NO_BUFFERING | FILE_FLAG_BACKUP_SEMANTICS |
+								FILE_FLAG_OVERLAPPED),
+							   NULL)) == INVALID_HANDLE_VALUE)
 	{
 		DWORD		err = GetLastError();
 
-		CloseHandle(hFile);
-		_dosmaperr(err);
-		return -1;
-	}
-#endif							/* _WIN32_WINNT < 0x0600 */
-
-	if (standardInfo.DeletePending)
-	{
 		/*
-		 * File has been deleted, but is not gone from the filesystem yet.
-		 * This can happen when some process with FILE_SHARE_DELETE has it
-		 * open, and it will be fully removed once that handle is closed.
-		 * Meanwhile, we can't open it, so indicate that the file just doesn't
-		 * exist.
+		 * ERROR_ACCESS_DENIED is returned if the file is deleted but not yet
+		 * gone (Windows NT status code is STATUS_DELETE_PENDING).  In that
+		 * case we want to wait a bit and try again, giving up after 1 second
+		 * (since this condition should never persist very long).  However,
+		 * there are other commonly-hit cases that return ERROR_ACCESS_DENIED,
+		 * so care is needed.  In particular that happens if we try to open a
+		 * directory, or of course if there's an actual file-permissions
+		 * problem.  To distinguish these cases, try a stat().  In the
+		 * delete-pending case, it will either also get STATUS_DELETE_PENDING,
+		 * or it will see the file as gone and fail with ENOENT.  In other
+		 * cases it will usually succeed.  The only somewhat-likely case where
+		 * this coding will uselessly wait is if there's a permissions problem
+		 * with a containing directory, which we hope will never happen in any
+		 * performance-critical code paths.
 		 */
-		CloseHandle(hFile);
-		errno = ENOENT;
+		if (err == ERROR_ACCESS_DENIED)
+		{
+			if (loops < 10)
+			{
+				struct stat st;
+
+				if (stat(name, &st) != 0)
+				{
+					pg_usleep(100000);
+					loops++;
+					continue;
+				}
+			}
+		}
+
+		_dosmaperr(err);
 		return -1;
 	}
 
@@ -286,7 +206,7 @@ _pgstat64(const char *name, struct stat *buf)
  * Windows implementation of fstat().
  */
 int
-_pgfstat64(int fileno, struct stat *buf)
+_pgfstat64(int fileno, struct _pgstat64 *buf)
 {
 	HANDLE		hFile = (HANDLE) _get_osfhandle(fileno);
 
#25Daniel Gustafsson
daniel@yesql.se
In reply to: Alexander Lakhin (#24)
Re: More time spending with "delete pending"

On 13 Jul 2021, at 20:00, Alexander Lakhin <exclusion@gmail.com> wrote:

Hello Michael,
12.07.2021 08:52, Michael Paquier wrote:

On Mon, Jul 12, 2021 at 02:09:41PM +0900, Michael Paquier wrote:

And fairywren, that uses MinGW, is unhappy:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&amp;dt=2021-07-12%2004%3A47%3A13

Looking at it now.

I am not sure what to do here to cool down MinGW, so the patch has
been reverted for now:
- Plain stat() is not enough to do a proper detection of the files
pending for deletion on MSVC, so reverting only the part with
microsoft_native_stat() is not going to help.
- We are going to need an evaluation of the stat() implementation in
MinGW and check if is it possible to rely on it more. If yes, we
could get away with more tweaks based on __MINGW32__/__MINGW64__.

I've managed to adapt open/win32stat for MinGW by preventing stat overriding at the implementation level. Thus, the code that calls stat() will use the overridden struct/function(s), but inside of open/win32_stat we could access original stat and reference to __pgstat64 where we want to use our.
Tested on MSVC, MinGW64 and MinGW32 — the fixed code compiles everywhere.

But when using perl from msys (not ActivePerl) while testing I've got the same test failure due to another error condition:

<ghffjapimnefdgac.png>

In this case CreateFile() fails with ERROR_SHARING_VIOLATION (not ERROR_ACCESS_DENIED) and it leads to the same "Permission denied" error. This condition is handled in the pgwin32_open() but not in _pgstat64() yet.

I think I should develop another test for MSVC environment that will demonstrate a such failure too.
But as it's not related to the DELETE_PENDING state, I wonder whether this should be fixed in a separate patch.

Michael, have you had a chance to look at the updated patch here? I'm moving
this patch entry from Waiting on Author to Needs Review.

--
Daniel Gustafsson https://vmware.com/

#26Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#25)
Re: More time spending with "delete pending"

On Wed, Dec 01, 2021 at 11:51:58AM +0100, Daniel Gustafsson wrote:

Michael, have you had a chance to look at the updated patch here? I'm moving
this patch entry from Waiting on Author to Needs Review.

No, I haven't. Before being touched more, this requires first a bit
more of testing infrastructure based on MinGW, and I don't have that
in place yet :/
--
Michael

#27Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#26)
Re: More time spending with "delete pending"

Hello Daniel and Michael,
02.12.2021 08:03, Michael Paquier wrote:

On Wed, Dec 01, 2021 at 11:51:58AM +0100, Daniel Gustafsson wrote:

Michael, have you had a chance to look at the updated patch here? I'm moving
this patch entry from Waiting on Author to Needs Review.

No, I haven't. Before being touched more, this requires first a bit
more of testing infrastructure based on MinGW, and I don't have that
in place yet :/

I think that my patch should be withdrawn in favor of
https://commitfest.postgresql.org/34/3326/ ("Check for
STATUS_DELETE_PENDING on Windows" by Thomas Munro), that use deeper
understanding of the problematic status, not indirect indication of it.

Barring objections, I'll withdraw my commitfest entry.

Best regards,
Alexander