BUG #15858: could not stat file - over 4GB
The following bug has been logged on the website:
Bug reference: 15858
Logged by: William Allen
Email address: williamedwinallen@live.com
PostgreSQL version: 11.3
Operating system: Windows Server 2012 R2
Description:
Issue using copy from command for files over 4GB.
ERROR: could not stat file "E:\file.txt": Unknown error
SQL state: XX000
On Tue, Jun 18, 2019 at 10:02:53AM +0000, PG Bug reporting form wrote:
Issue using copy from command for files over 4GB.
ERROR: could not stat file "E:\file.txt": Unknown error
SQL state: XX000
Windows is known for having limitations in its former implementations
of stat(), and the various _stat structures they use make actually
that much harder from a compatibility point of view:
/messages/by-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24
Nobody has actually dug enough into this set of issues to get a patch
out of the ground, which basically requires more tweaks that one may
think at first sight (look at pgwin32_safestat() in src/port/dirmod.c
for example).
--
Michael
On Wed, Jun 19, 2019 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
Windows is known for having limitations in its former implementations
of stat(), and the various _stat structures they use make actually
that much harder from a compatibility point of view:
/messages/by-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24
Going through this discussion it is not clear to me if there was a
consensus about the shape of an acceptable patch. Would something like
the attached be suitable?
Regards,
Juan José Santamaría Flecha
Attachments:
0001-support-for-large-files-on-Win32.patchapplication/octet-stream; name=0001-support-for-large-files-on-Win32.patchDownload+44-5
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
On Wed, Jun 19, 2019 at 3:26 AM Michael Paquier <michael@paquier.xyz> wrote:
Windows is known for having limitations in its former implementations
of stat(), and the various _stat structures they use make actually
that much harder from a compatibility point of view:
/messages/by-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24
Going through this discussion it is not clear to me if there was a
consensus about the shape of an acceptable patch. Would something like
the attached be suitable?
I think there's general agreement that the correct fix involves somehow
mapping stat() to _stat64() and mapping "struct stat" to "struct __stat64"
to go along with that. Beyond that, things get murky.
1. Can we assume that _stat64() and struct __stat64 exist on every Windows
version and build toolchain that we care about? Windows itself is
probably OK --- googling found a (non-authoritative) statement that these
were introduced in Windows 2K. But it's less clear whether they'll work
on builds with Cygwin, or Mingw, or Mingw-64, or how far back that support
goes. I found one statement that Mingw declares them only "#if
__MSVCRT_VERSION__ >= 0x0601".
2. Mapping stat() to _stat64() seems easy enough: we already declare
stat(a,b) as a macro on Windows, so just change it to something else.
3. What about the struct name? I proposed just "define stat __stat64",
but Robert thought that was too cute, and he's got a point --- in
particular, it's not clear to me how nicely it'd play to have both
function and object macros for the same name "stat". I see you are
proposing fixing this angle by suppressing the system definition of
struct stat and then defining it ourselves with the same contents as
struct __stat64. That might work. Ordinarily I'd be worried about
bit-rot in a struct that has to track a system definition, but Microsoft
are so religiously anal about never breaking ABI that it might be safe
to assume we don't have to worry about that.
I don't like the specific way you're proposing suppressing the system
definition of struct stat, though. "#define _CRT_NO_TIME_T" seems
like it's going to be a disaster, both because it likely has other
side-effects and because it probably doesn't do what you intend at all
on non-MSVC toolchains. We have precedents for dealing with similar
issues in, eg, plperl; and what those precedents would suggest is
doing something like
#define stat microsoft_native_stat
#include <sys/stat.h>
#undef stat
after which we could do
struct stat {
... same contents as __stat64
};
#define stat(a,b) _stat64(a,b)
Another issue here is that pgwin32_safestat() probably needs revisited
as to its scope and purpose. Its use of GetFileAttributesEx() can
presumably be dropped. I don't actually believe the header comment
claiming that stat() is not guaranteed to update the st_size field;
there's no indication of that in the Microsoft documentation. What
seems more likely is that that's a garbled version of the truth,
that you won't get a correct value of _st_size for files over 4GB.
But the test for ERROR_DELETE_PENDING might be worth keeping. So
that would lead us to
struct stat {
... same contents as __stat64
};
extern int pgwin32_safestat(const char *path, struct stat *buf);
#define stat(a,b) pgwin32_safestat(a,b)
and something like
int
pgwin32_safestat(const char *path, struct stat *buf)
{
int r;
/*
* Don't call stat(), that would just recurse back to here.
* We really want _stat64().
*/
r = _stat64(path, buf);
if (r < 0)
{
if (GetLastError() == ERROR_DELETE_PENDING)
{
/*
* File has been deleted, but is not gone from the filesystem yet.
* This can happen when some process with FILE_SHARE_DELETE has it
* open and it will be fully removed once that handle is closed.
* Meanwhile, we can't open it, so indicate that the file just
* doesn't exist.
*/
errno = ENOENT;
}
}
return r;
}
Not sure if we'd need an explicit cast to override passing struct
stat * to _stat64(). If so, a StaticAssert that sizeof(struct stat)
matches sizeof(struct __stat64) seems like a good idea.
I'd also be very strongly inclined to move pgwin32_safestat into its
own file in src/port and get rid of UNSAFE_STAT_OK. There wouldn't
be a good reason to opt out of using it once we got to this point.
regards, tom lane
I wrote:
Another issue here is that pgwin32_safestat() probably needs revisited
as to its scope and purpose. Its use of GetFileAttributesEx() can
presumably be dropped. I don't actually believe the header comment
claiming that stat() is not guaranteed to update the st_size field;
there's no indication of that in the Microsoft documentation. What
seems more likely is that that's a garbled version of the truth,
that you won't get a correct value of _st_size for files over 4GB.
So after further digging around, it seems that this is wrong. The
existence of pgwin32_safestat() can be traced back to these threads:
/messages/by-id/528853D3C5ED2C4AA8990B504BA7FB850106DF10@sol.transas.com
/messages/by-id/528853D3C5ED2C4AA8990B504BA7FB850106DF2F@sol.transas.com
in which it's stated that
It seems I've found the cause and the workaround of the problem.
MSVC's stat() is implemented by using FindNextFile().
MSDN contains the following suspicious paragraph аbout FindNextFile():
"In rare cases, file attribute information on NTFS file systems
may not be current at the time you call this function. To obtain
the current NTFS file system file attributes, call
GetFileInformationByHandle."
Since we generally cannot open an examined file, we need another way.
I'm wondering though why we adopted the existing coding in the face of
that observation. Couldn't the rest of struct stat be equally out of
date?
In short it seems like maybe we should be doing something similar to the
patch that Sergey actually submitted in that discussion:
/messages/by-id/528853D3C5ED2C4AA8990B504BA7FB850658BA5C@sol.transas.com
which reimplements stat() from scratch on top of GetFileAttributesEx(),
and thus doesn't require any assumptions at all about what's available
from the toolchain's <sys/stat.h>.
regards, tom lane
On Wed, Jun 19, 2019 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In short it seems like maybe we should be doing something similar to the
patch that Sergey actually submitted in that discussion:/messages/by-id/528853D3C5ED2C4AA8990B504BA7FB850658BA5C@sol.transas.com
I will not have much time for this list in the next couple of weeks,
so I will send this patch in its current WIP state rather than
stalling without a reply.
Most of its functionality comes from Sergey's patch with some cosmetic
changes, and the addition of the 64 bits struct stat and fstat().
Regards,
Juan José Santamaría Flecha
Attachments:
0001-WIP-support-for-large-files-on-Win32-v2.patchapplication/octet-stream; name=0001-WIP-support-for-large-files-on-Win32-v2.patchDownload+214-61
On Tue, Jun 25, 2019 at 12:00:45PM +0200, Juan José Santamaría Flecha wrote:
I will not have much time for this list in the next couple of weeks,
so I will send this patch in its current WIP state rather than
stalling without a reply.Most of its functionality comes from Sergey's patch with some cosmetic
changes, and the addition of the 64 bits struct stat and fstat().
The former patch was rather impressive. Or scary. Or both. At which
extent have you tested it? I think that we really need to make sure
of a couple of things which satisfy our needs:
1) Are we able to fix the issues with stat() calls on files larger
than 2GB and report a correct size?
2) Are we able to detect properly that files pending for deletion are
discarded with ENOENT?
3) Are frontends able to use the new layer?
It seems to me that you don't need the configure changes.
Instead of stat_pg_fixed which is confusing because it only involves
Windows, I would rename the new file to stat.c or win32_stat.c. The
location in src/port/ is adapted. I would also move out of
win32_port.h the various inline declarations and keep only raw
declarations. That could be much cleaner.
The code desperately needs more comments to help understand its
logic. Don't we have in the tree an equivalent of cvt_ft2ut? What
does cvt_attr2uxmode do? It would be nice to avoid conversion
wrappers as much as possible, and find out system-related equivalents
if any, and actually if necessary.
+static unsigned short
+cvt_attr2uxmode(int attr, const _TCHAR * name)
This looks rather bug-prone...
I think that this stuff has not been tested and would break at
compilation. If src/tools/msvc/Mkvcbuild.pm is not changed, then the
new file won't get included in the compiled set.
--
Michael
On Wed, Jun 26, 2019 at 4:23 AM Michael Paquier <michael@paquier.xyz> wrote:
The former patch was rather impressive. Or scary. Or both. At which
extent have you tested it? I think that we really need to make sure
of a couple of things which satisfy our needs:
I wanted to make a quick test on the previous patch. So let me state
what have I tested and what I have not: it builds and pass tests in
Windows and Cygwin, but I have not setup a MinGW environment.
1) Are we able to fix the issues with stat() calls on files larger
than 2GB and report a correct size?
I have successfuly tested a COPY with large files.
2) Are we able to detect properly that files pending for deletion are
discarded with ENOENT?
Cannot reproduce reliably, but is using the same logic as pgwin32_safestat().
3) Are frontends able to use the new layer?
After removing UNSAFE_STAT_OK, is this still an issue?
It seems to me that you don't need the configure changes.
The changes in configuration are meant for gcc compilations in Windows
(Cygwin and Mingw).
Instead of stat_pg_fixed which is confusing because it only involves
Windows, I would rename the new file to stat.c or win32_stat.c. The
location in src/port/ is adapted. I would also move out of
win32_port.h the various inline declarations and keep only raw
declarations. That could be much cleaner.
Ok.
The code desperately needs more comments to help understand its
logic. Don't we have in the tree an equivalent of cvt_ft2ut? What
does cvt_attr2uxmode do? It would be nice to avoid conversion
wrappers as much as possible, and find out system-related equivalents
if any, and actually if necessary.
I have only found something similar in ./src/port/gettimeofday.c, but
not sure if this patch should touch that code.
+static unsigned short +cvt_attr2uxmode(int attr, const _TCHAR * name) This looks rather bug-prone...
I wanted to keep as much of the original code as possible, but if this
is found as a viable solution, what shape should it have?
I think that this stuff has not been tested and would break at
compilation. If src/tools/msvc/Mkvcbuild.pm is not changed, then the
new file won't get included in the compiled set.
The previous patch was broken, taken from the wrong local branch
(sorry about that). The attached is still a WIP but it has to do the
things above-mentioned.
Regards,
Juan José Santamaría Flecha
Attachments:
0001-WIP-support-for-large-files-on-Win32-v3.patchapplication/octet-stream; name=0001-WIP-support-for-large-files-on-Win32-v3.patchDownload+193-62
On Fri, Jun 28, 2019 at 11:34:38PM +0200, Juan José Santamaría Flecha wrote:
I wanted to make a quick test on the previous patch. So let me state
what have I tested and what I have not: it builds and pass tests in
Windows and Cygwin, but I have not setup a MinGW environment.
Thanks. Could you attach this patch to the next commit fest? We had
many complaints with the current limitations with large files (pg_dump
syncs its result files, so that breaks on Windows actually if the dump
is larger than 2GB..), and we are going to need to do something. I
find that stuff rather hard to backpatch, but let's see.
--
Michael
On Sat, Jun 29, 2019 at 4:30 AM Michael Paquier <michael@paquier.xyz> wrote:
Thanks. Could you attach this patch to the next commit fest? We had
many complaints with the current limitations with large files (pg_dump
syncs its result files, so that breaks on Windows actually if the dump
is larger than 2GB..), and we are going to need to do something. I
find that stuff rather hard to backpatch, but let's see.
Done. [1]https://commitfest.postgresql.org/23/2189/
Regards,
Juan José Santamaría Flecha
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
On Wed, Jun 26, 2019 at 4:23 AM Michael Paquier <michael@paquier.xyz> wrote:
It seems to me that you don't need the configure changes.
The changes in configuration are meant for gcc compilations in Windows
(Cygwin and Mingw).
Directly editing the configure script is Not Done ... or at least,
such changes wouldn't survive the next correctly-done configure
update. You have to edit configure.in (or one of the sub-files in
config/) and then regenerate configure using autoconf.
It seems likely that we *don't* need or want this for Cygwin;
that should be providing a reasonable stat() emulation already.
So probably you just want to add "AC_LIBOBJ(win32_stat)" to
the stanza beginning
# Win32 (really MinGW) support
if test "$PORTNAME" = "win32"; then
AC_CHECK_FUNCS(_configthreadlocale)
AC_REPLACE_FUNCS(gettimeofday)
AC_LIBOBJ(dirmod)
I'd also recommend that stat() fill all the fields in struct stat,
even if you don't have anything better to put there than zeroes.
Otherwise you're just opening things up for random misbehavior.
I'm not in a position to comment on the details of the conversion from
GetFileAttributesEx results to struct stat, but in general this
seems like a reasonable way to proceed.
regards, tom lane
Thanks for looking into this.
On Fri, Aug 23, 2019 at 11:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Directly editing the configure script is Not Done ... or at least,
such changes wouldn't survive the next correctly-done configure
update. You have to edit configure.in (or one of the sub-files in
config/) and then regenerate configure using autoconf.It seems likely that we *don't* need or want this for Cygwin;
that should be providing a reasonable stat() emulation already.
So probably you just want to add "AC_LIBOBJ(win32_stat)" to
the stanza beginningI'd also recommend that stat() fill all the fields in struct stat,
even if you don't have anything better to put there than zeroes.
Otherwise you're just opening things up for random misbehavior.
Fixed.
I'm not in a position to comment on the details of the conversion from
GetFileAttributesEx results to struct stat, but in general this
seems like a reasonable way to proceed.
Actually, due to the behaviour of GetFileAttributesEx with symbolic
links I think that using GetFileInformationByHandle instead can give a
more resilient solution. Also, by using a handle we get a good test
for ERROR_DELETE_PENDING. This is the approach for the attached patch.
Regards,
Juan José Santamaría Flecha
Attachments:
0001-WIP-support-for-large-files-on-Win32-v4.patchapplication/octet-stream; name=0001-WIP-support-for-large-files-on-Win32-v4.patchDownload+294-66
Hi - is this likely to be applied to an upcoming release? / How does a novice apply a patch..?
Thanks
-----Original Message-----
From: Juan José Santamaría Flecha <juanjo.santamaria@gmail.com>
Sent: 04 September 2019 22:48
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Michael Paquier <michael@paquier.xyz>; williamedwinallen@live.com; pgsql-bugs@lists.postgresql.org; Magnus Hagander <magnus@hagander.net>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: BUG #15858: could not stat file - over 4GB
Thanks for looking into this.
On Fri, Aug 23, 2019 at 11:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Directly editing the configure script is Not Done ... or at least,
such changes wouldn't survive the next correctly-done configure
update. You have to edit configure.in (or one of the sub-files in
config/) and then regenerate configure using autoconf.It seems likely that we *don't* need or want this for Cygwin; that
should be providing a reasonable stat() emulation already.
So probably you just want to add "AC_LIBOBJ(win32_stat)" to the stanza
beginningI'd also recommend that stat() fill all the fields in struct stat,
even if you don't have anything better to put there than zeroes.
Otherwise you're just opening things up for random misbehavior.
Fixed.
I'm not in a position to comment on the details of the conversion from
GetFileAttributesEx results to struct stat, but in general this seems
like a reasonable way to proceed.
Actually, due to the behaviour of GetFileAttributesEx with symbolic links I think that using GetFileInformationByHandle instead can give a more resilient solution. Also, by using a handle we get a good test for ERROR_DELETE_PENDING. This is the approach for the attached patch.
Regards,
Juan José Santamaría Flecha
On Mon, Oct 28, 2019 at 3:29 PM william allen <williamedwinallen@live.com>
wrote:
Hi - is this likely to be applied to an upcoming release? / How does a
novice apply a patch..?
At this moment is missing review, so it is probably far from being
commitable. Any attention is appreciated and might help pushing it forward.
As a personal note, I have to check that is still applies before the
upcoming commitfest.
As for applying this patch you would need a Windows development
environment. I would recommend Visual Studio as a starting point [1]https://www.postgresql.org/docs/current/install-windows.html. You
also have a very visual guide in the wiki [2]https://wiki.postgresql.org/wiki/Working_With_VisualStudio.
[1]: https://www.postgresql.org/docs/current/install-windows.html
[2]: https://wiki.postgresql.org/wiki/Working_With_VisualStudio
Regards,
Juan José Santamaría Flecha
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
I ran into this problem when using psql.exe and copy command.
I have checked out 11.6-release tarball and applied the patch.
The patch does not apply cleanly, but can be easily modified to apply. See Note 1.
After applying the patch I built using "build psql" and ran the new psql.exe binary.
In order to test I have done the following:
Against a PostgreSQL 11 server run two commands:
"COPY public.table FROM 'C:/file'" and "\copy public.table FROM 'C:/file'"
The first one runs in the context of the server, and does not work. It aborts with an error saying "cannot stat file", as expected.
The seconds on runs in the context of the new binary and does work. It copies data as expected.
Note 1:
src/tools/msvc/Mkvcbuild.pm should be
- sprompt.c strerror.c tar.c thread.c getopt.c getopt_long.c dirent.c
- win32env.c win32error.c win32security.c win32setlocale.c);
+ sprompt.c tar.c thread.c getopt.c getopt_long.c dirent.c
+ win32env.c win32error.c win32security.c win32setlocale.c win32_stat.c);
The new status of this patch is: Waiting on Author
On Wed, Feb 5, 2020 at 12:47 PM Emil Iggland <emil@iggland.com> wrote:
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
The latest version of this patch could benefit from an update. Please find
attached a new version.
Most changes are cosmetic, but they have been more extensive than a simple
rebase so I am changing the status back to 'needs review'.
To summarize those changes:
- Rename 'win32_stat.c' file to 'win32stat.c', as a better match of
project files.
- Improve indentation and comments.
- Remove cruft about old Windows versions.
Regards,
Juan José Santamaría Flecha
Attachments:
0001-Support-for-large-files-on-Win32-v5.patchapplication/octet-stream; name=0001-Support-for-large-files-on-Win32-v5.patchDownload+291-66
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
The latest version of this patch could benefit from an update. Please find
attached a new version.
The cfbot thinks this doesn't compile on Windows [1]https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81541. Looks like perhaps
a missing-#include problem?
regards, tom lane
[1]: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.81541
On Sat, Feb 29, 2020 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The cfbot thinks this doesn't compile on Windows [1]. Looks like perhaps
a missing-#include problem?
The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not
a proper choice for MSVC 2013 as the cfbot is showing.
Please find attached a new version addressing this issue.
Regards,
Juan José Santamaría Flecha
Show quoted text
Attachments:
0001-Support-for-large-files-on-Win32-v6.patchapplication/octet-stream; name=0001-Support-for-large-files-on-Win32-v6.patchDownload+291-66
On Sat, Feb 29, 2020 at 9:40 AM Juan José Santamaría Flecha <
juanjo.santamaria@gmail.com> wrote:
On Sat, Feb 29, 2020 at 12:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The cfbot thinks this doesn't compile on Windows [1]. Looks like perhaps
a missing-#include problem?The define logic for _WIN32_WINNT includes testing of _MSC_VER, and is not
a proper choice for MSVC 2013 as the cfbot is showing.
The cfbot is not happy yet. I will backtrack a bit on the cruft cleanup.
Please find attached a new version addressing this issue.
Regards,
Juan José Santamaría Flecha
Show quoted text
Attachments:
0001-Support-for-large-files-on-Win32-v7.patchapplication/octet-stream; name=0001-Support-for-large-files-on-Win32-v7.patchDownload+304-66
I assigned myself as a reviewer for this patch, as I hit this bug today and had to perform a workaround. I have never reviewed a patch before but will try to update within the next 5 days. I intend on performing "Implements Feature" reviewing.