BUG #17288: PSQL bug with COPY command (Windows)
The following bug has been logged on the website:
Bug reference: 17288
Logged by: Dmitry Koval
Email address: d.koval@postgrespro.ru
PostgreSQL version: 14.1
Operating system: Windows 10 (21H1)
Description:
Hi,
there is a problem on the REL_14_STABLE branch (on the REL_14_1
branch too). When executing a query under Windows 10 (21H1)
\copy (SELECT 1) TO stdout
PSQL utility prints error
"could not stat file" (null) ": Invalid argument" and crashes.
There is no error under Ubuntu 20.04 LTS.
Issue source: commit bed90759fcbcd72d4d06969eebab81e47326f9a
("Fix our Windows stat() emulation to handle file sizes > 4GB.",
discussion: /messages/by-id/15858-9572469fd3b73263@postgresql.org).
In this commit function stat () was replaced to function
GetFileInformationByHandle()
(see src/port/win32stat.c, function fileinfo_to_stat()):
if (!GetFileInformationByHandle(hFile, &fiData))
{
_dosmaperr(GetLastError());
return -1;
}
Function GetFileInformationByHandle() works for files but can not work
with streams like "stdout".
For "stdout" the GetFileInformationByHandle () function returns an error
(GetLastError () == ERROR_INVALID_FUNCTION), which causes PSQL crash.
Examples of errors processing for function GetFileInformationByHandle()
in other applications:
1) https://github.com/python/cpython/blob/main/Modules/posixmodule.c
2)
https://doxygen.reactos.org/da/d6a/subsystems_2mvdm_2ntvdm_2hardware_2disk_8c_source.html
Quick fix in src/port/win32stat.c:
in case function GetFileInformationByHandle() returns a specific error
code, call the _stat64() function for this descriptor.
It's not elegant, but it works.
(I'll attach file with patch in next email).
With best regards,
Dmitry Koval.
On Wed, Nov 17, 2021 at 01:29:02PM +0300, Dmitry Koval wrote:
Attachments: patch + screenshot with error.
Gulp. I can reproduce this problem.
if (!GetFileInformationByHandle(hFile, &fiData)) { - _dosmaperr(GetLastError()); + DWORD error = GetLastError(); + + switch (error) + { + case ERROR_INVALID_PARAMETER: + case ERROR_INVALID_FUNCTION: + case ERROR_NOT_SUPPORTED: + + /* + * Object is other than real file (stdout, for example). So + * need to call _fstat64 in this case. "struct stat" should + * match "struct __stat64", see "struct stat" definition. + */ + if (fileno >= 0) + return _fstat64(fileno, (struct __stat64 *) buf); + else if (name) + return _stat64(name, (struct __stat64 *) buf); + } + _dosmaperr(error); return -1;
Hmm. _fstat64() and _stat64() have proved to be tricky to work with
and rather unworkable across all the build systems we support
(remember the 2GB file size problem, for example), which is why we
have the existing business of win32stat.c to begin with. It seems to
me, also, that we could run into issues if we blindly map those error
codes to call a stat() function as fallback. I am not sure that doing
a blind cast to __stat64 is going to work all the time, either.
I think that we had better never call GetFileInformationByHandle() if
we use a fileno that maps to stdin, stdout or stderr. The 10000$
question is what to use though. One option is to let our emulation
code fill in the gap for those three cases with at least st_mode
filled with S_IFIFO, then return 0 as error code :/
Just to be sure, this is the code path in psql's copy.c where we check
that a specified copystream is not a directory, right?
--
Michael
Thanks for reporting.
On Thu, Nov 18, 2021 at 8:40 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Nov 17, 2021 at 01:29:02PM +0300, Dmitry Koval wrote:
Attachments: patch + screenshot with error.
I think that we had better never call GetFileInformationByHandle() if
we use a fileno that maps to stdin, stdout or stderr. The 10000$
question is what to use though. One option is to let our emulation
code fill in the gap for those three cases with at least st_mode
filled with S_IFIFO, then return 0 as error code :/When the file descriptor is coming from a stream I don't think we have to
use GetFileInformationByHandle() or stat() to get any information, but we
can directly return the stat information. Please find attached a patch for
so.
Regards,
Juan José Santamaría Flecha
Attachments:
v2-0001-Fixed-Windows-stat-emulation-working-with-streams.patchapplication/octet-stream; name=v2-0001-Fixed-Windows-stat-emulation-working-with-streams.patchDownload+15-1
_fstat64() and _stat64() have proved to be tricky to work with and
rather unworkable across all the build systems we support
I agree, it is better not use them.
I think that we had better never call GetFileInformationByHandle() if
we use a fileno that maps to stdin, stdout or stderr.
Probably we should call GetFileInformationByHandle() for case standard
streams stdin/stdout/stderr are redirected to files.
See file src\backend\utils\error\elog.c, for example. It contains
lines:
if (!freopen(OutputFileName, "a", stderr))
if (!freopen(OutputFileName, "a", stdout))
And this command with "stderr" works in PSQL without crash (in
contrast to "stdout"):
\copy (SELECT 1) TO stderr
(it put resullt into file with name "stderr").
We can emulate stats for stdin/stdout/stderr after call
GetFileInformationByHandle().
Just to be sure, this is the code path in psql's copy.c where we check
that a specified copystream is not a directory, right?
Yes, fstat() called from file src/bin/psql/copy.c:
/* make sure the specified file is not a directory */
if ((result = fstat(fileno(copystream), &st)) < 0)
I attached new patch version.
With best regards,
Dmitry Koval.
Attachments:
v3_0001-Fixed-Windows-stat-emulation-working-with-streams.patchtext/plain; charset=UTF-8; name=v3_0001-Fixed-Windows-stat-emulation-working-with-streams.patchDownload+31-5
Three corrections for patch
v2-0001-Fixed-Windows-stat-emulation-working-with-streams.patch:
1) I see in debugger that field "st_mode" for stdout has value
_S_IFCHR (0x2000, 8192);
2) "st_rdev" feild should have the same value as "st_dev";
3) we can not use emulated stat information for stderr after call
"freopen(OutputFileName, "a", stderr);".
With best regards,
Dmitry Koval.
On Thu, Nov 18, 2021 at 2:40 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
I think that we had better never call GetFileInformationByHandle() if
we use a fileno that maps to stdin, stdout or stderr.Probably we should call GetFileInformationByHandle() for case standard
streams stdin/stdout/stderr are redirected to files.
See file src\backend\utils\error\elog.c, for example. It contains
lines:if (!freopen(OutputFileName, "a", stderr))
if (!freopen(OutputFileName, "a", stdout))And this command with "stderr" works in PSQL without crash (in
contrast to "stdout"):\copy (SELECT 1) TO stderr
(it put resullt into file with name "stderr").
We can emulate stats for stdin/stdout/stderr after call
GetFileInformationByHandle().You are right about freopen(), but stderr works just because it's being
parsed as a file, not a stream, by parse_slash_copy().
I attached new patch version.
I would keep the memset(buf, 0, sizeof(*buf)) for the members we are not
setting.
Regards,
Juan José Santamaría Flecha
You are right about freopen(), but stderr works just because it's being
parsed as a file, not a stream, by parse_slash_copy().
You are right. I wrote about stderr without inspecting code (
I would keep the memset(buf, 0, sizeof(*buf)) for the members we
are not setting.
Of course. Original code (src/port/win32stat.c) contains line
memset(buf, 0, sizeof(*buf))
before call GetFileInformationByHandle().
With best regards,
Dmitry Koval.
On Thu, Nov 18, 2021 at 09:48:55PM +0300, Dmitry Koval wrote:
You are right about freopen(), but stderr works just because it's being
parsed as a file, not a stream, by parse_slash_copy().You are right. I wrote about stderr without inspecting code (
I would keep the memset(buf, 0, sizeof(*buf)) for the members we
are not setting.Of course. Original code (src/port/win32stat.c) contains line
memset(buf, 0, sizeof(*buf))
before call GetFileInformationByHandle().
I still think that we should not call GetFileInformationByHandle()
when it comes to an argument that we know will fail, so I would agree
with Juan's approach in v2 to just patch _pgfstat64() rather than
messing with the code paths in charge of checking handles pending for
deletion. Each fileno is going to be 0, 1 or 2 in those cases, still
it would be better to rely on the result of fileno() as v3 is doing?
It looks like you are right about _S_IFCHR for st_mode, as of this
part of the docs:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fstat-fstat32-fstat64-fstati64-fstat32i64-fstat64i32?view=msvc-170
--
Michael
I still think that we should not call GetFileInformationByHandle()
when it comes to an argument that we know will fail, ...
Our program calls function freopen() (this function is used in file
src\backend\utils\error\elog.c). So we don't know exactly that
GetFileInformationByHandle() will fails for stdin/stdout/stderr.
Ior illustration, I attached small example (test.c) for this case:
1) GetFileInformationByHandle() for "stdout" works without error;
2) file "1.txt" after running test contains:
Sample string
attributes = 0x2020, file size = 15
Each fileno is going to be 0, 1 or 2 in those cases, still
it would be better to rely on the result of fileno() as v3 is
doing?
I think fileno() is safer. We can call fclose(stdout) and
_fileno(stdout) returns -1 after that. But I don't know:
can OS reuse fileno=1 or not in this case?
With best regards,
Dmitry.
Attachments:
test.ctext/plain; charset=UTF-8; name=test.cDownload
On Fri, Nov 19, 2021 at 9:47 AM Dmitry Koval <d.koval@postgrespro.ru> wrote:
I still think that we should not call GetFileInformationByHandle()
when it comes to an argument that we know will fail, ...Our program calls function freopen() (this function is used in file
src\backend\utils\error\elog.c). So we don't know exactly that
GetFileInformationByHandle() will fails for stdin/stdout/stderr.Ior illustration, I attached small example (test.c) for this case:
1) GetFileInformationByHandle() for "stdout" works without error;
2) file "1.txt" after running test contains:Sample string
attributes = 0x2020, file size = 15We can check for redirection without calling GetFileInformationByHandle(),
please consider the attached patch.
Regards,
Juan José Santamaría Flecha
Attachments:
v4_0001-Fixed-Windows-stat-emulation-working-with-streams.patchapplication/octet-stream; name=v4_0001-Fixed-Windows-stat-emulation-working-with-streams.patchDownload+13-0
We can check for redirection without calling
GetFileInformationByHandle(), please consider the attached patch.
There are doubts:
1) we always use system function (GetFinalPathNameByHandleA or
GetFileInformationByHandle); sometimes we use these two calls
GetFinalPathNameByHandleA + GetFileInformationByHandle
together (in freopen() case);
2) function GetFinalPathNameByHandleA is slower than
GetFileInformationByHandle (2-4 times).
Might be exist a cheaper way than GetFinalPathNameByHandleA?
I don't understand why this way better than using one call
GetFileInformationByHandle...
With best regards,
Dmitry Koval.
On Fri, Nov 19, 2021 at 11:45 AM Dmitry Koval <d.koval@postgrespro.ru>
wrote:
We can check for redirection without calling
GetFileInformationByHandle(), please consider the attached patch.There are doubts:
1) we always use system function (GetFinalPathNameByHandleA or
GetFileInformationByHandle); sometimes we use these two calls
GetFinalPathNameByHandleA + GetFileInformationByHandle
together (in freopen() case);2) function GetFinalPathNameByHandleA is slower than
GetFileInformationByHandle (2-4 times).
Might be exist a cheaper way than GetFinalPathNameByHandleA?I don't understand why this way better than using one call
GetFileInformationByHandle...Personally, I think it is more readable, although a comment should explain
what we are checking.
GetFinalPathNameByHandleA() is slower when there is redirection, when there
is no redirection it's similar or faster than GetFileInformationByHandle(),
and that should be the most common case.
Regards,
Juan José Santamaría Flecha
GetFinalPathNameByHandleA() is slower when there is redirection, when
there is no redirection it's similar or faster than
GetFileInformationByHandle(), and that should be the most common case.
I tested 2 cases:
1) "stdout" is not redirected.
In this case GetFinalPathNameByHandleA() returns error and
GetFileInformationByHandle() returns error.
Simple test in attachment.
---
Result:
GetFileInformationByHandle: 7, GetFinalPathNameByHandleA: 16
2) "stdout" redirected (need to uncomment 2 lines in test).
Both function works without error.
---
Result (see file "1.txt"):
GetFileInformationByHandle: 34, GetFinalPathNameByHandleA: 120
The difference is not very big, but it is.
With best regards,
Dmitry.
Attachments:
test2.ctext/plain; charset=UTF-8; name=test2.cDownload
On Fri, Nov 19, 2021 at 12:44:14PM +0100, Juan José Santamaría Flecha wrote:
GetFinalPathNameByHandleA() is slower when there is redirection, when there
is no redirection it's similar or faster than GetFileInformationByHandle(),
and that should be the most common case.
Yeah, the slight performance impact caused by the redirection does not
worry me much, based on the code path of Postgres where this would be
called. And that's unlikely going to become a bottleneck anyway as
GetFinalPathNameByHandleA() would just be called if we know that we
are with stdin, stdout or stderr thanks to the first part of the "if"
clause.
I like the simplicity behind v4 as it does not interfere with
_pgstat64() while dealing with the redirection case, so it should
address the concerns from Dmitry anyway (right?). We should really
add a comment explaining why this is handled this way though for the
case of streams, why the case of the redirection matters, and why we
count on a failure of GetFinalPathNameByHandleA().
--
Michael
... We should really
add a comment explaining why this is handled this way though for the
case of streams, why the case of the redirection matters, and why we
count on a failure of GetFinalPathNameByHandleA().
Added a comment for v4 (as draft).
But I don't know English well enough and probably comment should be
corrected.
With best regards,
Dmitry.
Attachments:
v5_0001-Fixed-Windows-stat-emulation-working-with-streams.patchtext/plain; charset=UTF-8; name=v5_0001-Fixed-Windows-stat-emulation-working-with-streams.patchDownload+29-0
On Sat, Nov 20, 2021 at 12:51 PM Dmitry Koval <d.koval@postgrespro.ru>
wrote:
Added a comment for v4 (as draft).
But I don't know English well enough and probably comment should be
corrected.It reads a little too much like pseudocode for my liking. Please consider
the attached version.
Regards,
Juan José Santamaría Flecha
Attachments:
v6_0001-Fixed-Windows-stat-emulation-working-with-streams.patchapplication/octet-stream; name=v6_0001-Fixed-Windows-stat-emulation-working-with-streams.patchDownload+19-0
It reads a little too much like pseudocode for my liking. Please
consider the attached version.
This is a good comment, I like it.
With best regards,
Dmitry Koval.
On Mon, Nov 22, 2021 at 02:07:09PM +0300, Dmitry Koval wrote:
It reads a little too much like pseudocode for my liking. Please
consider the attached version.This is a good comment, I like it.
/*
+ * Check if the fileno is a data stream. If so, unless it has been
+ * redirected to a file, getting information by handle will fail. Return
+ * the appropriate stat information instead.
+ */
I would perhaps add an extra note that we emulate the result in the
best way we can. Anyway, what you have here looks basically enough.
Another thing that has been itching me on this thread is that we have
not been able to detect this problem even if we have tests in
copyselect.sql that cover those paths. Is that because running
vcregress.pl implies a redirection of stdout so we would never see,
except from a terminal?
--
Michael
Is that because running vcregress.pl implies a redirection of stdout so
we would never see, except from a terminal?
Unfortunately, you are.
All my COPY-tests work well with files.
I see a problem in the terminal only ...
With best regards,
Dmitry Koval.