pgwin32_open returning EINVAL
Hi,
We've seen several cases of people (running 8.1 or 8.2) that see
messages like this:
2007-11-26 11:41:59 ERROR: could not open relation 1663/352369/353685: Invalid argument
The platform is Win32.
The problem is that pgwin32_open reduces any error code from
GetLastError that's not ERROR_PATH_NOT_FOUND, ERROR_FILE_NOT_FOUND,
ERROR_FILE_EXISTS or ERROR_ACCESS_DENIED into EINVAL, without further
notice. This is problematic because the true cause of the problem is
being hidden and it's hard to track down what's going on.
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?
--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Las navajas y los monos deben estar siempre distantes" (Germ�n Poo)
Attachments:
pgwin32_open.patchtext/x-diff; charset=us-asciiDownload+6-4
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?
The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.
Yeah, I looked into that (it's FormatMessage) but it's sufficiently hard
to use that I refrained. All the more when I can't actually compile
that code.
--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Nunca se desea ardientemente lo que solo se desea por raz�n" (F. Alexandre)
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?
We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().
So I think it's a good idea to do it, yes.
The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.Yeah, I looked into that (it's FormatMessage) but it's sufficiently hard
to use that I refrained. All the more when I can't actually compile
that code.
It's fairly trivial to use, really. See for example pg_SSPI_error in
auth.c. But if it only happens rarely, we shouldn't need it. Let's try it
with just an int for now, and we can perhaps detect the specific cases when
it happens often, and deal with those separately.
I can put together one that uses messages if you want me to (since I can
compile it...) though. I assume you consider your current patch safe enough
not to need help buiklding that part before it hits the buildfarm :-P
//Magnus
On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().So I think it's a good idea to do it, yes.
Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...
//Magnus
Attachments:
win32err.patchtext/plain; charset=us-asciiDownload+17-17
Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().So I think it's a good idea to do it, yes.
Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...
Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?
Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.
--
Alvaro Herrera Valdivia, Chile ICBM: S 39� 49' 18.1", W 73� 13' 56.4"
"Prefiero omelette con amigos que caviar con tontos"
(Alain Nonnet)
On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:
Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().So I think it's a good idea to do it, yes.
Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?
Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...
We can safely drop it to some lower level DEBUG though, maybe DEBUG1?
But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.
(the open code already mixed two win32 codes into ENOENT already, btw)
Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.
I don't think we can, or at least should, touch the signature for
_dosmaperr(). It's a system replacment, I think we should keep the same
signature for it.
//Magnus
Magnus Hagander wrote:
On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:
Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().So I think it's a good idea to do it, yes.
Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...
I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.
So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.
(The only case where a message would be logged inappropriately would be
in file/fd.c if _dosmaperr returned EINTR, but AFAICS we don't map any
code to that).
But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.
Yeah, I understand. But for example there are several different cases
that are mapped to EACCES. In the cases we're currently following,
having the exact error code could prove crucial to determining the cause
of the error.
(the open code already mixed two win32 codes into ENOENT already, btw)
Yeah, I saw that, but it's not really a problem because the errors are
"file does not exist" and "dir does not exist", and it can be very
easily confirmed whether the file and dir actually exist or not. But if
you have ERROR_LOCK_VIOLATION reported identically to
ERROR_SHARING_VIOLATION, or whatever, there's no way you can tell which
one actually occured.
Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.I don't think we can, or at least should, touch the signature for
_dosmaperr(). It's a system replacment, I think we should keep the same
signature for it.
Good point. If this were truly a problem we could make another routine
with the same code called differently, but I don't think it's really
important.
--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)
On Thu, Nov 29, 2007 at 09:43:30AM -0300, Alvaro Herrera wrote:
Magnus Hagander wrote:
On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:
Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:
On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:
Martijn van Oosterhout wrote:
On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:
Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().So I think it's a good idea to do it, yes.
Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.
What about all points that call readdir() which maps to that acll in
port/dirent.c?
If we can disregard that problem, then I think it's good to increase the
level of logging for that one to either NOTICE or LOG.
(The only case where a message would be logged inappropriately would be
in file/fd.c if _dosmaperr returned EINTR, but AFAICS we don't map any
code to that).
No, we don't - the concept of EINTR doesn't really exist on win32, since
there are no signals..
But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.Yeah, I understand. But for example there are several different cases
that are mapped to EACCES. In the cases we're currently following,
having the exact error code could prove crucial to determining the cause
of the error.
Yeah, agreed.
//Magnus
Magnus Hagander wrote:
On Thu, Nov 29, 2007 at 09:43:30AM -0300, Alvaro Herrera wrote:
Magnus Hagander wrote:
Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.What about all points that call readdir() which maps to that acll in
port/dirent.c?
Sorry, I don't follow. I think the expected case is that FindNextFile
fails with ERROR_NO_MORE_FILES when there are no more files, on which
case we don't call _dosmaperr.
...
Oh, I see what you mean: for the unexpected cases that readdir() does
call _dosmaperr, readdir returns NULL but what does the caller do?
The good news is that most callers of readdir are in frontend programs:
pg_standby, initdb, pg_resetxlog. There are two callers in the backend:
file/fd.c again, which already calls ereport(ERROR) if anything weird
happen, and pgfnames() which also logs a WARNING.
Callers in frontend programs are not a problem, because the current
_dosmaperr already calls fprintf(stderr) with the code mapping message
in all cases.
Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:
fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);
Thanks!
--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Magnus Hagander wrote:
Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error.
I'm very concerned about that too, and think that DEBUG5 is just fine.
Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:
fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);
That's both unreadable and useless, because gettext will smash it to one
string anyway.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Magnus Hagander wrote:
Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error.I'm very concerned about that too, and think that DEBUG5 is just fine.
Well, the analysis was already done which says this shouldn't be a problem.
Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);
That's both unreadable and useless, because gettext will smash it to one
string anyway.
Huh, right. I meant to have the second string outside the _() call, but
I don't think that can be made to work easily.
--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Puedes elegir el color de tu auto -- siempre y cuando sea negro."
(Henry Ford)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Tom Lane wrote:
I'm very concerned about that too, and think that DEBUG5 is just fine.
Well, the analysis was already done which says this shouldn't be a problem.
That analysis is full of holes --- FileRead and FileWrite for starters.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Tom Lane wrote:
I'm very concerned about that too, and think that DEBUG5 is just fine.
Well, the analysis was already done which says this shouldn't be a problem.
That analysis is full of holes --- FileRead and FileWrite for starters.
I already did. The case where they retry do not call _dosmaperr.
--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"No es bueno caminar con un hombre muerto"
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Tom Lane wrote:
That analysis is full of holes --- FileRead and FileWrite for starters.
I already did. The case where they retry do not call _dosmaperr.
What's retry got to do with it? What's displeasing me is the idea of
LOG messages showing up during perfectly normal operation.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Tom Lane wrote:
That analysis is full of holes --- FileRead and FileWrite for starters.
I already did. The case where they retry do not call _dosmaperr.
What's retry got to do with it? What's displeasing me is the idea of
LOG messages showing up during perfectly normal operation.
Oh, I see your point. It's that those routines will sometimes be
called, they return an error, and this is *ignored* by the caller. In
the case of FileRead, the only such caller is ExecHashJoinGetSavedTuple.
In the case of FileWrite, the thing is quite a bit more difficult to
follow, but it goes through BufFileWrite and BufFileFlush.
So yeah, it seems there is valid code trying to call FileRead and
FileWrite, have it error out, and silently ignore the error. I'm not
going to argue this late in the cycle that all that code be changed, so
I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.
All the cases where BufFileFlush is called and it ignores an error are
bugs. I think it's quite safe to modify BufFileFlush to ereport(ERROR)
if it cannot do what it was asked.
And all the callers of BufFileWrite immediately ereport() if it cannot
write the specified amount of bytes.
So there is exactly one case where these routines would be unnecessarily
noisy, and that is ExecHashJoinGetSavedTuple.
--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"No necesitamos banderas
No reconocemos fronteras" (Jorge Gonz�lez)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.
I'm still not convinced what you think the problematic situation is.
It's already the case (and reasonable, I think) that _dosmaperr issues a
LOG message if it sees a GetLastError code it doesn't recognize; that
addresses the problem that this thread started with. Why do we need to
make the success case chattier?
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.I'm still not convinced what you think the problematic situation is.
It's already the case (and reasonable, I think) that _dosmaperr issues a
LOG message if it sees a GetLastError code it doesn't recognize; that
addresses the problem that this thread started with. Why do we need to
make the success case chattier?
I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
I'm still not convinced what you think the problematic situation is.
I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.
(1) If we need to know the difference, then we shouldn't be mapping them
to the same thing.
(2) Do we have any live cases where we must know this? Much less enough
cases to justify a global increase in the log chattiness? It's not like
we cannot get the information if we really need it.
regards, tom lane
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
I'm still not convinced what you think the problematic situation is.
I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.(1) If we need to know the difference, then we shouldn't be mapping them
to the same thing.
I totally agree with that, but the problem is the API is not very
flexible in this regard. We can only use error codes that strerror() is
going to recognize.
A radical idea would be to store the Windows error code from
GetLastError in a new field in ErrorData, and display it in the next
ereport(). Perhaps use a special error callback to add an errcontext in
the possibly problematic cases.
(2) Do we have any live cases where we must know this? Much less enough
cases to justify a global increase in the log chattiness? It's not like
we cannot get the information if we really need it.
Yes. This thread shows the problem:
http://archives.postgresql.org/pgsql-es-ayuda/2007-11/msg00354.php
Basically he is getting this error:
2007-11-16 14:54:16 ERROR: could not open relation 1663/16403/16487: Invalid argument
and no further indication of what's going on. I did some searching a
couple of days ago and none of the cases I could find got a resolution.
http://archives.free.net.ph/message/20060930.223306.6ac7d657.en.html
http://www.mydatabasesupport.com/forums/postgresql/314656-general-postgre-crash-currval-problem-help.html
--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)