pgwin32_open returning EINVAL

Started by Alvaro Herreraabout 18 years ago67 messages
#1Alvaro Herrera
alvherre@alvh.no-ip.org
1 attachment(s)

Hi,

We've seen several cases of people (running 8.1 or 8.2) that see
messages like this:

2007-11-26 11:41:59 ERROR: could not open relation 1663/352369/353685: Invalid argument

The platform is Win32.

The problem is that pgwin32_open reduces any error code from
GetLastError that's not ERROR_PATH_NOT_FOUND, ERROR_FILE_NOT_FOUND,
ERROR_FILE_EXISTS or ERROR_ACCESS_DENIED into EINVAL, without further
notice. This is problematic because the true cause of the problem is
being hidden and it's hard to track down what's going on.

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Las navajas y los monos deben estar siempre distantes" (Germ�n Poo)

Attachments:

pgwin32_open.patchtext/x-diff; charset=us-asciiDownload
Index: src/port/open.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/port/open.c,v
retrieving revision 1.21
diff -c -p -r1.21 open.c
*** src/port/open.c	15 Nov 2007 21:14:46 -0000	1.21
--- src/port/open.c	28 Nov 2007 14:56:04 -0000
*************** pgwin32_open(const char *fileName, int f
*** 88,94 ****
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
! 		switch (GetLastError())
  		{
  				/* EMFILE, ENFILE should not occur from CreateFile. */
  			case ERROR_PATH_NOT_FOUND:
--- 88,96 ----
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
! 		DWORD	err = GetLastError();
! 
! 		switch (err)
  		{
  				/* EMFILE, ENFILE should not occur from CreateFile. */
  			case ERROR_PATH_NOT_FOUND:
*************** pgwin32_open(const char *fileName, int f
*** 102,107 ****
--- 104,111 ----
  				errno = EACCES;
  				break;
  			default:
+ 				ereport(LOG,
+ 						(errmsg("win32 error code: %d", err)));
  				errno = EINVAL;
  		}
  		return -1;
#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Alvaro Herrera (#1)
Re: pgwin32_open returning EINVAL

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#3Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Martijn van Oosterhout (#2)
Re: pgwin32_open returning EINVAL

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.

Yeah, I looked into that (it's FormatMessage) but it's sufficiently hard
to use that I refrained. All the more when I can't actually compile
that code.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Nunca se desea ardientemente lo que solo se desea por raz�n" (F. Alexandre)

#4Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#3)
Re: pgwin32_open returning EINVAL

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

The Windows API provides a way to get the error message associated
with the code. It seems it would be more user-friendly if the message
was displayed (just as for unix) rather than the code.

Yeah, I looked into that (it's FormatMessage) but it's sufficiently hard
to use that I refrained. All the more when I can't actually compile
that code.

It's fairly trivial to use, really. See for example pg_SSPI_error in
auth.c. But if it only happens rarely, we shouldn't need it. Let's try it
with just an int for now, and we can perhaps detect the specific cases when
it happens often, and deal with those separately.

I can put together one that uses messages if you want me to (since I can
compile it...) though. I assume you consider your current patch safe enough
not to need help buiklding that part before it hits the buildfarm :-P

//Magnus

#5Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#4)
1 attachment(s)
Re: pgwin32_open returning EINVAL

On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...

//Magnus

Attachments:

win32err.patchtext/plain; charset=us-asciiDownload
Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.21
diff -c -r1.21 open.c
*** src/port/open.c	15 Nov 2007 21:14:46 -0000	1.21
--- src/port/open.c	29 Nov 2007 11:44:46 -0000
***************
*** 88,109 ****
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
! 		switch (GetLastError())
! 		{
! 				/* EMFILE, ENFILE should not occur from CreateFile. */
! 			case ERROR_PATH_NOT_FOUND:
! 			case ERROR_FILE_NOT_FOUND:
! 				errno = ENOENT;
! 				break;
! 			case ERROR_FILE_EXISTS:
! 				errno = EEXIST;
! 				break;
! 			case ERROR_ACCESS_DENIED:
! 				errno = EACCES;
! 				break;
! 			default:
! 				errno = EINVAL;
! 		}
  		return -1;
  	}

--- 88,94 ----
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
! 		_dosmaperr(GetLastError());
  		return -1;
  	}

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Magnus Hagander (#5)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...

Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?

Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39� 49' 18.1", W 73� 13' 56.4"
"Prefiero omelette con amigos que caviar con tontos"
(Alain Nonnet)

#7Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#6)
Re: pgwin32_open returning EINVAL

On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...

Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...

We can safely drop it to some lower level DEBUG though, maybe DEBUG1?

But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.

(the open code already mixed two win32 codes into ENOENT already, btw)

Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.

I don't think we can, or at least should, touch the signature for
_dosmaperr(). It's a system replacment, I think we should keep the same
signature for it.

//Magnus

#8Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Magnus Hagander (#7)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...

Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...

I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.

So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.

(The only case where a message would be logged inappropriately would be
in file/fd.c if _dosmaperr returned EINTR, but AFAICS we don't map any
code to that).

But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.

Yeah, I understand. But for example there are several different cases
that are mapped to EACCES. In the cases we're currently following,
having the exact error code could prove crucial to determining the cause
of the error.

(the open code already mixed two win32 codes into ENOENT already, btw)

Yeah, I saw that, but it's not really a problem because the errors are
"file does not exist" and "dir does not exist", and it can be very
easily confirmed whether the file and dir actually exist or not. But if
you have ERROR_LOCK_VIOLATION reported identically to
ERROR_SHARING_VIOLATION, or whatever, there's no way you can tell which
one actually occured.

Or maybe have _dosmaperr receive the elevel with which to report the
message as a parameter, and have current callers use DEBUG5, and
pgwin32_open use LOG. That way we can backpatch it to 8.2 without
changing current behavior.

I don't think we can, or at least should, touch the signature for
_dosmaperr(). It's a system replacment, I think we should keep the same
signature for it.

Good point. If this were truly a problem we could make another routine
with the same code called differently, but I don't think it's really
important.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

#9Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#8)
Re: pgwin32_open returning EINVAL

On Thu, Nov 29, 2007 at 09:43:30AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

On Thu, Nov 29, 2007 at 09:09:47AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 05:20:46PM +0100, Magnus Hagander wrote:

On Wed, Nov 28, 2007 at 12:24:26PM -0300, Alvaro Herrera wrote:

Martijn van Oosterhout wrote:

On Wed, Nov 28, 2007 at 11:57:35AM -0300, Alvaro Herrera wrote:

Can we do something like this to report the Win32 error code so that the
user has a higher chance of figuring out what's going on?

We already do something very similar to what you're donig in
backend/port/wni32/socket.c :: TranslateSocketError().

So I think it's a good idea to do it, yes.

Thinking about this some more, I think this is a better patch - we already
have a function that takes care of this, including both error and debug
logging. Anybody disagree? If not, I'll go ahead and apply it...

Hmm, but this still mixes some error codes. To absolutely get the
Windows error code you would have to be running with DEBUG5, which I
don't think is acceptable for a production system during any interesting
length of time ... Can we have that DEBUG5 message changed to LOG
instead?

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...

I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.

So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.

What about all points that call readdir() which maps to that acll in
port/dirent.c?

If we can disregard that problem, then I think it's good to increase the
level of logging for that one to either NOTICE or LOG.

(The only case where a message would be logged inappropriately would be
in file/fd.c if _dosmaperr returned EINTR, but AFAICS we don't map any
code to that).

No, we don't - the concept of EINTR doesn't really exist on win32, since
there are no signals..

But it is true that we have mixed error codes. But we only do that when we
know they're actually there. If we have an unknown code, we don't just
return it as EINVAL without logging (as open did before) - and that log
goes out as LOG.

Yeah, I understand. But for example there are several different cases
that are mapped to EACCES. In the cases we're currently following,
having the exact error code could prove crucial to determining the cause
of the error.

Yeah, agreed.

//Magnus

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Magnus Hagander (#9)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Thu, Nov 29, 2007 at 09:43:30AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error. For example, a file that doesn't exist
doesn't necessarily mean it's an error... I don't want to have to go
through all code-paths that end up calling that function to see if that may
be so...

I just checked. I see there are only five callers. In three cases (two
in file/fd.c and one in port/dirent.c), there is at a single error code
which is "possibly expected". It is taken care of without calling
_dosmaperr at all. In syslogger.c there are two possibly expected error
codes, dealt with in the same way. And the last caller is
port/getrusage.c, which has no possibly-expected error code.

So I don't think this is a concern -- whenever _dosmaperr is called, a
"true" error message is already going to be logged.

What about all points that call readdir() which maps to that acll in
port/dirent.c?

Sorry, I don't follow. I think the expected case is that FindNextFile
fails with ERROR_NO_MORE_FILES when there are no more files, on which
case we don't call _dosmaperr.

...

Oh, I see what you mean: for the unexpected cases that readdir() does
call _dosmaperr, readdir returns NULL but what does the caller do?

The good news is that most callers of readdir are in frontend programs:
pg_standby, initdb, pg_resetxlog. There are two callers in the backend:
file/fd.c again, which already calls ereport(ERROR) if anything weird
happen, and pgfnames() which also logs a WARNING.

Callers in frontend programs are not a problem, because the current
_dosmaperr already calls fprintf(stderr) with the code mapping message
in all cases.

Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:

fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);

Thanks!

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#10)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Magnus Hagander wrote:

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error.

I'm very concerned about that too, and think that DEBUG5 is just fine.

Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:

fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);

That's both unreadable and useless, because gettext will smash it to one
string anyway.

regards, tom lane

#12Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#11)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Magnus Hagander wrote:

Maybe. I'm concerned we might end up logging a whole lot more, for cases
where it's not an actual error.

I'm very concerned about that too, and think that DEBUG5 is just fine.

Well, the analysis was already done which says this shouldn't be a problem.

Hmm, I just noticed a bug in those fprintf calls -- they are missing
the terminating newline. Please change that too if you're going to
patch this part of the code. In order to avoid translation problems, I
think it should be like this:

fprintf(stderr, _("mapped win32 error code %lu to %d" "\n"), e, errno);

That's both unreadable and useless, because gettext will smash it to one
string anyway.

Huh, right. I meant to have the second string outside the _() call, but
I don't think that can be made to work easily.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"Puedes elegir el color de tu auto -- siempre y cuando sea negro."
(Henry Ford)

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

I'm very concerned about that too, and think that DEBUG5 is just fine.

Well, the analysis was already done which says this shouldn't be a problem.

That analysis is full of holes --- FileRead and FileWrite for starters.

regards, tom lane

#14Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#13)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

I'm very concerned about that too, and think that DEBUG5 is just fine.

Well, the analysis was already done which says this shouldn't be a problem.

That analysis is full of holes --- FileRead and FileWrite for starters.

I already did. The case where they retry do not call _dosmaperr.

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"No es bueno caminar con un hombre muerto"

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#14)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

That analysis is full of holes --- FileRead and FileWrite for starters.

I already did. The case where they retry do not call _dosmaperr.

What's retry got to do with it? What's displeasing me is the idea of
LOG messages showing up during perfectly normal operation.

regards, tom lane

#16Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#15)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

That analysis is full of holes --- FileRead and FileWrite for starters.

I already did. The case where they retry do not call _dosmaperr.

What's retry got to do with it? What's displeasing me is the idea of
LOG messages showing up during perfectly normal operation.

Oh, I see your point. It's that those routines will sometimes be
called, they return an error, and this is *ignored* by the caller. In
the case of FileRead, the only such caller is ExecHashJoinGetSavedTuple.

In the case of FileWrite, the thing is quite a bit more difficult to
follow, but it goes through BufFileWrite and BufFileFlush.

So yeah, it seems there is valid code trying to call FileRead and
FileWrite, have it error out, and silently ignore the error. I'm not
going to argue this late in the cycle that all that code be changed, so
I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.

All the cases where BufFileFlush is called and it ignores an error are
bugs. I think it's quite safe to modify BufFileFlush to ereport(ERROR)
if it cannot do what it was asked.

And all the callers of BufFileWrite immediately ereport() if it cannot
write the specified amount of bytes.

So there is exactly one case where these routines would be unnecessarily
noisy, and that is ExecHashJoinGetSavedTuple.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"No necesitamos banderas
No reconocemos fronteras" (Jorge Gonz�lez)

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#16)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.

I'm still not convinced what you think the problematic situation is.
It's already the case (and reasonable, I think) that _dosmaperr issues a
LOG message if it sees a GetLastError code it doesn't recognize; that
addresses the problem that this thread started with. Why do we need to
make the success case chattier?

regards, tom lane

#18Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#17)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

I think a reasonable compromise is to turn the ereport() in _dosmaperr
to DEBUG1 instead. That way it won't clutter any log by default, and in
the cases where we're actually interested in tracking the problematic
situation, we don't need to get huge amounts of log traffic coming from
other parts of the system.

I'm still not convinced what you think the problematic situation is.
It's already the case (and reasonable, I think) that _dosmaperr issues a
LOG message if it sees a GetLastError code it doesn't recognize; that
addresses the problem that this thread started with. Why do we need to
make the success case chattier?

I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.

//Magnus

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#18)
Re: pgwin32_open returning EINVAL

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

I'm still not convinced what you think the problematic situation is.

I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.

(1) If we need to know the difference, then we shouldn't be mapping them
to the same thing.

(2) Do we have any live cases where we must know this? Much less enough
cases to justify a global increase in the log chattiness? It's not like
we cannot get the information if we really need it.

regards, tom lane

#20Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#19)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

I'm still not convinced what you think the problematic situation is.

I believe Alvaros point is that several different GetLastError codes map
to the same errno code, making it impossible to see the difference
between those errors.

(1) If we need to know the difference, then we shouldn't be mapping them
to the same thing.

I totally agree with that, but the problem is the API is not very
flexible in this regard. We can only use error codes that strerror() is
going to recognize.

A radical idea would be to store the Windows error code from
GetLastError in a new field in ErrorData, and display it in the next
ereport(). Perhaps use a special error callback to add an errcontext in
the possibly problematic cases.

(2) Do we have any live cases where we must know this? Much less enough
cases to justify a global increase in the log chattiness? It's not like
we cannot get the information if we really need it.

Yes. This thread shows the problem:

http://archives.postgresql.org/pgsql-es-ayuda/2007-11/msg00354.php

Basically he is getting this error:

2007-11-16 14:54:16 ERROR: could not open relation 1663/16403/16487: Invalid argument

and no further indication of what's going on. I did some searching a
couple of days ago and none of the cases I could find got a resolution.

http://archives.free.net.ph/message/20060930.223306.6ac7d657.en.html
http://www.mydatabasesupport.com/forums/postgresql/314656-general-postgre-crash-currval-problem-help.html

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#20)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

(2) Do we have any live cases where we must know this?

Yes. This thread shows the problem:
http://archives.postgresql.org/pgsql-es-ayuda/2007-11/msg00354.php
Basically he is getting this error:
2007-11-16 14:54:16 ERROR: could not open relation 1663/16403/16487: Invalid argument

Well, since EINVAL is the default result from _dosmaperr, and none of
the cases it represents are "expected", why don't we just remove all of
the explicit mappings to EINVAL from doserrors[]? Then we will get the
LOG message you need, and we won't have to increase the chattiness level
for anything else.

regards, tom lane

#22Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#21)
Re: pgwin32_open returning EINVAL

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

(2) Do we have any live cases where we must know this?

Yes. This thread shows the problem:
http://archives.postgresql.org/pgsql-es-ayuda/2007-11/msg00354.php
Basically he is getting this error:
2007-11-16 14:54:16 ERROR: could not open relation 1663/16403/16487: Invalid argument

Well, since EINVAL is the default result from _dosmaperr, and none of
the cases it represents are "expected", why don't we just remove all of
the explicit mappings to EINVAL from doserrors[]? Then we will get the
LOG message you need, and we won't have to increase the chattiness level
for anything else.

Well, the problematic routine is not already using _dosmaperr currently.
It is doing it's own mapping and neglecting to report anything. In
fact, after all the problems that appeared after Magnus proposed to use
_dosmaperr, I'm inclined to go with my original suggestion: don't use
_dosmaperr at all and instead add an ereport(LOG) with the Windows error
code.

The routine I'm talking about (pgwin32_open) has this:

switch (err)
{
/* EMFILE, ENFILE should not occur from CreateFile. */
case ERROR_PATH_NOT_FOUND:
case ERROR_FILE_NOT_FOUND:
errno = ENOENT;
break;
case ERROR_FILE_EXISTS:
errno = EEXIST;
break;
case ERROR_ACCESS_DENIED:
errno = EACCES;
break;
default:
errno = EINVAL;
}

So _anything_ could be EINVAL. Including the several cases that
_dosmaperr treat as EACCES. So I'm afraid that for this experiment to
be successful, we would have to remove not only the EINVAL cases from
doserrors[], but also any other code that appears more than once on it.
Otherwise the output could be ambiguous.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"El Maquinismo fue proscrito so pena de cosquilleo hasta la muerte"
(Ijon Tichy en Viajes, Stanislaw Lem)

#23Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#22)
Re: pgwin32_open returning EINVAL

Alvaro Herrera wrote:

Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

(2) Do we have any live cases where we must know this?

Yes. This thread shows the problem:
http://archives.postgresql.org/pgsql-es-ayuda/2007-11/msg00354.php
Basically he is getting this error:
2007-11-16 14:54:16 ERROR: could not open relation 1663/16403/16487: Invalid argument

Well, since EINVAL is the default result from _dosmaperr, and none of
the cases it represents are "expected", why don't we just remove all of
the explicit mappings to EINVAL from doserrors[]? Then we will get the
LOG message you need, and we won't have to increase the chattiness level
for anything else.

Well, the problematic routine is not already using _dosmaperr currently.
It is doing it's own mapping and neglecting to report anything. In
fact, after all the problems that appeared after Magnus proposed to use
_dosmaperr, I'm inclined to go with my original suggestion: don't use
_dosmaperr at all and instead add an ereport(LOG) with the Windows error
code.

That'll just cause the same problem in a different location, no? We'd
still be logging too often?

If we want to avoid duplicate code, we could go around to your idea of
being able to specify the elog level, by creating a _dosmaperr_internal
that takes that as parameter, and then have _dosmaperr call it with
DEBUG5 as argument... But that's only if we think it's ok for
pgwin32_open() to log that much - I thought we didn't like that?

The routine I'm talking about (pgwin32_open) has this:

switch (err)
{
/* EMFILE, ENFILE should not occur from CreateFile. */
case ERROR_PATH_NOT_FOUND:
case ERROR_FILE_NOT_FOUND:
errno = ENOENT;
break;
case ERROR_FILE_EXISTS:
errno = EEXIST;
break;
case ERROR_ACCESS_DENIED:
errno = EACCES;
break;
default:
errno = EINVAL;
}

So _anything_ could be EINVAL. Including the several cases that
_dosmaperr treat as EACCES. So I'm afraid that for this experiment to
be successful, we would have to remove not only the EINVAL cases from
doserrors[], but also any other code that appears more than once on it.
Otherwise the output could be ambiguous.

Could be, but may not be :P

//Magnus

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#22)
Re: pgwin32_open returning EINVAL

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

Well, since EINVAL is the default result from _dosmaperr, and none of
the cases it represents are "expected", why don't we just remove all of
the explicit mappings to EINVAL from doserrors[]?

Well, the problematic routine is not already using _dosmaperr currently.
It is doing it's own mapping and neglecting to report anything.

Oh, well then why are we arguing? There is no reason at all to assume
that _dosmaperr wouldn't give us a sufficiently good fix on the error
if it were only being used. At the very least, I think we should put
in Magnus' patch and find out whether it gives sufficient information.
If it doesn't, then we can think about changing to a higher log level.

regards, tom lane

#25Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#24)
Re: pgwin32_open returning EINVAL

On Thu, Nov 29, 2007 at 05:48:23PM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

Well, since EINVAL is the default result from _dosmaperr, and none of
the cases it represents are "expected", why don't we just remove all of
the explicit mappings to EINVAL from doserrors[]?

Well, the problematic routine is not already using _dosmaperr currently.
It is doing it's own mapping and neglecting to report anything.

Oh, well then why are we arguing? There is no reason at all to assume
that _dosmaperr wouldn't give us a sufficiently good fix on the error
if it were only being used. At the very least, I think we should put
in Magnus' patch and find out whether it gives sufficient information.
If it doesn't, then we can think about changing to a higher log level.

Applied.

//Magnus

#26Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Alvaro Herrera (#22)
Re: pgwin32_open returning EINVAL

Alvaro Herrera wrote:

So _anything_ could be EINVAL. Including the several cases that
_dosmaperr treat as EACCES. So I'm afraid that for this experiment to
be successful, we would have to remove not only the EINVAL cases from
doserrors[], but also any other code that appears more than once on it.
Otherwise the output could be ambiguous.

OK, this got tested by the original reporter and the result is this:

2007-12-12 10:55:49 ERROR: could not open relation 1663/16403/1247: Permission denied

I think this is EACCES, right? So we now need to know which one it is:

ERROR_ACCESS_DENIED
ERROR_CURRENT_DIRECTORY
ERROR_LOCK_VIOLATION
ERROR_SHARING_VIOLATION
ERROR_NETWORK_ACCESS_DENIED
ERROR_CANNOT_MAKE
ERROR_FAIL_I24
ERROR_DRIVE_LOCKED
ERROR_SEEK_ON_DEVICE
ERROR_NOT_LOCKED
ERROR_LOCK_FAILED

(These are the Win32 codes mapped to EACCES by our _dosmaperr.)

Many of these are nonsensical -- we know this is not a device, nor
network access. Still there is more than one possibility, and I don't
know which ones should be really acceptable in this context or not.
(What's ERROR_FAIL_I24??) SHARING_VIOLATION seems the most likely
problem; an antivirus perhaps?

My suggestion continues to be: these codes oughta be logged separately
from the strerror() message. If there are cases which we know are
deemed to occur in regular operation, let's suppress only those. None
of them is one of these EACCES, AFAIR.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

#27Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#26)
Re: pgwin32_open returning EINVAL

On Thu, Dec 13, 2007 at 09:55:33AM -0300, Alvaro Herrera wrote:

Alvaro Herrera wrote:

So _anything_ could be EINVAL. Including the several cases that
_dosmaperr treat as EACCES. So I'm afraid that for this experiment to
be successful, we would have to remove not only the EINVAL cases from
doserrors[], but also any other code that appears more than once on it.
Otherwise the output could be ambiguous.

OK, this got tested by the original reporter and the result is this:

2007-12-12 10:55:49 ERROR: could not open relation 1663/16403/1247: Permission denied

I think this is EACCES, right? So we now need to know which one it is:

ERROR_ACCESS_DENIED
ERROR_CURRENT_DIRECTORY
ERROR_LOCK_VIOLATION
ERROR_SHARING_VIOLATION
ERROR_NETWORK_ACCESS_DENIED
ERROR_CANNOT_MAKE
ERROR_FAIL_I24
ERROR_DRIVE_LOCKED
ERROR_SEEK_ON_DEVICE
ERROR_NOT_LOCKED
ERROR_LOCK_FAILED

(These are the Win32 codes mapped to EACCES by our _dosmaperr.)

Many of these are nonsensical -- we know this is not a device, nor
network access. Still there is more than one possibility, and I don't
know which ones should be really acceptable in this context or not.
(What's ERROR_FAIL_I24??) SHARING_VIOLATION seems the most likely
problem; an antivirus perhaps?

If you have an antivirus running on the system, you really should get rid
of taht long before you start looking at the code...

But you still have a point :-)

I take it it's not reproducible enough to run at DEBUG5 until it happens?

My suggestion continues to be: these codes oughta be logged separately
from the strerror() message. If there are cases which we know are
deemed to occur in regular operation, let's suppress only those. None
of them is one of these EACCES, AFAIR.

Ok. Say we added a third field to the struct named "verbose" which
indicates which values shuold be logged? And then log those separately
("mapped error %i to %i" or similar). The clearest message would be if we
could log it at the point of the caller, but that's not a reasonable change
IMHO (would change every single caller). But this way we could at least get
something traceable. What level would you have that logged?

//Magnus

#28Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Magnus Hagander (#27)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Thu, Dec 13, 2007 at 09:55:33AM -0300, Alvaro Herrera wrote:

Many of these are nonsensical -- we know this is not a device, nor
network access. Still there is more than one possibility, and I don't
know which ones should be really acceptable in this context or not.
(What's ERROR_FAIL_I24??) SHARING_VIOLATION seems the most likely
problem; an antivirus perhaps?

If you have an antivirus running on the system, you really should get rid
of taht long before you start looking at the code...

FWIW I noticed by accident that the latest stable version of a
not-competing database system has fixed a related bug:

http://bugs.mysql.com/bug.php?id=9709
(yes, it only took them two and a half years to fix it).

Note that their behavior on finding SHARING_VIOLATION or LOCK_VIOLATION
is to retry forever until the error goes away, on the theory that the
antivirus/backup software will soon release the file.

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Someone said that it is at least an order of magnitude more work to do
production software than a prototype. I think he is wrong by at least
an order of magnitude." (Brian Kernighan)

#29Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#28)
Re: pgwin32_open returning EINVAL

Alvaro Herrera wrote:

Magnus Hagander wrote:

On Thu, Dec 13, 2007 at 09:55:33AM -0300, Alvaro Herrera wrote:

Many of these are nonsensical -- we know this is not a device, nor
network access. Still there is more than one possibility, and I don't
know which ones should be really acceptable in this context or not.
(What's ERROR_FAIL_I24??) SHARING_VIOLATION seems the most likely
problem; an antivirus perhaps?

If you have an antivirus running on the system, you really should get rid
of taht long before you start looking at the code...

FWIW I noticed by accident that the latest stable version of a
not-competing database system has fixed a related bug:

http://bugs.mysql.com/bug.php?id=9709
(yes, it only took them two and a half years to fix it).

Note that their behavior on finding SHARING_VIOLATION or LOCK_VIOLATION
is to retry forever until the error goes away, on the theory that the
antivirus/backup software will soon release the file.

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

So assuming we'd want to do that, how do we do it. If it's just the
"open" operation that needs it, we can probably just stick the retry in
the port file for that. But do we need to be able to retry on say
read/write as well?

//Magnus

#30Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#29)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

Alvaro Herrera wrote:

Magnus Hagander wrote:

On Thu, Dec 13, 2007 at 09:55:33AM -0300, Alvaro Herrera wrote:

Many of these are nonsensical -- we know this is not a device, nor
network access. Still there is more than one possibility, and I don't
know which ones should be really acceptable in this context or not.
(What's ERROR_FAIL_I24??) SHARING_VIOLATION seems the most likely
problem; an antivirus perhaps?

If you have an antivirus running on the system, you really should get rid
of taht long before you start looking at the code...

FWIW I noticed by accident that the latest stable version of a
not-competing database system has fixed a related bug:

http://bugs.mysql.com/bug.php?id=9709
(yes, it only took them two and a half years to fix it).

Note that their behavior on finding SHARING_VIOLATION or LOCK_VIOLATION
is to retry forever until the error goes away, on the theory that the
antivirus/backup software will soon release the file.

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

So assuming we'd want to do that, how do we do it. If it's just the
"open" operation that needs it, we can probably just stick the retry in
the port file for that. But do we need to be able to retry on say
read/write as well?

Let's start with open and see what happens - we're surely going to need
it there anyway. We already have a 30 second retry loop in unlink.

cheers

andrew

#31Gregory Stark
stark@enterprisedb.com
In reply to: Andrew Dunstan (#30)
Re: pgwin32_open returning EINVAL

"Andrew Dunstan" <andrew@dunslane.net> writes:

Magnus Hagander wrote:

Alvaro Herrera wrote:

Magnus Hagander wrote:

Note that their behavior on finding SHARING_VIOLATION or LOCK_VIOLATION
is to retry forever until the error goes away, on the theory that the
antivirus/backup software will soon release the file.

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

You could print a warning after 30s but then I think you have to keep trying
forever. Just like database operations hang forever waiting for a lock.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#31)
Re: pgwin32_open returning EINVAL

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

regards, tom lane

#33Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#32)
Re: pgwin32_open returning EINVAL

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

Really? I understood we're talking about having Postgres fail with an error if
any of its files are opened by another program such as backup software. So
with a 30s limit it means Postgres might or might not fail depending on how
long this other software has the file open. That doesn't seem like an
improvement.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

I thought it was unlink, and the worst-case there is that we leak a file until
some later time. I'm wasn't exactly following that case though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

#34Magnus Hagander
magnus@hagander.net
In reply to: Gregory Stark (#33)
Re: pgwin32_open returning EINVAL

On Sun, Dec 16, 2007 at 10:31:59AM +0000, Gregory Stark wrote:

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

Really? I understood we're talking about having Postgres fail with an error if
any of its files are opened by another program such as backup software. So
with a 30s limit it means Postgres might or might not fail depending on how
long this other software has the file open. That doesn't seem like an
improvement.

If your software is locking a file for that long, that software is more
than just broken, it's horribly broken. Having a workaround against
something that might happen once or twice because of a bug in the other
software is one thing, but if it's actually *designed* to do that you
really need to get that software removed from your machine.

Having the server completely stop for 30 seconds waiting for something to
happen is bad enough, I think. Better to error out to let the user know
that there's a major configuration problem on the machine.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

I thought it was unlink, and the worst-case there is that we leak a file until
some later time. I'm wasn't exactly following that case though.

We do it on both rename and unlink.

//Magnus

#35Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#32)
1 attachment(s)
Re: pgwin32_open returning EINVAL

On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

Here's a patch that I think implements this ;) Alvaro - do you have a build
env so you can test it? I can't reproduce the problem in my environment...

Also, it currently just silently loops. Would it be interesting to
ereport(WARNING) that it's looping on the open, to let the user know
there's a problem? (Naturally, only warning the first time it tries it on
each file, so we don't spam the log too hard)

//Magnus

Attachments:

win32open.patchtext/plain; charset=us-asciiDownload
Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.22
diff -c -r1.22 open.c
*** src/port/open.c	30 Nov 2007 11:16:43 -0000	1.22
--- src/port/open.c	19 Dec 2007 12:42:05 -0000
***************
*** 58,65 ****
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int			fd;
! 	HANDLE		h;
  	SECURITY_ATTRIBUTES sa;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
--- 58,66 ----
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int			fd;
! 	HANDLE		h = INVALID_HANDLE_VALUE;
  	SECURITY_ATTRIBUTES sa;
+ 	int			loops = 0;

  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
***************
*** 71,77 ****
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	if ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					  (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					 ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
--- 72,78 ----
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;

! 	while ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					  (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					 ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
***************
*** 88,93 ****
--- 89,108 ----
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
+ 		/*
+ 		 * Sharing violation or locking error can indicate antivirus, backup
+ 		 * or similar software that's locking the file. Try again for 30 seconds
+ 		 * before giving up.
+ 		 */
+ 		if (GetLastError() == ERROR_SHARING_VIOLATION ||
+ 			GetLastError() == ERROR_LOCK_VIOLATION)
+ 		{
+ 			pg_usleep(100000);
+ 			loops++;
+ 			if (loops < 30)
+ 				continue;
+ 		}
+
  		_dosmaperr(GetLastError());
  		return -1;
  	}
#36Gregory Stark
stark@enterprisedb.com
In reply to: Magnus Hagander (#34)
Re: pgwin32_open returning EINVAL

"Magnus Hagander" <magnus@hagander.net> writes:

If your software is locking a file for that long, that software is more
than just broken, it's horribly broken. Having a workaround against
something that might happen once or twice because of a bug in the other
software is one thing, but if it's actually *designed* to do that you
really need to get that software removed from your machine.

I was under the impression though that this was just how open worked on
windows. Only one process can have a file open at the same time.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

#37Magnus Hagander
magnus@hagander.net
In reply to: Gregory Stark (#36)
Re: pgwin32_open returning EINVAL

On Wed, Dec 19, 2007 at 01:05:00PM +0000, Gregory Stark wrote:

"Magnus Hagander" <magnus@hagander.net> writes:

If your software is locking a file for that long, that software is more
than just broken, it's horribly broken. Having a workaround against
something that might happen once or twice because of a bug in the other
software is one thing, but if it's actually *designed* to do that you
really need to get that software removed from your machine.

I was under the impression though that this was just how open worked on
windows. Only one process can have a file open at the same time.

Then you're wrong. You can open files in shared mode no problem - it's how
we have multiple backensd opening the same file. You do have to remember to
specify the flag when you open the file - something backup software for
example has always done, unless it's really crappy design.

Most proper backup software will use VSS these days, which makes the whole
point moot.

And antivirus is supposed to be entirely transparent, so it's definitely a
bug when it happens. But we've seen a lot of buggy antivirus.

//Magnus

#38Alvaro Herrera
alvherre@commandprompt.com
In reply to: Magnus Hagander (#35)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

Here's a patch that I think implements this ;) Alvaro - do you have a build
env so you can test it? I can't reproduce the problem in my environment...

Thanks -- forwarded to the appropriate parties. :-)

Also, it currently just silently loops. Would it be interesting to
ereport(WARNING) that it's looping on the open, to let the user know
there's a problem? (Naturally, only warning the first time it tries it on
each file, so we don't spam the log too hard)

Yeah, I think it would be useful to log one message if after (say) 5
seconds we still haven't been able to open the file.

Is the sleep time correct? If I'm reading it right, it sleeps 100 ms
each time, 30 times, that totals 3 seconds ... ?

(Are we OK with the idea of sleeping 1 second each time?)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#39Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#38)
Re: pgwin32_open returning EINVAL

On Wed, Dec 19, 2007 at 11:23:35AM -0300, Alvaro Herrera wrote:

Magnus Hagander wrote:

On Sat, Dec 15, 2007 at 10:31:38PM -0500, Tom Lane wrote:

Gregory Stark <stark@enterprisedb.com> writes:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Interesting. Maybe forever is going a bit too far, but retrying for <n>
seconds or so.

I think looping forever is the right thing. Having a fixed timeout just means
Postgres will break sometimes instead of all the time. And it introduces
non-deterministic behaviour too.

Looping forever would be considered broken by a very large fraction of
the community.

IIRC we have a 30-second timeout in rename() for Windows, and that seems
to be working well enough, so I'd be inclined to copy the behavior for
this case.

Here's a patch that I think implements this ;) Alvaro - do you have a build
env so you can test it? I can't reproduce the problem in my environment...

Thanks -- forwarded to the appropriate parties. :-)

Thanks. Let us know the results :-)

Also, it currently just silently loops. Would it be interesting to
ereport(WARNING) that it's looping on the open, to let the user know
there's a problem? (Naturally, only warning the first time it tries it on
each file, so we don't spam the log too hard)

Yeah, I think it would be useful to log one message if after (say) 5
seconds we still haven't been able to open the file.

Either that, or on the first run.

Is the sleep time correct? If I'm reading it right, it sleeps 100 ms
each time, 30 times, that totals 3 seconds ... ?

Uh, I copied that from pgunlink() and pgrename(), but forgot a zero on the
loop. It's supposed to loop 300 times.

(Are we OK with the idea of sleeping 1 second each time?)

I think not. 0.1 seconds is better. We don't want to delay a full second if
it's just a transient thing.

//Magnus

#40Zeugswetter Andreas ADI SD
Andreas.Zeugswetter@s-itsolutions.at
In reply to: Magnus Hagander (#39)
Re: pgwin32_open returning EINVAL

Yeah, I think it would be useful to log one message if after (say) 5
seconds we still haven't been able to open the file.

Either that, or on the first run.

Imho 1-5s is better, so that would be after the 10-50th try.

loop. It's supposed to loop 300 times.

Yes.

(Are we OK with the idea of sleeping 1 second each time?)

I think not. 0.1 seconds is better. We don't want to delay a full

second if

it's just a transient thing.

Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds
(http://support.microsoft.com/kb/316609)

Andreas

#41Alvaro Herrera
alvherre@commandprompt.com
In reply to: Zeugswetter Andreas ADI SD (#40)
Re: pgwin32_open returning EINVAL

Zeugswetter Andreas ADI SD wrote:

I think not. 0.1 seconds is better. We don't want to delay a full
second if it's just a transient thing.

Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds
(http://support.microsoft.com/kb/316609)

Hm, the article only mentions ERROR_SHARING_VIOLATION but we're also
considering ERROR_LOCKING_VIOLATION. Should we retry only on the
SHARING violation?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#42Andrew Dunstan
andrew@dunslane.net
In reply to: Zeugswetter Andreas ADI SD (#40)
Re: pgwin32_open returning EINVAL

Zeugswetter Andreas ADI SD wrote:

Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds
(http://support.microsoft.com/kb/316609)

We have seen cases in the past where these locks last quite a long time.
That 30s total timeout in rename and unlink was not chosen arbitrarily -
it's based on experience.

I think a retry interval of 0.1s here should be OK.

cheers

andrew

#43Magnus Hagander
magnus@hagander.net
In reply to: Zeugswetter Andreas ADI SD (#40)
Re: pgwin32_open returning EINVAL

On Wed, Dec 19, 2007 at 04:20:23PM +0100, Zeugswetter Andreas ADI SD wrote:

Yeah, I think it would be useful to log one message if after (say) 5
seconds we still haven't been able to open the file.

Either that, or on the first run.

Imho 1-5s is better, so that would be after the 10-50th try.

Ok, so I'll put in a warning after 50 tries.

loop. It's supposed to loop 300 times.

Yes.

(Are we OK with the idea of sleeping 1 second each time?)

I think not. 0.1 seconds is better. We don't want to delay a full

second if

it's just a transient thing.

Yes 0.1 s is imho good. Btw. m$ is talking about milliseconds
(http://support.microsoft.com/kb/316609)

Their sample code sleeps for 0.25 seconds though. (Sleep() takes
milliseconds). So we're definitely fine with 0.1sec I think.

//Magnus

#44Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#43)
1 attachment(s)
Re: pgwin32_open returning EINVAL

On Wed, Dec 19, 2007 at 04:44:12PM +0100, Magnus Hagander wrote:

On Wed, Dec 19, 2007 at 04:20:23PM +0100, Zeugswetter Andreas ADI SD wrote:

Yeah, I think it would be useful to log one message if after (say) 5
seconds we still haven't been able to open the file.

Either that, or on the first run.

Imho 1-5s is better, so that would be after the 10-50th try.

Ok, so I'll put in a warning after 50 tries.

Updated version attached. Comments on the wording of the messages are also
welcome ;-)

//Magnus

Attachments:

win32open.patchtext/plain; charset=us-asciiDownload
Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.22
diff -c -r1.22 open.c
*** src/port/open.c	30 Nov 2007 11:16:43 -0000	1.22
--- src/port/open.c	19 Dec 2007 15:50:06 -0000
***************
*** 13,19 ****
  
  #ifdef WIN32
  
! #include "c.h"
  
  #include <windows.h>
  #include <fcntl.h>
--- 13,23 ----
  
  #ifdef WIN32
  
! #ifndef FRONTEND
! #include "postgres.h"
! #else
! #include "postgres_fe.h"
! #endif
  
  #include <windows.h>
  #include <fcntl.h>
***************
*** 58,65 ****
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int			fd;
! 	HANDLE		h;
  	SECURITY_ATTRIBUTES sa;
  
  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
--- 62,70 ----
  pgwin32_open(const char *fileName, int fileFlags,...)
  {
  	int			fd;
! 	HANDLE		h = INVALID_HANDLE_VALUE;
  	SECURITY_ATTRIBUTES sa;
+ 	int			loops = 0;
  
  	/* Check that we can handle the request */
  	assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
***************
*** 71,77 ****
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;
  
! 	if ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					  (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					 ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
--- 76,82 ----
  	sa.bInheritHandle = TRUE;
  	sa.lpSecurityDescriptor = NULL;
  
! 	while ((h = CreateFile(fileName,
  	/* cannot use O_RDONLY, as it == 0 */
  					  (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
  					 ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
***************
*** 88,93 ****
--- 93,121 ----
  						((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
  						NULL)) == INVALID_HANDLE_VALUE)
  	{
+ 		/*
+ 		 * Sharing violation or locking error can indicate antivirus, backup
+ 		 * or similar software that's locking the file. Try again for 30 seconds
+ 		 * before giving up.
+ 		 */
+ 		if (GetLastError() == ERROR_SHARING_VIOLATION || 
+ 			GetLastError() == ERROR_LOCK_VIOLATION)
+ 		{
+ 			pg_usleep(100000);
+ 			loops++;
+
+ #ifndef FRONTEND
+ 			if (loops == 50)
+ 				ereport(WARNING,
+ 					(errmsg("could not open file \"%s\" due to sharing violation. Will retry for 30 seconds.",
+ 						fileName,
+ 					 errhint("You may have antivirus, backup or similar software interfering with the database."))));
+ #endif
+
+ 			if (loops < 300)
+ 				continue;
+ 		}
+
  		_dosmaperr(GetLastError());
  		return -1;
  	}
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#44)
Re: pgwin32_open returning EINVAL

Magnus Hagander <magnus@hagander.net> writes:

Updated version attached. Comments on the wording of the messages are also
welcome ;-)

Well, since you asked ;-) ... I don't particularly like this.

1. Doesn't distinguish SHARING_VIOLATION from LOCK_VIOLATION.
We might want to know that.

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

3. Doesn't follow the message style guide: shoehorning two sentences
into errmsg() is surely not compliant. Perhaps

errmsg: could not open file \"%s\": sharing violation
OR errmsg: could not open file \"%s\": lock violation

errdetail: Continuing to retry for up to 30 seconds.

errhint: OK as you have it.

regards, tom lane

#46Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#45)
Re: pgwin32_open returning EINVAL

On Wed, Dec 19, 2007 at 07:50:29PM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Updated version attached. Comments on the wording of the messages are also
welcome ;-)

Well, since you asked ;-) ... I don't particularly like this.

Well, it's better to have you say that before than to have to redo it
later, of course. Especially when we're this close to translation deadline,
and it contains new strings to translate :-P

1. Doesn't distinguish SHARING_VIOLATION from LOCK_VIOLATION.
We might want to know that.

Yeah, I had that fixed locally already.

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

I put it as warning because I wanted to be sure the admin notices. If your
database is hanging 5+ seconds to open a file, you have a *big* problem,
and you need to fix it. Just putting it as LOG will probably make it much
more likely it's missed.

Is there a guideline somewhere that says what shuld be WARNING and what
should be LOG in general? If not, maybe there shuold be? ;-)

3. Doesn't follow the message style guide: shoehorning two sentences
into errmsg() is surely not compliant. Perhaps

errmsg: could not open file \"%s\": sharing violation
OR errmsg: could not open file \"%s\": lock violation

errdetail: Continuing to retry for up to 30 seconds.

errhint: OK as you have it.

Yeah, Alvaro pointed parts of that out on IM already. Updated before
application, pending discussion about log level above.

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Is _("") the proper way to get a translatable string in like that, or is
there a better way?

//Magnus

#47Martijn van Oosterhout
kleptog@svana.org
In reply to: Magnus Hagander (#46)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 09:32:38AM +0100, Magnus Hagander wrote:

(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),

Is _("") the proper way to get a translatable string in like that, or is
there a better way?

Hmm, from a translation point of view wouldn't it be better to have the
word "violation" inside each of the _(), since not all languages might
allow such substitions (gender problems).

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#48Trevor Talbot
quension@gmail.com
In reply to: Magnus Hagander (#46)
Re: pgwin32_open returning EINVAL

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

#49Magnus Hagander
magnus@hagander.net
In reply to: Trevor Talbot (#48)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 04:39:55AM -0800, Trevor Talbot wrote:

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

I guess, but it shouldn't matter. We're giong to loop right back up and do a
new CreateFile() after this, which will overwrite it again.

//Magnus

#50Trevor Talbot
quension@gmail.com
In reply to: Magnus Hagander (#49)
Re: pgwin32_open returning EINVAL

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Dec 20, 2007 at 04:39:55AM -0800, Trevor Talbot wrote:

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

I guess, but it shouldn't matter. We're giong to loop right back up and do a
new CreateFile() after this, which will overwrite it again.

I mean for the purposes of the report. I'm worried the message might
say it's a lock violation when it's really a sharing violation.

#51Alvaro Herrera
alvherre@commandprompt.com
In reply to: Trevor Talbot (#48)
Re: pgwin32_open returning EINVAL

Trevor Talbot escribi�:

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

Yeah, they call gettext(), palloc/pfree/pstrdup and various StringInfo
routines.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#52Magnus Hagander
magnus@hagander.net
In reply to: Trevor Talbot (#50)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 04:47:33AM -0800, Trevor Talbot wrote:

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Dec 20, 2007 at 04:39:55AM -0800, Trevor Talbot wrote:

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

I guess, but it shouldn't matter. We're giong to loop right back up and do a
new CreateFile() after this, which will overwrite it again.

I mean for the purposes of the report. I'm worried the message might
say it's a lock violation when it's really a sharing violation.

Wouldn't we then have the same problem for every place that does a %i and
report GetLastError() or errno? And we have a *lot* of those... I would've
thought the framework thinks of that, but I haven't actually verified that.

If it's a problem, it's trivial to save the value off in a variable, of
course.

//Magnus

#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Trevor Talbot (#48)
Re: pgwin32_open returning EINVAL

"Trevor Talbot" <quension@gmail.com> writes:

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()?

Hmm ... the macro framework is designed so that the arguments get
evaluated before anything very interesting happens, but it might be
better to use a variable anyway --- for onm thing we could get rid of
the redundant GetLastError calls in the test in front of this.

BTW, I concur with Martijn's comment to avoid constructing phrases from
spare parts; it creates a translation problem, and in this case you'd
only be saving a few bytes anyway.

regards, tom lane

#54Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#53)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 10:11:10AM -0500, Tom Lane wrote:

"Trevor Talbot" <quension@gmail.com> writes:

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()?

Hmm ... the macro framework is designed so that the arguments get
evaluated before anything very interesting happens, but it might be
better to use a variable anyway --- for onm thing we could get rid of
the redundant GetLastError calls in the test in front of this.

I'd expect the compiler to optimize away those, but I'll make it a var
anyawy.

BTW, I concur with Martijn's comment to avoid constructing phrases from
spare parts; it creates a translation problem, and in this case you'd
only be saving a few bytes anyway.

Ok. Are you saying whole sentence, or are you saying it's fine to just
break out the part that's after the colon into "sharing violation" and
"lock violation"? I assume the second?

//Magnus

#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#46)
Re: pgwin32_open returning EINVAL

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Dec 19, 2007 at 07:50:29PM -0500, Tom Lane wrote:

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

I put it as warning because I wanted to be sure the admin notices. If your
database is hanging 5+ seconds to open a file, you have a *big* problem,
and you need to fix it. Just putting it as LOG will probably make it much
more likely it's missed.

This reasoning is faulty. For logging purposes, LOG is *more* severe
(higher priority) than WARNING. I think it's fairly common to set
log_min_messages = ERROR, which would mean that warnings disappear.
On the client side, unless you're issuing queries by hand with psql,
it's entirely likely that all non-error messages go into the bit bucket.
You can't count on anyone ever noticing them in a production app.

Use LOG. That's what it's there for. (If you want a more formal
definition, I'd say it's for messages that a DBA would be interested in
but are not directly relevant to a client app.)

regards, tom lane

#56Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#55)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 10:26:46AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Dec 19, 2007 at 07:50:29PM -0500, Tom Lane wrote:

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

I put it as warning because I wanted to be sure the admin notices. If your
database is hanging 5+ seconds to open a file, you have a *big* problem,
and you need to fix it. Just putting it as LOG will probably make it much
more likely it's missed.

This reasoning is faulty. For logging purposes, LOG is *more* severe
(higher priority) than WARNING. I think it's fairly common to set
log_min_messages = ERROR, which would mean that warnings disappear.
On the client side, unless you're issuing queries by hand with psql,
it's entirely likely that all non-error messages go into the bit bucket.
You can't count on anyone ever noticing them in a production app.

Use LOG. That's what it's there for. (If you want a more formal
definition, I'd say it's for messages that a DBA would be interested in
but are not directly relevant to a client app.)

Ah, wasn't aware of that at all. Then LOG certainly makes a lot more sense,
yes. Thanks for clearifying.

//Magnus

#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#54)
Re: pgwin32_open returning EINVAL

Magnus Hagander <magnus@hagander.net> writes:

On Thu, Dec 20, 2007 at 10:11:10AM -0500, Tom Lane wrote:

BTW, I concur with Martijn's comment to avoid constructing phrases from
spare parts; it creates a translation problem, and in this case you'd
only be saving a few bytes anyway.

Ok. Are you saying whole sentence, or are you saying it's fine to just
break out the part that's after the colon into "sharing violation" and
"lock violation"? I assume the second?

Right, that's one reason for using the colon: to help divide the message
into phrases that can be translated separately without too much
grammatical mayhem. _("sharing violation") seems OK to me.

regards, tom lane

#58Trevor Talbot
quension@gmail.com
In reply to: Magnus Hagander (#54)
Re: pgwin32_open returning EINVAL

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Dec 20, 2007 at 04:39:55AM -0800, Trevor Talbot wrote:

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

ereport(WARNING,
(errmsg("could not open file \"%s\": %s violation", fileName,
(GetLastError() ==
ERROR_SHARING_VIOLATION)?_("sharing"):_("lock")),
errdetail("Continuing to retry for 30 seconds."),
errhint("You may have antivirus, backup or similar software
interfering with the database.")));

Without looking myself, is it possible for errhint() or errdetail() to
do something that affects GetLastError()? It's like errno, checking it
very far away from the call site makes me nervous.

Wouldn't we then have the same problem for every place that does a %i and
report GetLastError() or errno? And we have a *lot* of those... I would've
thought the framework thinks of that, but I haven't actually verified that.

A function's arguments are evaluated before the call, so that's safe
in general. What is implementation-specific is the order of evaluation
of different arguments, and I don't know if the parentheses above
override that.

On 12/20/07, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Dec 20, 2007 at 10:11:10AM -0500, Tom Lane wrote:

Hmm ... the macro framework is designed so that the arguments get
evaluated before anything very interesting happens, but it might be
better to use a variable anyway --- for onm thing we could get rid of
the redundant GetLastError calls in the test in front of this.

Sounds like it was already thought of then.

I'd expect the compiler to optimize away those, but I'll make it a var
anyawy.

It can't; it's an opaque callout to kernel32.dll, and there's nothing
that tells the optimizer when you can expect to get the same result.
That said, it's cheaper than it looks, since the error code is stored
at a fixed location in thread-specific VM space. I guess it wasn't
interesting enough to make an MSVC intrinsic instead of keeping it a
system detail.

#59Magnus Hagander
magnus@hagander.net
In reply to: Trevor Talbot (#58)
Re: pgwin32_open returning EINVAL

On Thu, Dec 20, 2007 at 07:57:23AM -0800, Trevor Talbot wrote:

I'd expect the compiler to optimize away those, but I'll make it a var
anyawy.

It can't; it's an opaque callout to kernel32.dll, and there's nothing
that tells the optimizer when you can expect to get the same result.
That said, it's cheaper than it looks, since the error code is stored
at a fixed location in thread-specific VM space. I guess it wasn't
interesting enough to make an MSVC intrinsic instead of keeping it a
system detail.

Interesting. I was certainly it was an intrinsic, but it seems not...
Variable, it is..

//Magnus

#60Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#56)
Re: pgwin32_open returning EINVAL

Magnus Hagander wrote:

On Thu, Dec 20, 2007 at 10:26:46AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Dec 19, 2007 at 07:50:29PM -0500, Tom Lane wrote:

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

I put it as warning because I wanted to be sure the admin notices. If your
database is hanging 5+ seconds to open a file, you have a *big* problem,
and you need to fix it. Just putting it as LOG will probably make it much
more likely it's missed.

This reasoning is faulty. For logging purposes, LOG is *more* severe
(higher priority) than WARNING. I think it's fairly common to set
log_min_messages = ERROR, which would mean that warnings disappear.
On the client side, unless you're issuing queries by hand with psql,
it's entirely likely that all non-error messages go into the bit bucket.
You can't count on anyone ever noticing them in a production app.

Use LOG. That's what it's there for. (If you want a more formal
definition, I'd say it's for messages that a DBA would be interested in
but are not directly relevant to a client app.)

Ah, wasn't aware of that at all. Then LOG certainly makes a lot more sense,
yes. Thanks for clearifying.

I've applied a patch for this to head, to get some proper buildfarming
on it. If it passes the tests that Alvaro's contact will be running
(since they had a reasonably repeatable case), I think we should
backpatch it. But not until then...

//Magnus

#61Jaime Casanova
systemguards@gmail.com
In reply to: Magnus Hagander (#60)
Re: pgwin32_open returning EINVAL

On Dec 20, 2007 3:29 PM, Magnus Hagander <magnus@hagander.net> wrote:

Magnus Hagander wrote:

On Thu, Dec 20, 2007 at 10:26:46AM -0500, Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Wed, Dec 19, 2007 at 07:50:29PM -0500, Tom Lane wrote:

2. Do we really want this to be WARNING? LOG seems a better idea,
since it's not warning about anything the client app did wrong.

I put it as warning because I wanted to be sure the admin notices. If your
database is hanging 5+ seconds to open a file, you have a *big* problem,
and you need to fix it. Just putting it as LOG will probably make it much
more likely it's missed.

This reasoning is faulty. For logging purposes, LOG is *more* severe
(higher priority) than WARNING. I think it's fairly common to set
log_min_messages = ERROR, which would mean that warnings disappear.
On the client side, unless you're issuing queries by hand with psql,
it's entirely likely that all non-error messages go into the bit bucket.
You can't count on anyone ever noticing them in a production app.

Use LOG. That's what it's there for. (If you want a more formal
definition, I'd say it's for messages that a DBA would be interested in
but are not directly relevant to a client app.)

Ah, wasn't aware of that at all. Then LOG certainly makes a lot more sense,
yes. Thanks for clearifying.

I've applied a patch for this to head, to get some proper buildfarming
on it. If it passes the tests that Alvaro's contact will be running
(since they had a reasonably repeatable case), I think we should
backpatch it. But not until then...

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-DFRONTEND -I. -I../../../src/include -I./src/include/port/win32
-DEXEC_BACKEND "-I../../../src/include/port/win32"
-I../../../src/port -c -o pthread-win32.o pthread-win32.c
dllwrap -o libpq.dll --dllname libpq.dll --def libpqdll.def
fe-auth.o fe-connect.o fe-exec.o fe-misc.o fe-print.o fe-lobj.o
fe-protocol2.o fe-protocol3.o pqexpbuffer.o pqsignal.o fe-secure.o
md5.o ip.o wchar.o encnames.o noblock.o pgstrcasecmp.o thread.o
crypt.o inet_aton.o strlcpy.o getaddrinfo.o open.o win32error.o
snprintf.o win32.o libpqrc.o pthread-win32.o -L../../../src/port
-lshfolder -lwsock32 -lws2_32 -lsecur32
open.o:open.c:(.text+0x144): undefined reference to `pg_usleep'
collect2: ld returned 1 exit status
d:\mingw\bin\dllwrap.exe: d:\mingw\bin\gcc exited with status 1
make[3]: *** [libpq.a] Error 1
make[3]: Leaving directory
`/home/General/PG_RELEASES/pgsql-8.3/src/interfaces/libpq'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/home/General/PG_RELEASES/pgsql-8.3/src/interfaces'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/home/General/PG_RELEASES/pgsql-8.3/src'
make: *** [all] Error 2

--
Atentamente,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
Richard Cook

#62Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jaime Casanova (#61)
Re: pgwin32_open returning EINVAL

Jaime Casanova escribi�:

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

Yeah, the buildfarm is failing in the same way.

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dawn_bat&amp;dt=2007-12-20%2022:00:01

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#63Alvaro Herrera
alvherre@commandprompt.com
In reply to: Jaime Casanova (#61)
1 attachment(s)
Re: pgwin32_open returning EINVAL

Jaime Casanova escribi�:

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

I think this patch should fix it. I think win32.mak needs to be
similarly patched.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachments:

fix-libpq.patchtext/x-diff; charset=us-asciiDownload
Index: src/interfaces/libpq/Makefile
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/interfaces/libpq/Makefile,v
retrieving revision 1.157
diff -c -p -r1.157 Makefile
*** src/interfaces/libpq/Makefile	30 Nov 2007 16:21:19 -0000	1.157
--- src/interfaces/libpq/Makefile	21 Dec 2007 02:30:00 -0000
*************** backend_src = $(top_srcdir)/src/backend
*** 80,86 ****
  # For port modules, this only happens if configure decides the module
  # is needed (see filter hack in OBJS, above).
  
! crypt.c getaddrinfo.c inet_aton.c noblock.c open.c pgstrcasecmp.c snprintf.c strerror.c strlcpy.c thread.c win32error.c: % : $(top_srcdir)/src/port/%
  	rm -f $@ && $(LN_S) $< .
  
  md5.c ip.c: % : $(backend_src)/libpq/%
--- 80,86 ----
  # For port modules, this only happens if configure decides the module
  # is needed (see filter hack in OBJS, above).
  
! crypt.c getaddrinfo.c inet_aton.c noblock.c open.c pgstrcasecmp.c snprintf.c strerror.c strlcpy.c thread.c win32error.c pgsleep.c: % : $(top_srcdir)/src/port/%
  	rm -f $@ && $(LN_S) $< .
  
  md5.c ip.c: % : $(backend_src)/libpq/%
*************** uninstall: uninstall-lib
*** 169,175 ****
  	rm -f '$(DESTDIR)$(includedir)/libpq-fe.h' '$(DESTDIR)$(includedir_internal)/libpq-int.h' '$(DESTDIR)$(includedir_internal)/pqexpbuffer.h' '$(DESTDIR)$(datadir)/pg_service.conf.sample'
  
  clean distclean: clean-lib
! 	rm -f $(OBJS) pg_config_paths.h crypt.c getaddrinfo.c inet_aton.c noblock.c open.c pgstrcasecmp.c snprintf.c strerror.c strlcpy.c thread.c md5.c ip.c encnames.c wchar.c win32error.c pthread.h exports.list
  # Might be left over from a Win32 client-only build
  	rm -f pg_config_paths.h
  
--- 169,175 ----
  	rm -f '$(DESTDIR)$(includedir)/libpq-fe.h' '$(DESTDIR)$(includedir_internal)/libpq-int.h' '$(DESTDIR)$(includedir_internal)/pqexpbuffer.h' '$(DESTDIR)$(datadir)/pg_service.conf.sample'
  
  clean distclean: clean-lib
! 	rm -f $(OBJS) pg_config_paths.h crypt.c getaddrinfo.c inet_aton.c noblock.c open.c pgstrcasecmp.c snprintf.c strerror.c strlcpy.c thread.c md5.c ip.c encnames.c wchar.c win32error.c pgsleep.c pthread.h exports.list
  # Might be left over from a Win32 client-only build
  	rm -f pg_config_paths.h
  
#64Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#63)
Re: pgwin32_open returning EINVAL

Alvaro Herrera wrote:

Jaime Casanova escribi�:

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

I think this patch should fix it. I think win32.mak needs to be
similarly patched.

Don't you also need to add pgsleep.o to $(OBJS) in the win32 stanza?

cheers

andrew

#65Jaime Casanova
systemguards@gmail.com
In reply to: Alvaro Herrera (#63)
Re: pgwin32_open returning EINVAL

On Dec 20, 2007 9:31 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

Jaime Casanova escribió:

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

I think this patch should fix it. I think win32.mak needs to be
similarly patched.

ok. it's solved now. i added the lines in the patch and pgsleep.o in
(circa line 34 in src/interfaces/libpq)

OBJS= fe-auth.o fe-connect.o fe-exec.o fe-misc.o fe-print.o fe-lobj.o \
fe-protocol2.o fe-protocol3.o pqexpbuffer.o pqsignal.o fe-secure.o \
md5.o ip.o wchar.o encnames.o noblock.o pgstrcasecmp.o thread.o \
$(filter crypt.o getaddrinfo.o inet_aton.o open.o snprintf.o
strerror.o strlcpy.o win32error.o pgsleep.o, $(LIBOBJS))

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
Richard Cook

#66Magnus Hagander
magnus@hagander.net
In reply to: Jaime Casanova (#65)
Re: pgwin32_open returning EINVAL

On Fri, Dec 21, 2007 at 01:30:07AM -0500, Jaime Casanova wrote:

On Dec 20, 2007 9:31 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

Jaime Casanova escribi�:

it doesn't compile on current head on mingw 5.1 and msys 1.0.10; of
course, it doesn't compile on 8.2 neither in order to Alvaro's contact
to test it

I think this patch should fix it. I think win32.mak needs to be
similarly patched.

ok. it's solved now. i added the lines in the patch and pgsleep.o in
(circa line 34 in src/interfaces/libpq)

OBJS= fe-auth.o fe-connect.o fe-exec.o fe-misc.o fe-print.o fe-lobj.o \
fe-protocol2.o fe-protocol3.o pqexpbuffer.o pqsignal.o fe-secure.o \
md5.o ip.o wchar.o encnames.o noblock.o pgstrcasecmp.o thread.o \
$(filter crypt.o getaddrinfo.o inet_aton.o open.o snprintf.o
strerror.o strlcpy.o win32error.o pgsleep.o, $(LIBOBJS))

That's the wrong place - it should be added for win32 only.

Proper patch committed. Sorry 'bout that one. *grumble*

//Magnus

#67Alvaro Herrera
alvherre@commandprompt.com
In reply to: Andrew Dunstan (#64)
Re: pgwin32_open returning EINVAL

Andrew Dunstan escribi�:

Alvaro Herrera wrote:

I think this patch should fix it. I think win32.mak needs to be
similarly patched.

Don't you also need to add pgsleep.o to $(OBJS) in the win32 stanza?

Hmm. Wow, that's silly. I introduced a hack in a Replicator's Makefile
to avoid having to duplicate the file list. I'll look into submitting
it into libpq's for 8.4.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support