stats test on Windows is now failing repeatably?

Started by Tom Lane over 19 years ago, 9 messages
#1Tom Lane
tgl@sss.pgh.pa.us
I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.  I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.

[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]

regards, tom lane

#2Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#1)
Re: stats test on Windows is now failing repeatably?

Tom Lane wrote:

I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.  I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.

Maybe it's worth pointing out that leveret (Fedora Core 5/x86_64/icc)
manages to trigger that too on occasion, so maybe it is not a
"Windows-only" bug:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02

Stefan

#3ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Tom Lane (#1)
Re: stats test on Windows is now failing repeatably?

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.

I tested HEAD on Windows and saw some Windows-specific logs.

LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
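
The tmpfile-and-rename trick mentioned above can be sketched roughly as follows. This is a minimal illustration, not PostgreSQL's actual code: the function name `replace_file` and the ".tmp" suffix are hypothetical. It relies on POSIX semantics, where rename() replaces an existing target atomically and a reader that already has the old file open keeps seeing the old contents. On Windows the same rename can fail with a sharing violation if another process holds the target open without delete sharing, which would be consistent with the code 32 log line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `contents` to "<path>.tmp", then rename it over `path`.
 * Returns 0 on success, -1 on failure.  The ".tmp" suffix and the
 * function name are illustrative assumptions.
 */
int
replace_file(const char *path, const char *contents)
{
    char    tmppath[1024];
    FILE   *fp;
    int     n;

    assert(path != NULL && contents != NULL);

    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Readers never see a half-written file: all writes go to the temp file. */
    fp = fopen(tmppath, "wb");
    if (fp == NULL)
        return -1;
    if (fputs(contents, fp) == EOF)
    {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /* On POSIX this atomically replaces any existing file at `path`. */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

A writer would call `replace_file("some.stat", data)` each time it refreshes the file; readers simply open the final name and either get the complete old version or the complete new one.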

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: ITAGAKI Takahiro (#3)
Re: stats test on Windows is now failing repeatably?

ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:

I tested HEAD on Windows and saw some Windows-specific logs.

LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."

The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.

regards, tom lane
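
For readers unfamiliar with the flag, here is a hedged sketch of what opening a file with FILE_SHARE_DELETE looks like at the Win32 level. It is not the pgwin32_open() implementation; `open_shared_readonly` is a hypothetical helper that only handles the read-only case. The non-Windows branch exists purely so the sketch compiles elsewhere, since POSIX has no sharing modes and a plain open() already permits rename or unlink of an open file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>

/* Open `path` read-only while allowing others to rename or delete it. */
int
open_shared_readonly(const char *path)
{
    HANDLE  h;
    int     fd;

    assert(path != NULL);

    h = CreateFileA(path,
                    GENERIC_READ,
                    /* FILE_SHARE_DELETE is the flag the C runtime omits */
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL,
                    OPEN_EXISTING,
                    FILE_ATTRIBUTE_NORMAL,
                    NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    /* Wrap the Win32 handle in a C runtime file descriptor. */
    fd = _open_osfhandle((intptr_t) h, _O_RDONLY);
    if (fd < 0)
        CloseHandle(h);
    return fd;
}
#else
/* POSIX fallback: no sharing modes, so a plain open() is enough. */
int
open_shared_readonly(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY);
}
#endif
```

With a descriptor obtained this way, another process should be able to rename a new file over the old one while it is still being read, which is what the tmpfile-and-rename trick needs.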

#5Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#4)
Re: stats test on Windows is now failing repeatably?

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the
file specified." and the code 32 means ERROR_SHARING_VIOLATION, "The
process cannot access the file because it is being used by another
process."

The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad
though.

We use the tmpfile-and-rename trick on both pg_internal.init and
pgstat.stat. Is there any incompatible behavior in the trick between
POSIX and Windows?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?

//Magnus

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#5)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> writes:

It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build environment
I couldn't test the patch. Please take a look at it.

regards, tom lane
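
The "open() followed by fdopen()" approach can be sketched in portable POSIX terms as below. This is an illustration under stated assumptions, not the patch that was later posted: `fopen_mode_to_flags` and `fopen_via_open` are hypothetical names, the 0600 creation mode is an arbitrary choice, and the Windows-only 'b'/'t' modifiers are ignored because POSIX has no text/binary distinction. On Windows the open() call would be the replacement that passes FILE_SHARE_DELETE.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Translate an fopen() mode string into open() flags; -1 if unrecognized. */
int
fopen_mode_to_flags(const char *mode)
{
    int     plus;

    assert(mode != NULL);

    /* A '+' anywhere in the mode ("r+", "rb+", "w+b") means read and write. */
    plus = (strchr(mode, '+') != NULL);

    switch (mode[0])
    {
        case 'r':
            return plus ? O_RDWR : O_RDONLY;
        case 'w':
            return (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
        case 'a':
            return (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
        default:
            return -1;
    }
}

/* fopen() look-alike built from open() plus fdopen(). */
FILE *
fopen_via_open(const char *path, const char *mode)
{
    int     flags = fopen_mode_to_flags(mode);
    int     fd;
    FILE   *fp;

    if (flags < 0)
        return NULL;

    fd = open(path, flags, 0600);
    if (fd < 0)
        return NULL;

    /* Hand the descriptor to stdio; close it ourselves if that fails. */
    fp = fdopen(fd, mode);
    if (fp == NULL)
        close(fd);
    return fp;
}
```

Keying the translation on the first mode character plus an optional '+' keeps combinations such as "a+" and "rb+" from being mapped to the wrong access mode.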

#7Magnus Hagander
mha@sollentuna.net
In reply to: Tom Lane (#6)
1 attachment(s)
Re: stats test on Windows is now failing repeatably?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the
issue shouldn't occur for fopen(). Do you want to work up a patch for
that based on open(), or do you want me to take a look at it?

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

//Magnus

Attachments:

win32_fopen.diff (application/octet-stream)
Index: src/include/port.h
===================================================================
RCS file: /projects/cvsroot/pgsql/src/include/port.h,v
retrieving revision 1.96
diff -c -r1.96 port.h
*** src/include/port.h	18 Aug 2006 15:47:08 -0000	1.96
--- src/include/port.h	30 Aug 2006 17:37:44 -0000
***************
*** 267,275 ****
--- 267,277 ----
  /* open() replacement to allow delete of held files and passing
   * of special options. */
  extern int	pgwin32_open(const char *, int,...);
+ extern FILE *pgwin32_fopen(const char *, const char *);
  
  #ifndef FRONTEND
  #define		open(a,b,c)	pgwin32_open(a,b,c)
+ #define     fopen(a,b) pgwin32_fopen(a,b)
  #endif
  
  #define popen(a,b) _popen(a,b)
Index: src/port/open.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/port/open.c,v
retrieving revision 1.13
diff -c -r1.13 open.c
*** src/port/open.c	25 Jun 2006 00:18:24 -0000	1.13
--- src/port/open.c	30 Aug 2006 17:37:50 -0000
***************
*** 20,25 ****
--- 20,26 ----
  #include <assert.h>
  
  int			pgwin32_open(const char *fileName, int fileFlags,...);
+ FILE       *pgwin32_fopen(const char *fileName, const char *mode);
  
  static int
  openFlagsToCreateFileFlags(int openFlags)
***************
*** 112,115 ****
--- 113,143 ----
  	return fd;
  }
  
+ FILE *
+ pgwin32_fopen(const char *fileName, const char *mode)
+ {
+ 	int openmode = 0;
+ 	int fd;
+ 	
+ 	if (strchr(mode, 'a'))
+ 		openmode |= O_WRONLY | O_APPEND;
+ 	if (strchr(mode, 'r'))
+ 		openmode |= O_RDONLY;
+ 	if (strstr(mode, "r+"))
+ 		openmode |= O_RDWR;
+ 	if (strchr(mode, 'w'))
+ 		openmode |= O_WRONLY | O_CREAT | O_TRUNC;
+ 	if (strstr(mode, "w+"))
+ 		openmode |= O_RDWR | O_CREAT | O_TRUNC;
+ 	if (strchr(mode, 'b'))
+ 		openmode |= O_BINARY;
+ 	if (strchr(mode, 't'))
+ 		openmode |= O_TEXT;
+ 	
+ 	fd = pgwin32_open(fileName, openmode);
+ 	if (fd == -1)
+ 		return NULL;
+ 	return _fdopen(fd, mode);
+ }
+ 
  #endif

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#7)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> writes:

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

Applied, we'll see what happens ...

regards, tom lane

#9ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Magnus Hagander (#7)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> wrote:

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

It seems to work very well!
I ran the same workload on the HEAD, and I did not see any
pgstat.stat related logs now.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center