stats test on Windows is now failing repeatably?

Started by Tom Lane over 19 years ago, 9 messages, hackers
#1 Tom Lane
tgl@sss.pgh.pa.us
I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.  I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.

[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]

regards, tom lane

#2 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#1)
Re: stats test on Windows is now failing repeatably?

Tom Lane wrote:

I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.  I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.

maybe it's worth pointing out that leveret (fedora core5/x86_64/icc)
manages to trigger that too on occasion, so maybe it is not a "windows
only" bug:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02

Stefan

#3 ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Tom Lane (#1)
Re: stats test on Windows is now failing repeatably?

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.

I tested HEAD on Windows and saw some Windows-specific logs.

LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
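The log lines pair each Win32 code with the C errno it was translated into (code 2 shows up as errno 2, code 32 as errno 13). A minimal sketch of that translation, covering only the two codes quoted above; the function and macro names are hypothetical, while the numeric values are the documented winerror.h values for ERROR_FILE_NOT_FOUND and ERROR_SHARING_VIOLATION. PostgreSQL's Windows port layer keeps its own, much larger table of this kind.

```c
#include <errno.h>

/* Win32 error codes quoted in the log lines (values as documented in winerror.h). */
#define SKETCH_ERROR_FILE_NOT_FOUND     2UL
#define SKETCH_ERROR_SHARING_VIOLATION 32UL

/*
 * Hypothetical translation of a Win32 error code into a C errno value.
 * Only the two codes from the log are covered; anything else is reported
 * as EINVAL purely for illustration.
 */
static int sketch_map_win32_error(unsigned long code)
{
    switch (code)
    {
        case SKETCH_ERROR_FILE_NOT_FOUND:
            return ENOENT;      /* "cannot find the file specified" */
        case SKETCH_ERROR_SHARING_VIOLATION:
            return EACCES;      /* file held open with incompatible sharing */
        default:
            return EINVAL;
    }
}
```

On the usual platforms ENOENT is 2 and EACCES is 13, which is why the log reports "errno 2" and "errno 13".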

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
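For reference, the tmpfile-and-rename trick on POSIX looks roughly like the minimal sketch below. The helper name and the ".tmp" suffix are hypothetical, fsync() for durability is omitted, and the real pgstat and relcache code differ in detail. The relevant property is that rename() atomically replaces the target even while a reader still has the old file open; that reader simply keeps seeing the old contents. As I understand the Win32 sharing rules, the equivalent replacement on Windows is refused while another handle on the target was opened without FILE_SHARE_DELETE.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: replace the contents of `path` using the
 * tmpfile-and-rename trick.  Data goes to "<path>.tmp" first; only a fully
 * written file is renamed over the target, so readers never see a
 * half-written file.  Returns 0 on success, -1 on failure.
 */
static int sketch_write_via_rename(const char *path, const void *data, size_t len)
{
    char tmp[1024];
    FILE *f;

    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int) sizeof tmp)
        return -1;                      /* path too long for this sketch */

    f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    if (fwrite(data, 1, len, f) != len)
    {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0)
    {
        remove(tmp);
        return -1;
    }
    /* On POSIX this atomically replaces `path`, even if a reader has it open. */
    if (rename(tmp, path) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}
```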

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: ITAGAKI Takahiro (#3)
Re: stats test on Windows is now failing repeatably?

ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:

I tested HEAD on Windows and saw some Windows-specific logs.

LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."

The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.
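The reason the first message is harmless can be made concrete: a reader that treats "file not found" as "cache absent, rebuild it" and only treats other errors as real failures. A minimal sketch, assuming a hypothetical helper and result enum; this is not PostgreSQL's actual relcache code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical result of trying to load a cache file such as pg_internal.init. */
enum sketch_load_result { LOAD_OK, LOAD_ABSENT, LOAD_FAILED };

/*
 * Read up to `cap` bytes of `path` into `buf`.  A missing file is an
 * expected state (the cache was removed as out of date and must be rebuilt),
 * so ENOENT is reported as LOAD_ABSENT rather than as an error.
 */
static enum sketch_load_result
sketch_try_load(const char *path, char *buf, size_t cap, size_t *len)
{
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return (errno == ENOENT) ? LOAD_ABSENT : LOAD_FAILED;

    *len = fread(buf, 1, cap, f);
    if (ferror(f))
    {
        fclose(f);
        return LOAD_FAILED;
    }
    fclose(f);
    return LOAD_OK;
}
```

A sharing violation (errno 13 in the log) would land in LOAD_FAILED here, which matches the view that the second message is the bad one.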

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.
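To illustrate what the share-mode difference means, here is a minimal sketch of a read-only open that still lets other processes rename or delete the file. The helper name is hypothetical. The Windows branch uses the documented CreateFileA() share flags plus _open_osfhandle() to obtain a CRT descriptor; it is an untested illustration of the idea, not the code of our open() replacement. On POSIX a plain open() already behaves this way, which is why the problem never shows up there.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#include <fcntl.h>
#include <stdint.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

/*
 * Hypothetical helper: open `path` read-only in a way that still lets other
 * processes rename or delete it while we hold the descriptor.
 * Returns a CRT/POSIX file descriptor, or -1 on failure.
 */
static int sketch_open_read_allow_rename(const char *path)
{
#ifdef _WIN32
    /* Without FILE_SHARE_DELETE here, a concurrent rename over this file fails. */
    HANDLE h = CreateFileA(path, GENERIC_READ,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    int fd;

    if (h == INVALID_HANDLE_VALUE)
        return -1;
    fd = _open_osfhandle((intptr_t) h, _O_RDONLY);
    if (fd < 0)
        CloseHandle(h);     /* the CRT did not take ownership of the handle */
    return fd;
#else
    /* POSIX open() never blocks rename() or unlink() by other processes. */
    return open(path, O_RDONLY);
#endif
}
```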

regards, tom lane

#5 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#4)
Re: stats test on Windows is now failing repeatably?

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."

The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?

//Magnus

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#5)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> writes:

It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build environment
I couldn't test the patch. Please take a look at it.
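The open()-then-fdopen() approach can be sketched as below. This is a POSIX-flavoured illustration with hypothetical names, not the patch that was eventually applied: on Windows, the open() call would be replaced by the share-mode-aware pgwin32_open() mentioned above, and a "b" in the mode would also need O_BINARY, which this sketch leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Translate an fopen() mode string into open() flags; -1 if unrecognised. */
static int sketch_mode_to_flags(const char *mode)
{
    int rw = strchr(mode, '+') != NULL;

    switch (mode[0])
    {
        case 'r':
            return rw ? O_RDWR : O_RDONLY;
        case 'w':
            return (rw ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
        case 'a':
            return (rw ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
        default:
            return -1;
    }
}

/*
 * Hypothetical fopen() replacement: open the descriptor ourselves (on Windows
 * this is where a share-mode-aware open such as pgwin32_open() would go),
 * then wrap it in a stdio stream with fdopen().
 */
static FILE *sketch_fopen(const char *path, const char *mode)
{
    int flags = sketch_mode_to_flags(mode);
    int fd;
    FILE *f;

    if (flags < 0)
    {
        errno = EINVAL;
        return NULL;
    }
    fd = open(path, flags, 0666);
    if (fd < 0)
        return NULL;                /* errno already set by open() */
    f = fdopen(fd, mode);
    if (f == NULL)
    {
        int saved = errno;

        close(fd);                  /* don't leak the descriptor */
        errno = saved;
    }
    return f;
}
```

Because the stream is built on a descriptor opened by our own code, whatever share mode was chosen at open time carries over to every later stdio call on it.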

regards, tom lane

#7 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#6)
Re: stats test on Windows is now failing repeatably?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the
issue shouldn't occur for fopen(). Do you want to work up a patch for
that based on open(), or do you want me to take a look at it?

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

//Magnus

Attachments:

win32_fopen.diff (application/octet-stream; name=win32_fopen.diff), +30-0
#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#7)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> writes:

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

Applied, we'll see what happens ...

regards, tom lane

#9 ITAGAKI Takahiro
itagaki.takahiro@oss.ntt.co.jp
In reply to: Magnus Hagander (#7)
Re: stats test on Windows is now failing repeatably?

"Magnus Hagander" <mha@sollentuna.net> wrote:

FILE_SHARE_DELETE

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

It seems to work very well!
I ran the same workload on the HEAD, and I did not see any
pgstat.stat related logs now.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center