stats test on Windows is now failing repeatably?
I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines. I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.
[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]
regards, tom lane
Tom Lane wrote:
I just looked over the buildfarm results and was struck by the observation that the stats regression test, which lately had been failing once-in-a-while on Windows and never anywhere else, has a batting average of 0-for-10-or-so over the past 24 hours on the Windows buildfarm machines. I still have no idea what the real problem is there --- but since it suddenly seems to have gotten very repeatable, I trust someone with a Windows box and a debugger will get after it before the source code drifts again.
Maybe it's worth pointing out that leveret (Fedora Core 5/x86_64/icc)
manages to trigger that too on occasion, so maybe it is not a
"Windows-only" bug:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02
Stefan
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines.
I tested HEAD on Windows and saw some Windows-specific logs.
LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
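
For readers unfamiliar with the tmpfile-and-rename trick mentioned above, here is a minimal sketch in portable C. It is not PostgreSQL's code: the function name write_file_atomically and the ".tmp" suffix are assumptions made for illustration. The idea is that readers only ever see a complete old file or a complete new file, because the new contents are written under a different name and then renamed into place.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL's implementation): replace the
 * contents of `path` with `data` by writing a sibling temp file and then
 * renaming it over the target. Returns 0 on success, -1 on failure.
 */
int
write_file_atomically(const char *path, const char *data)
{
    char    tmppath[1024];
    FILE   *fp;
    size_t  len;
    int     n;

    assert(path != NULL && data != NULL);
    len = strlen(data);

    /* Assumed naming scheme: "<path>.tmp" in the same directory. */
    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;              /* path too long for our buffer */

    fp = fopen(tmppath, "wb");
    if (fp == NULL)
        return -1;

    if (fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /*
     * On POSIX, rename() replaces an existing target in one step, even if
     * another process currently has the target open. Whether the same
     * call succeeds on Windows depends on how the other holders opened
     * the file, which is the question raised in this thread.
     */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

Calling write_file_atomically("pgstat.stat", "...") twice in a row leaves only the second version on disk, and at no point is a half-written file visible under the final name.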
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
I tested HEAD on Windows and saw some Windows-specific logs.
LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.
We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.
regards, tom lane
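
To make the POSIX side of that comparison concrete, the following probe sketches the behavior the rename trick relies on: a reader that already holds the file open keeps seeing the old contents, the rename still succeeds, and a fresh open sees the new contents. The function and file names are hypothetical. On Windows, as far as I understand the share-mode rules, the rename step would normally need every existing handle to have been opened with FILE_SHARE_DELETE, so the same probe built on plain fopen() would be expected to report failure there.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: create or truncate `path` and write `text` to it. */
static int
put_contents(const char *path, const char *text)
{
    FILE   *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF)
    {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/*
 * Returns 1 if a rename over an open file succeeded, the old handle still
 * read the old contents, and a new open read the new contents.
 * Returns 0 if the contents did not match, -1 if any step failed.
 */
int
probe_rename_with_open_reader(const char *path)
{
    char    tmppath[1024];
    char    oldbuf[16] = {0};
    char    newbuf[16] = {0};
    FILE   *old_reader;
    FILE   *new_reader;
    size_t  got_old;
    size_t  got_new;
    int     n;
    int     result = -1;

    assert(path != NULL);
    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    /* Step 1: publish the initial version. */
    if (put_contents(path, "old") != 0)
        return -1;

    /* Step 2: a reader opens the file and keeps it open. */
    old_reader = fopen(path, "rb");
    if (old_reader == NULL)
    {
        remove(path);
        return -1;
    }

    /* Step 3: the writer publishes a new version via temp file + rename. */
    if (put_contents(tmppath, "new") == 0 && rename(tmppath, path) == 0)
    {
        /* Step 4: compare what the old and the new handle see. */
        new_reader = fopen(path, "rb");
        if (new_reader != NULL)
        {
            got_old = fread(oldbuf, 1, sizeof(oldbuf) - 1, old_reader);
            got_new = fread(newbuf, 1, sizeof(newbuf) - 1, new_reader);
            fclose(new_reader);
            result = (got_old == 3 && got_new == 3 &&
                      strcmp(oldbuf, "old") == 0 &&
                      strcmp(newbuf, "new") == 0);
        }
    }

    fclose(old_reader);
    remove(tmppath);            /* no-op if the rename already moved it */
    remove(path);
    return result;
}
```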
The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.
We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.
That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?
//Magnus
"Magnus Hagander" <mha@sollentuna.net> writes:
It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.
That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?
It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build environment
I couldn't test the patch. Please take a look at it.
regards, tom lane
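
The approach described here, a low-level open that requests FILE_SHARE_DELETE followed by fdopen(), might look roughly like the sketch below. This is not the win32_fopen.diff patch and not pgwin32_open(); the names open_share_delete and fopen_share_delete are made up for illustration. The Windows branch uses CreateFileA(), _open_osfhandle() and _fdopen() and should be read as an untested outline; the non-Windows branch falls back to plain open() so the wrapper can be exercised anywhere.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for fdopen() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <io.h>
#define fdopen _fdopen
#define close  _close
#else
#include <unistd.h>
#endif

/* Hypothetical low-level open that allows concurrent rename/delete. */
static int
open_share_delete(const char *path, int oflags)
{
#ifdef _WIN32
    DWORD   access;
    DWORD   disposition;
    HANDLE  h;
    int     fd;

    if (oflags & O_RDWR)
        access = GENERIC_READ | GENERIC_WRITE;
    else if (oflags & O_WRONLY)
        access = GENERIC_WRITE;
    else
        access = GENERIC_READ;

    if (oflags & O_CREAT)
        disposition = (oflags & O_TRUNC) ? CREATE_ALWAYS : OPEN_ALWAYS;
    else
        disposition = (oflags & O_TRUNC) ? TRUNCATE_EXISTING : OPEN_EXISTING;

    /* The share mode is the point: other processes may rename or delete. */
    h = CreateFileA(path, access,
                    FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                    NULL, disposition, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;              /* a real version would also set errno */

    fd = _open_osfhandle((intptr_t) h, oflags & O_APPEND);
    if (fd < 0)
        CloseHandle(h);
    return fd;
#else
    return open(path, oflags, 0600);
#endif
}

/* Hypothetical fopen() replacement; supports "r", "w", "a" plus "+"/"b". */
FILE *
fopen_share_delete(const char *path, const char *mode)
{
    int     plus;
    int     oflags;
    int     fd;
    FILE   *fp;

    assert(path != NULL && mode != NULL);
    plus = (strchr(mode, '+') != NULL);

    switch (mode[0])
    {
        case 'r':
            oflags = plus ? O_RDWR : O_RDONLY;
            break;
        case 'w':
            oflags = (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
            break;
        case 'a':
            oflags = (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
            break;
        default:
            return NULL;        /* unsupported mode string */
    }

    fd = open_share_delete(path, oflags);
    if (fd < 0)
        return NULL;

    fp = fdopen(fd, mode);      /* wrap the descriptor in a stdio stream */
    if (fp == NULL)
        close(fd);
    return fp;
}
```

Callers would then use fopen_share_delete("global/pgstat.stat", "rb") wherever they previously called fopen(), leaving the rest of the stdio-based code unchanged.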
It looks to me like we have implemented Windows' FILE_SHARE_DELETE
flag for open() calls but not for fopen(). Isn't this a problem?
We do use fopen() for stuff like pgstat.stat.
That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?
It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.
I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)
//Magnus
Attachments:
win32_fopen.diff (application/octet-stream, +30-0)
"Magnus Hagander" <mha@sollentuna.net> writes:
It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build
environment I couldn't test the patch. Please take a look at it.
I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)
Applied, we'll see what happens ...
regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> wrote:
I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)
It seems to work very well!
I ran the same workload on HEAD and no longer see any
pgstat.stat-related log messages.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center