9.2 pg_upgrade regression tests on WIndows

Started by Andrew Dunstanover 13 years ago17 messages
#1Andrew Dunstan
andrew@dunslane.net
1 attachment(s)

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant blocker, so
I'm going to go and fix that and then pull this all together.

cheers

andrew

Attachments:

pg_upgrade_test_92.patchtext/x-patch; name=pg_upgrade_test_92.patchDownload
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
 	else
 		retval = 0;
 
+#ifndef WIN32
 	if ((log = fopen_priv(log_file, "a+")) == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 	fprintf(log, "\n\n");
 	fclose(log);
+#endif
 
 	return retval;
 }
diff --git a/contrib/pg_upgrade/test.sh b/contrib/pg_upgrade/test.sh
index 299b7a5..7d41953 100644
--- a/contrib/pg_upgrade/test.sh
+++ b/contrib/pg_upgrade/test.sh
@@ -15,6 +15,8 @@ set -e
 : ${PGPORT=50432}
 export PGPORT
 
+testhost=`uname -o`
+
 temp_root=$PWD/tmp_check
 
 if [ "$1" = '--install' ]; then
@@ -114,6 +116,10 @@ if [ -n "$pg_dumpall2_status" ]; then
 	exit 1
 fi
 
+if [ $testhost = Msys ] ; then
+	dos2unix "$temp_root"/dump1.sql "$temp_root"/dump2.sql
+fi
+
 if diff -q "$temp_root"/dump1.sql "$temp_root"/dump2.sql; then
 	echo PASSED
 	exit 0
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: 9.2 pg_upgrade regression tests on WIndows

Andrew Dunstan <andrew@dunslane.net> writes:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant blocker, so
I'm going to go and fix that and then pull this all together.

My intentions over the next hour or two are to commit the
unix-socket-directory fix and then sync 9.2 pg_upgrade with HEAD
(ie, back-patch everything that's in HEAD except the int64-XLogRecPtr
changes). Will that cause any problems for what you're doing?

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/03/2012 12:58 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.
However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant blocker, so
I'm going to go and fix that and then pull this all together.

My intentions over the next hour or two are to commit the
unix-socket-directory fix and then sync 9.2 pg_upgrade with HEAD
(ie, back-patch everything that's in HEAD except the int64-XLogRecPtr
changes). Will that cause any problems for what you're doing?

No, go for it. I can sync up without difficulty. Does that include
backpatching the exec_prog changes to 9.2?

cheers

andrew

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: 9.2 pg_upgrade regression tests on WIndows

Andrew Dunstan <andrew@dunslane.net> writes:

On 09/03/2012 12:58 PM, Tom Lane wrote:

My intentions over the next hour or two are to commit the
unix-socket-directory fix and then sync 9.2 pg_upgrade with HEAD
(ie, back-patch everything that's in HEAD except the int64-XLogRecPtr
changes). Will that cause any problems for what you're doing?

No, go for it. I can sync up without difficulty. Does that include
backpatching the exec_prog changes to 9.2?

Yes, nobody objected, so I'll do that too.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: 9.2 pg_upgrade regression tests on WIndows

I wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

No, go for it. I can sync up without difficulty. Does that include
backpatching the exec_prog changes to 9.2?

Yes, nobody objected, so I'll do that too.

And done. Please apply the other open fixes to both HEAD and 9.2.
Should be easy at this point.

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#1)
Re: 9.2 pg_upgrade regression tests on WIndows

On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant
blocker, so I'm going to go and fix that and then pull this all
together.

cheers

andrew

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
else
retval = 0;

+#ifndef WIN32
if ((log = fopen_priv(log_file, "a+")) == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
fprintf(log, "\n\n");
fclose(log);
+#endif

return retval;
}

I am confused by this fix. If pg_ctl was keeping that log file open,
wouldn't the log write fail when pg_dump or psql was run later? I am
trying to understand how a later commands would not also trigger an
error. Is it a timing thing? If that is it, I would like to know and
have that documented.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#6)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/04/2012 09:47 AM, Bruce Momjian wrote:

On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant
blocker, so I'm going to go and fix that and then pull this all
together.

cheers

andrew
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
else
retval = 0;

+#ifndef WIN32
if ((log = fopen_priv(log_file, "a+")) == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
fprintf(log, "\n\n");
fclose(log);
+#endif

return retval;
}

I am confused by this fix. If pg_ctl was keeping that log file open,
wouldn't the log write fail when pg_dump or psql was run later? I am
trying to understand how a later commands would not also trigger an
error. Is it a timing thing? If that is it, I would like to know and
have that documented.

Oh, hmm. I thought it was the postmaster holding the log, but now I see
that we are giving it a different log file. Maybe it is a timing thing.
I'll experiment and see if a sleep cures the problem.

...

Nope, still getting this after a sleep(5):

cannot write to log file pg_upgrade_server_start.log
Failure, exiting

...

[try some more] Nope, even in a loop lasting 60s I still got this.

So I'm a bit confused too. Seeing if I can narrow it down using ProcMon ...

cheers

andrew

#8Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#1)
1 attachment(s)
Re: 9.2 pg_upgrade regression tests on WIndows

On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant
blocker, so I'm going to go and fix that and then pull this all
together.

cheers

andrew

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
else
retval = 0;

+#ifndef WIN32
if ((log = fopen_priv(log_file, "a+")) == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
fprintf(log, "\n\n");
fclose(log);
+#endif

return retval;
}

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

log.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
new file mode 100644
index ac46a9b..99f5006
*** a/contrib/pg_upgrade/exec.c
--- b/contrib/pg_upgrade/exec.c
*************** exec_prog(const char *log_file, const ch
*** 63,70 ****
  	if (written >= MAXCMDLEN)
  		pg_log(PG_FATAL, "command too long\n");
  
! 	if ((log = fopen_priv(log_file, "a+")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  	pg_log(PG_VERBOSE, "%s\n", cmd);
  	fprintf(log, "command: %s\n", cmd);
  
--- 63,73 ----
  	if (written >= MAXCMDLEN)
  		pg_log(PG_FATAL, "command too long\n");
  
! 	if ((log = fopen_priv(log_file, "a")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
+ #ifdef WIN32
+ 	fprintf(log, "\n\n");
+ #endif
  	pg_log(PG_VERBOSE, "%s\n", cmd);
  	fprintf(log, "command: %s\n", cmd);
  
*************** exec_prog(const char *log_file, const ch
*** 97,106 ****
  
  #ifndef WIN32
  	/* 
! 	 * Can't do this on Windows, postmaster will still hold the log file
! 	 * open if the command was "pg_ctl start".
  	 */
! 	if ((log = fopen_priv(log_file, "a+")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  	fprintf(log, "\n\n");
  	fclose(log);
--- 100,112 ----
  
  #ifndef WIN32
  	/* 
! 	 *	We can't do this on Windows because it will keep the "pg_ctl start"
! 	 *	output filename open until the server stops, so we do the \n\n above
! 	 *	on that platform.  We use a unique filename for "pg_ctl start" that is
! 	 *	never reused while the server is running, so it works fine.  We could
! 	 *	log these commands to a third file, but that just adds complexity.
  	 */
! 	if ((log = fopen_priv(log_file, "a")) == NULL)
  		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
  	fprintf(log, "\n\n");
  	fclose(log);
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index 195b927..3058343
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*************** extern char *output_files[];
*** 63,69 ****
  #define SERVER_STOP_LOG_FILE	SERVER_LOG_FILE
  #else
  #define SERVER_START_LOG_FILE	"pg_upgrade_server_start.log"
! /* pg_ctl stop doesn't keep the log file open, so reuse UTILITY_LOG_FILE */
  #define SERVER_STOP_LOG_FILE	UTILITY_LOG_FILE
  #endif
  
--- 63,73 ----
  #define SERVER_STOP_LOG_FILE	SERVER_LOG_FILE
  #else
  #define SERVER_START_LOG_FILE	"pg_upgrade_server_start.log"
! /*
!  *	"pg_ctl start" keeps SERVER_START_LOG_FILE and SERVER_LOG_FILE open
!  *	while the server is running, so we use UTILITY_LOG_FILE for "pg_ctl
!  *	stop".
!  */
  #define SERVER_STOP_LOG_FILE	UTILITY_LOG_FILE
  #endif
  
#9Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#8)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/05/2012 12:02 AM, Bruce Momjian wrote:

On Mon, Sep 3, 2012 at 12:44:09PM -0400, Andrew Dunstan wrote:

The attached very small patch allows pg_upgrade's "make check" to
succeed on REL9_2_STABLE on my Mingw system.

However, I consider the issue I mentioned earlier regarding use of
forward slashes in the argument to rmdir to be a significant
blocker, so I'm going to go and fix that and then pull this all
together.

cheers

andrew
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 6f993df..57ca1df 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -91,10 +91,12 @@ exec_prog(bool throw_error, bool is_priv, const char *log_file,
else
retval = 0;

+#ifndef WIN32
if ((log = fopen_priv(log_file, "a+")) == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
fprintf(log, "\n\n");
fclose(log);
+#endif

return retval;
}

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

It looks like we still have problems in this area :-( see
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&amp;dt=2012-09-05%2023%3A05%3A16&gt;

Now it looks like somehow the fopen on the log file that isn't commented
out is failing. But the identical code worked on the same machine on
HEAD. SO this does rather look like a timing issue.

Investigating ...

cheers

andrew

#10Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#9)
Re: 9.2 pg_upgrade regression tests on WIndows

On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

It looks like we still have problems in this area :-( see <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&amp;dt=2012-09-05%2023%3A05%3A16&gt;

Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.

Investigating ...

Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().

The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.

I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#10)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/05/2012 09:42 PM, Bruce Momjian wrote:

On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

It looks like we still have problems in this area :-( see <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&amp;dt=2012-09-05%2023%3A05%3A16&gt;

Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.

Investigating ...

Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().

The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.

I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.

Icky. I wish there were some nice portable flock() mechanism we could use.

I just re-ran the test on the same machine, same code, same everything
as the reporte3d failure, and it passed, so it definitely looks like
it's a timing issue.

I'd be inclined to put a loop around that fopen() to try it once every
second for, say, 5 seconds.

cheers

andrew

#12Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#11)
Re: 9.2 pg_upgrade regression tests on WIndows

On Wed, Sep 5, 2012 at 10:04:07PM -0400, Andrew Dunstan wrote:

On 09/05/2012 09:42 PM, Bruce Momjian wrote:

On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

It looks like we still have problems in this area :-( see <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&amp;dt=2012-09-05%2023%3A05%3A16&gt;

Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.

Investigating ...

Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().

The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.

I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.

Icky. I wish there were some nice portable flock() mechanism we could use.

I just re-ran the test on the same machine, same code, same
everything as the reporte3d failure, and it passed, so it definitely
looks like it's a timing issue.

I'd be inclined to put a loop around that fopen() to try it once
every second for, say, 5 seconds.

Yes, good idea.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#12)
1 attachment(s)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/05/2012 10:07 PM, Bruce Momjian wrote:

On Wed, Sep 5, 2012 at 10:04:07PM -0400, Andrew Dunstan wrote:

On 09/05/2012 09:42 PM, Bruce Momjian wrote:

On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:

OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.

It looks like we still have problems in this area :-( see <http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&amp;dt=2012-09-05%2023%3A05%3A16&gt;

Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.

Investigating ...

Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().

The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.

I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.

Icky. I wish there were some nice portable flock() mechanism we could use.

I just re-ran the test on the same machine, same code, same
everything as the reporte3d failure, and it passed, so it definitely
looks like it's a timing issue.

I'd be inclined to put a loop around that fopen() to try it once
every second for, say, 5 seconds.

Yes, good idea.

Suggested patch attached.

cheers

andrew

Attachments:

pg_upgrade_fopen_loop.patchtext/x-patch; name=pg_upgrade_fopen_loop.patchDownload
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 99f5006..f84d857 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -63,7 +63,25 @@ exec_prog(const char *log_file, const char *opt_log_file,
 	if (written >= MAXCMDLEN)
 		pg_log(PG_FATAL, "command too long\n");
 
-	if ((log = fopen_priv(log_file, "a")) == NULL)
+#ifdef WIN32
+	{
+		/* 
+		 * Try to open the log file a few times in case the
+		 * server takes a bit longer than we'd like to release it.
+		 */
+		int iter;
+		for (iter = 0; iter < 5; iter++)
+		{
+			log = fopen_priv(log_file, "a");
+			if (log != NULL || iter == 4)
+				break;
+			sleep(1);
+		}
+	}
+#else
+	log = fopen_priv(log_file, "a");
+#endif
+	if (log == NULL)
 		pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
 #ifdef WIN32
 	fprintf(log, "\n\n");
#14Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#13)
Re: 9.2 pg_upgrade regression tests on WIndows

On Wed, Sep 5, 2012 at 10:35:26PM -0400, Andrew Dunstan wrote:

Icky. I wish there were some nice portable flock() mechanism we could use.

I just re-ran the test on the same machine, same code, same
everything as the reporte3d failure, and it passed, so it definitely
looks like it's a timing issue.

I'd be inclined to put a loop around that fopen() to try it once
every second for, say, 5 seconds.

Yes, good idea.

Suggested patch attached.

cheers

andrew

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 99f5006..f84d857 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -63,7 +63,25 @@ exec_prog(const char *log_file, const char *opt_log_file,
if (written >= MAXCMDLEN)
pg_log(PG_FATAL, "command too long\n");
-	if ((log = fopen_priv(log_file, "a")) == NULL)
+#ifdef WIN32
+	{
+		/* 
+		 * Try to open the log file a few times in case the
+		 * server takes a bit longer than we'd like to release it.
+		 */
+		int iter;
+		for (iter = 0; iter < 5; iter++)
+		{
+			log = fopen_priv(log_file, "a");
+			if (log != NULL || iter == 4)
+				break;
+			sleep(1);
+		}
+	}
+#else
+	log = fopen_priv(log_file, "a");
+#endif
+	if (log == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
#ifdef WIN32
fprintf(log, "\n\n");

I would like to see a more verbose comment, so we don't forget why we
did this. I think my inability to quickly discover the cause of the
previous log write problem is that I didn't document which file
descriptors are kept open on Windows. I suggest for a comment:

/*
* "pg_ctl -w stop" might have reported that the server has
* stopped because the postmaster.pid file has been removed,
* but "pg_ctl -w start" might still be in the process of
* closing and might still be holding its stdout and -l log
* file descriptors open. Therefore, try to open the log
* file a few times.
*/

Anyway, we can easily adjust the comment post-9.2.0.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#15Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#13)
Re: 9.2 pg_upgrade regression tests on WIndows

On Wed, Sep 5, 2012 at 10:35:26PM -0400, Andrew Dunstan wrote:

diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 99f5006..f84d857 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -63,7 +63,25 @@ exec_prog(const char *log_file, const char *opt_log_file,
if (written >= MAXCMDLEN)
pg_log(PG_FATAL, "command too long\n");
-	if ((log = fopen_priv(log_file, "a")) == NULL)
+#ifdef WIN32
+	{
+		/* 
+		 * Try to open the log file a few times in case the
+		 * server takes a bit longer than we'd like to release it.
+		 */
+		int iter;
+		for (iter = 0; iter < 5; iter++)
+		{
+			log = fopen_priv(log_file, "a");
+			if (log != NULL || iter == 4)
+				break;
+			sleep(1);
+		}
+	}
+#else
+	log = fopen_priv(log_file, "a");
+#endif
+	if (log == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
#ifdef WIN32
fprintf(log, "\n\n");

Oh, also, we normally put the ifndef WIN32 code first because that is
our most common platform. Also, is "|| iter == 4" necessary?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#14)
Re: 9.2 pg_upgrade regression tests on WIndows

On 09/05/2012 10:41 PM, Bruce Momjian wrote:

I would like to see a more verbose comment, so we don't forget why we
did this. I think my inability to quickly discover the cause of the
previous log write problem is that I didn't document which file
descriptors are kept open on Windows. I suggest for a comment:

/*
* "pg_ctl -w stop" might have reported that the server has
* stopped because the postmaster.pid file has been removed,
* but "pg_ctl -w start" might still be in the process of
* closing and might still be holding its stdout and -l log
* file descriptors open. Therefore, try to open the log
* file a few times.
*/

Anyway, we can easily adjust the comment post-9.2.0.

Shall I apply the patch now? If so I'll include your comment.

cheers

andrew

#17Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#16)
Re: 9.2 pg_upgrade regression tests on WIndows

On Wed, Sep 5, 2012 at 10:46:17PM -0400, Andrew Dunstan wrote:

On 09/05/2012 10:41 PM, Bruce Momjian wrote:

I would like to see a more verbose comment, so we don't forget why we
did this. I think my inability to quickly discover the cause of the
previous log write problem is that I didn't document which file
descriptors are kept open on Windows. I suggest for a comment:

/*
* "pg_ctl -w stop" might have reported that the server has
* stopped because the postmaster.pid file has been removed,
* but "pg_ctl -w start" might still be in the process of
* closing and might still be holding its stdout and -l log
* file descriptors open. Therefore, try to open the log
* file a few times.
*/

Anyway, we can easily adjust the comment post-9.2.0.

Shall I apply the patch now? If so I'll include your comment.

Well, seems it is a crash bug. Apply so we can get some buildfarm
testing overnight.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +