[v15 beta] pg_upgrade failed if earlier executed with -c switch

Started by tusharover 3 years ago25 messages
#1tushar
tushar.ahuja@enterprisedb.com

Hi,

While performing pg_upgrade from v15Beta binaries/source,
I got this error below error

could not create directory "d2/pg_upgrade_output.d": File exists
Failure, exiting

*Steps to reproduce *
v15 Beta sources
initalize a cluster ( ./initdb -D d1)
initalize another cluster ( ./initdb -D d2)
run pg_upgrade with -c option  ( ./pg_upgrade -d d1 -D d2 -b . -B . -c -v)
run pg_upgrade without -c option ( ./pg_upgrade -d d1 -D d2 -b . -B .)
--
--
--
Error

This behavior was not there in earlier released versions, i guess.
Is it expected behavior now onwards?

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

#2Daniel Gustafsson
daniel@yesql.se
In reply to: tushar (#1)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:

This behavior was not there in earlier released versions, i guess.
Is it expected behavior now onwards?

That's an unfortunate side effect which AFAICT was overlooked in the original
thread. Having a predictable name was defined as important for CI/BF, but I
agree that the above is likely to be a common user pattern (first running -c is
exactly what I did when managing databases and upgraded them with pg_upgrade).

This might break a few automated upgrade scripts out there (but they might also
already need changes to cope with the moved file locations).

We can address this by documentation, and specifically highlight under the -c
option in the manual that the folder need to removed/renamed (and possibly to
STDOUT aswell when run with -c).

--
Daniel Gustafsson https://vmware.com/

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#2)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Fri, Jun 03, 2022 at 02:01:18PM +0200, Daniel Gustafsson wrote:

On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:

This behavior was not there in earlier released versions, i guess.
Is it expected behavior now onwards?

That's an unfortunate side effect which AFAICT was overlooked in the original
thread. Having a predictable name was defined as important for CI/BF, but I
agree that the above is likely to be a common user pattern (first running -c is
exactly what I did when managing databases and upgraded them with pg_upgrade).

I agree that it's an problem, but it's not limited to -c.

For example, I ran this:

|$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin/initdb -d ./pgsql14.dat -D ./pgsql15.dat
|"/usr/pgsql-14/bin/initdb" is not a directory
|Failure, exiting

And then reran with the correct "-b" option, but then it failed because it had
failed before...

|$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin -d ./pgsql14.dat -D ./pgsql15.dat
|could not create directory "pgsql15.dat/pg_upgrade_output.d": File exists
|Failure, exiting

This is a kind of geometric circle of errors - an error at point A requires
first re-running after fixing A's issue, and then an error at B requires
re-running after fixing B's issue, hitting the "A" error again, and then
rerunning again again. It's the same kind of problem that led to 3c0471b5f.

-c could use a different output directory, but that means it would fail if
pg_upgrade -c were run multiple times, which seems undesirable for a "check"
command.

We could call cleanup() if -c was successful. But that doesn't help the case
that -c fails; the new dir would still need to be manually removed, which seems
like imposing useless busywork on the user.

We could allow mkdir to fail with EEXIST, except that breaks the original
motivation for the patch: the logs are appended to and any old errors are still
in the logs after re-running pg_upgrade.

--
Justin

#4Daniel Gustafsson
daniel@yesql.se
In reply to: Justin Pryzby (#3)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 3 Jun 2022, at 15:53, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, Jun 03, 2022 at 02:01:18PM +0200, Daniel Gustafsson wrote:

On 3 Jun 2022, at 13:19, tushar <tushar.ahuja@enterprisedb.com> wrote:

This behavior was not there in earlier released versions, i guess.
Is it expected behavior now onwards?

That's an unfortunate side effect which AFAICT was overlooked in the original
thread. Having a predictable name was defined as important for CI/BF, but I
agree that the above is likely to be a common user pattern (first running -c is
exactly what I did when managing databases and upgraded them with pg_upgrade).

I agree that it's an problem, but it's not limited to -c.

Indeed.

For example, I ran this:

|$ time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-14/bin/initdb -d ./pgsql14.dat -D ./pgsql15.dat
|"/usr/pgsql-14/bin/initdb" is not a directory
|Failure, exiting

And then reran with the correct "-b" option, but then it failed because it had
failed before...

Thats, not ideal.

We could call cleanup() if -c was successful. But that doesn't help the case
that -c fails; the new dir would still need to be manually removed, which seems
like imposing useless busywork on the user.

We could allow mkdir to fail with EEXIST, except that breaks the original
motivation for the patch: the logs are appended to and any old errors are still
in the logs after re-running pg_upgrade.

Or we could revisit Tom's proposal in the thread that implemented the feature:
to have timestamped directory names to get around this very problem? I think
we should be able to figure out a way to make it easy enough for the BF code to
figure out (and clean up).

--
Daniel Gustafsson https://vmware.com/

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#4)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

Daniel Gustafsson <daniel@yesql.se> writes:

Or we could revisit Tom's proposal in the thread that implemented the feature:
to have timestamped directory names to get around this very problem? I think
we should be able to figure out a way to make it easy enough for the BF code to
figure out (and clean up).

How about inserting an additional level of subdirectory?

pg_upgrade_output.d/20220603122528/foo.log

Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

regards, tom lane

#6Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#5)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Daniel Gustafsson <daniel@yesql.se> writes:

Or we could revisit Tom's proposal in the thread that implemented the feature:
to have timestamped directory names to get around this very problem? I think
we should be able to figure out a way to make it easy enough for the BF code to
figure out (and clean up).

How about inserting an additional level of subdirectory?

pg_upgrade_output.d/20220603122528/foo.log

Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

Off the cuff that seems like a good compromise. Adding Andrew on CC: for input
on how that affects the buildfarm.

--
Daniel Gustafsson https://vmware.com/

#7Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#6)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Fri, Jun 03, 2022 at 06:55:28PM +0200, Daniel Gustafsson wrote:

On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

How about inserting an additional level of subdirectory?

pg_upgrade_output.d/20220603122528/foo.log

Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

Off the cuff that seems like a good compromise. Adding Andrew on CC: for input
on how that affects the buildfarm.

I am not so sure. My first reaction was actually to bypass the
creation of the directories on EEXIST. But, isn't the problem
different and actually older here? In the set of commands given by
Tushar, he uses the --check option without --retain, but the logs are
kept around after the execution of the command. It seems to me that
there is an argument for also removing the logs if the caller of the
command does not want to retain them.

Seeing the top of the thread, I think that it would be a good idea to
add an extra pg_upgrade --check before the real upgrade run. I've
also relied on --check as a safety measure in the past for automated
workflows.
--
Michael

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#7)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:

On Fri, Jun 03, 2022 at 06:55:28PM +0200, Daniel Gustafsson wrote:

On 3 Jun 2022, at 18:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:

How about inserting an additional level of subdirectory?

pg_upgrade_output.d/20220603122528/foo.log

Then code doing "rm -rf pg_upgrade_output.d" needs no changes.

Off the cuff that seems like a good compromise. Adding Andrew on CC: for input
on how that affects the buildfarm.

I am not so sure. My first reaction was actually to bypass the
creation of the directories on EEXIST.

But that breaks the original motive behind the patch I wrote - logfiles are
appended to, even if they're full of errors that were fixed before re-running
pg_upgrade.

But, isn't the problem different and actually older here? In the set of
commands given by Tushar, he uses the --check option without --retain, but
the logs are kept around after the execution of the command. It seems to me
that there is an argument for also removing the logs if the caller of the
command does not want to retain them.

You're right that --check is a bit inconsistent, but I don't think we could
backpatch any change to fix it. It wouldn't matter much anyway, since the
usual workflow would be "pg_upgrade --check && pg_upgrade". In which case the
logs would end up being removed anyway.

On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:

Seeing the top of the thread, I think that it would be a good idea to
add an extra pg_upgrade --check before the real upgrade run. I've
also relied on --check as a safety measure in the past for automated
workflows.

It already does this; --check really means "stop-after-checking".

Hmm .. maybe what you mean is that the *tap test* should first run with
--check?

--
Justin

#9Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#8)
1 attachment(s)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Fri, Jun 03, 2022 at 10:32:27PM -0500, Justin Pryzby wrote:

On Sat, Jun 04, 2022 at 12:13:19PM +0900, Michael Paquier wrote:

I am not so sure. My first reaction was actually to bypass the
creation of the directories on EEXIST.

But that breaks the original motive behind the patch I wrote - logfiles are
appended to, even if they're full of errors that were fixed before re-running
pg_upgrade.

Yep.

But, isn't the problem different and actually older here? In the set of
commands given by Tushar, he uses the --check option without --retain, but
the logs are kept around after the execution of the command. It seems to me
that there is an argument for also removing the logs if the caller of the
command does not want to retain them.

You're right that --check is a bit inconsistent, but I don't think we could
backpatch any change to fix it. It wouldn't matter much anyway, since the
usual workflow would be "pg_upgrade --check && pg_upgrade". In which case the
logs would end up being removed anyway.

Exactly, the inconsistency in the log handling is annoying, and
cleaning up the logs when --retain is not used makes sense to me when
the --check command succeeds, but we should keep them if the --check
fails. I don't see an argument in backpatching that either.

Hmm .. maybe what you mean is that the *tap test* should first run with
--check?

Sorry for the confusion. I meant to add an extra command in the TAP
test itself.

I would suggest the attached patch then, to add a --check command in
the test suite, with a change to clean up the logs when --check is
used without --retain.
--
Michael

Attachments:

upgrade-check-logs.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 6114303b52..a8b73b6c5e 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -210,6 +210,11 @@ report_clusters_compatible(void)
 		pg_log(PG_REPORT, "\n*Clusters are compatible*\n");
 		/* stops new cluster */
 		stop_postmaster(false);
+
+		/* Remove log files? */
+		if (!log_opts.retain)
+			(void) rmtree(log_opts.basedir, true);
+
 		exit(0);
 	}
 
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 55c7354ba2..23a4190abb 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -213,6 +213,14 @@ chdir ${PostgreSQL::Test::Utils::tmp_check};
 
 # Upgrade the instance.
 $oldnode->stop;
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
+		'-D',         $newnode->data_dir, '-b', $oldbindir,
+		'-B',         $newbindir,         '-p', $oldnode->port,
+		'-P',         $newnode->port,     '--check'
+	],
+	'run of pg_upgrade --check for new instance');
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#9)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Sat, Jun 04, 2022 at 06:48:19PM +0900, Michael Paquier wrote:

I would suggest the attached patch then, to add a --check command in
the test suite, with a change to clean up the logs when --check is
used without --retain.

This doesn't address one of the problems that I already enumerated.

./tmp_install/usr/local/pgsql/bin/initdb -D pgsql15.dat
./tmp_install/usr/local/pgsql/bin/initdb -D pgsql15.dat-2

$ ./tmp_install/usr/local/pgsql/bin/pg_upgrade -b ./tmp_install/usr/local/pgsql/bin/bad -d pgsql15.dat-2 -D pgsql15.dat-2
check for "tmp_install/usr/local/pgsql/bin/bad" failed: No such file or directory
Failure, exiting

$ ./tmp_install/usr/local/pgsql/bin/pg_upgrade -b ./tmp_install/usr/local/pgsql/bin/bad -d pgsql15.dat-2 -D pgsql15.dat-2
could not create directory "pgsql15.dat-2/pg_upgrade_output.d": File exists
Failure, exiting

..failing the 2nd time because it failed the 1st time (even if I fix the bad
argument).

Maybe that's easy enough to fix just be rearranging verify_directories() or
make_outputdirs().

But actually it seems annoying to have to remove the failed outputdir.
It's true that those logs *can* be useful to fix whatever underlying problem,
but I'm afraid the *requirement* to remove the failed outputdir is a nuisance,
even outside of check mode.

--
Justin

#11Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#10)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Sat, Jun 04, 2022 at 09:13:46AM -0500, Justin Pryzby wrote:

Maybe that's easy enough to fix just be rearranging verify_directories() or
make_outputdirs().

For the case, I mentioned, yes.

But actually it seems annoying to have to remove the failed outputdir.
It's true that those logs *can* be useful to fix whatever underlying problem,
but I'm afraid the *requirement* to remove the failed outputdir is a nuisance,
even outside of check mode.

Well, another error that could happen in the early code paths is
EACCES on a custom socket directory specified, and we'd still face the
same problem on a follow-up restart. Using a sub-directory structure
as Daniel and Tom mention would address all that (if ignoring EEXIST
for the BASE_OUTPUTDIR), removing any existing content from the base
path when not using --retain. This comes with the disadvantage of
bloating the disk on repeated errors, but this last bit would not
really be a huge problem, I guess, as it could be more useful to keep
the error information around.
--
Michael

#12Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#11)
1 attachment(s)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:

Well, another error that could happen in the early code paths is
EACCES on a custom socket directory specified, and we'd still face the
same problem on a follow-up restart. Using a sub-directory structure
as Daniel and Tom mention would address all that (if ignoring EEXIST
for the BASE_OUTPUTDIR), removing any existing content from the base
path when not using --retain. This comes with the disadvantage of
bloating the disk on repeated errors, but this last bit would not
really be a huge problem, I guess, as it could be more useful to keep
the error information around.

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached. The logs are removed from the root path when --check is
used without --retain, like for a non-check command. I have added a
set of tests to provide some coverage for the whole:
- Failure of --check where the binary path does not exist, and
pg_upgrade_output.d/ is not removed.
- Follow-up run of pg_upgrade --check, where pg_upgrade_output.d/ is
removed.
- Check that pg_upgrade_output.d/ is also removed after the main
upgrade command completes.

The logic in charge of cleaning up the logs has been moved to a single
routine, aka cleanup_logs().

Thoughts?
--
Michael

Attachments:

upgrade-check-logs-v2.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 6114303b52..b736f89816 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -210,6 +210,8 @@ report_clusters_compatible(void)
 		pg_log(PG_REPORT, "\n*Clusters are compatible*\n");
 		/* stops new cluster */
 		stop_postmaster(false);
+
+		cleanup_logs();
 		exit(0);
 	}
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index ecb3e1f647..ac854d268f 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -58,7 +58,6 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
-static void cleanup(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -204,7 +203,7 @@ main(int argc, char **argv)
 
 	pg_free(deletion_script_file_name);
 
-	cleanup();
+	cleanup_logs();
 
 	return 0;
 }
@@ -221,14 +220,37 @@ make_outputdirs(char *pgdata)
 	char	  **filename;
 	time_t		run_time = time(NULL);
 	char		filename_path[MAXPGPATH];
+	char		timebuf[128];
+	struct timeval time;
 
+	log_opts.rootdir = (char *) pg_malloc(MAXPGPATH);
+	snprintf(log_opts.rootdir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
+
+	/* BASE_OUTPUTDIR/$unix_timestamp/ */
+	gettimeofday(&time, NULL);
+	snprintf(timebuf, sizeof(timebuf), "%ld.%03d",
+			 (long) time.tv_sec,
+			 (int) (time.tv_usec / 1000));
 	log_opts.basedir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
-	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s", pgdata, DUMP_OUTPUTDIR);
-	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s", pgdata, LOG_OUTPUTDIR);
+	snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir,
+			 timebuf);
 
+	/* BASE_OUTPUTDIR/$unix_timestamp/dump/ */
+	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
+	snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+			 timebuf, DUMP_OUTPUTDIR);
+
+	/* BASE_OUTPUTDIR/$unix_timestamp/log/ */
+	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
+	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+			 timebuf, LOG_OUTPUTDIR);
+
+	/*
+	 * Ignore the error case where the root path exists, as it is kept
+	 * the same across runs.
+	 */
+	if (mkdir(log_opts.rootdir, pg_dir_create_mode) && errno != EEXIST)
+		pg_fatal("could not create directory \"%s\": %m\n", log_opts.rootdir);
 	if (mkdir(log_opts.basedir, pg_dir_create_mode))
 		pg_fatal("could not create directory \"%s\": %m\n", log_opts.basedir);
 	if (mkdir(log_opts.dumpdir, pg_dir_create_mode))
@@ -745,14 +767,3 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
-
-
-static void
-cleanup(void)
-{
-	fclose(log_opts.internal);
-
-	/* Remove dump and log files? */
-	if (!log_opts.retain)
-		(void) rmtree(log_opts.basedir, true);
-}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 86d3dc46fa..157ef0e2ff 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -30,12 +30,14 @@
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
 
 /*
- * Base directories that include all the files generated internally,
- * from the root path of the new cluster.
+ * Base directories that include all the files generated internally, from the
+ * root path of the new cluster.  The paths are dynamically built as of
+ * BASE_OUTPUTDIR/$unix_timestamp/{LOG_OUTPUTDIR,DUMP_OUTPUTDIR} to ensure
+ * their uniqueness in each run.
  */
 #define BASE_OUTPUTDIR		"pg_upgrade_output.d"
-#define LOG_OUTPUTDIR		BASE_OUTPUTDIR "/log"
-#define DUMP_OUTPUTDIR		BASE_OUTPUTDIR "/dump"
+#define LOG_OUTPUTDIR		 "log"
+#define DUMP_OUTPUTDIR		 "dump"
 
 #define DB_DUMP_LOG_FILE_MASK	"pg_upgrade_dump_%u.log"
 #define SERVER_LOG_FILE		"pg_upgrade_server.log"
@@ -276,7 +278,8 @@ typedef struct
 	bool		verbose;		/* true -> be verbose in messages */
 	bool		retain;			/* retain log files on success */
 	/* Set of internal directories for output files */
-	char	   *basedir;		/* Base output directory */
+	char	   *rootdir;		/* Root directory, aka pg_upgrade_output.d */
+	char	   *basedir;		/* Base output directory, with timestamp */
 	char	   *dumpdir;		/* Dumps */
 	char	   *logdir;			/* Log files */
 	bool		isatty;			/* is stdout a tty */
@@ -432,6 +435,7 @@ void		report_status(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3
 void		pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
 void		pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn();
 void		end_progress_output(void);
+void		cleanup_logs(void);
 void		prep_status(const char *fmt,...) pg_attribute_printf(1, 2);
 void		prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2);
 unsigned int str2uint(const char *str);
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 55c7354ba2..db939ee9a1 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -213,6 +213,38 @@ chdir ${PostgreSQL::Test::Utils::tmp_check};
 
 # Upgrade the instance.
 $oldnode->stop;
+
+# Cause a failure at the start of pg_upgrade, this should create the logging
+# directory pg_upgrade_output.d but leave it around.  Keep --check for an
+# early exit.
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $oldnode->data_dir,
+		'-D',         $newnode->data_dir,
+		'-b',         $oldbindir . '/does/not/exist/',
+		'-B',         $newbindir,
+		'-p',         $oldnode->port,
+		'-P',         $newnode->port,
+		'--check'
+	],
+	'run of pg_upgrade --check for new instance with incorrect binary path');
+ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+
+# --check command works here, cleans up pg_upgrade_output.d.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
+		'-D',         $newnode->data_dir, '-b', $oldbindir,
+		'-B',         $newbindir,         '-p', $oldnode->port,
+		'-P',         $newnode->port,     '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok(!-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade --check success");
+
+# Actual run, pg_upgrade_output.d is removed at the end.
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
@@ -221,6 +253,9 @@ command_ok(
 		'-P',         $newnode->port
 	],
 	'run of pg_upgrade for new instance');
+ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 $newnode->start;
 
 # Check if there are any logs coming from pg_upgrade, that would only be
diff --git a/src/bin/pg_upgrade/util.c b/src/bin/pg_upgrade/util.c
index 1a328b4270..45ae719639 100644
--- a/src/bin/pg_upgrade/util.c
+++ b/src/bin/pg_upgrade/util.c
@@ -55,6 +55,18 @@ end_progress_output(void)
 		pg_log(PG_REPORT, "%-*s", MESSAGE_WIDTH, "");
 }
 
+/*
+ * Remove any logs generated internally.  To be used once when exiting.
+ */
+void
+cleanup_logs(void)
+{
+	fclose(log_opts.internal);
+
+	/* Remove dump and log files? */
+	if (!log_opts.retain)
+		(void) rmtree(log_opts.rootdir, true);
+}
 
 /*
  * prep_status
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 8cda8d16d1..0c46707069 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -768,7 +768,8 @@ psql --username=postgres --file=script.sql postgres
   <para>
    <application>pg_upgrade</application> creates various working files, such
    as schema dumps, stored within <literal>pg_upgrade_output.d</literal> in
-   the directory of the new cluster.
+   the directory of the new cluster. Each run creates a new subdirectory named
+   with a Unix timestamp where all the generated files are stored.
   </para>
 
   <para>
#13Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#12)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:

Well, another error that could happen in the early code paths is
EACCES on a custom socket directory specified, and we'd still face the
same problem on a follow-up restart. Using a sub-directory structure
as Daniel and Tom mention would address all that (if ignoring EEXIST
for the BASE_OUTPUTDIR), removing any existing content from the base
path when not using --retain. This comes with the disadvantage of
bloating the disk on repeated errors, but this last bit would not
really be a huge problem, I guess, as it could be more useful to keep
the error information around.

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached.

I was thinking more along the lines of %m to make it (more) human readable, but
I'm certainly not wedded to any format.

The logs are removed from the root path when --check is
used without --retain, like for a non-check command.

This removes all logs after a command without --retain, even if a previous
command used --retain to keep the logs around.

As a user I would expect the logs from this current invocation to be removed
without --retain, and any other older log entries be kept. I think we should
remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
after .logdir is removed.

The logic in charge of cleaning up the logs has been moved to a single
routine, aka cleanup_logs().

+ cleanup_logs();

Maybe we should register cleanup_logs() as an atexit() handler once we're done
with option processing?

+	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+			 timebuf, LOG_OUTPUTDIR);

While not introduced by this patch, it does make me uneasy that we create paths
without checking for buffer overflows..

--
Daniel Gustafsson https://vmware.com/

#14Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#13)
1 attachment(s)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:

On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Jun 05, 2022 at 09:24:25AM +0900, Michael Paquier wrote:

Well, another error that could happen in the early code paths is
EACCES on a custom socket directory specified, and we'd still face the
same problem on a follow-up restart. Using a sub-directory structure
as Daniel and Tom mention would address all that (if ignoring EEXIST
for the BASE_OUTPUTDIR), removing any existing content from the base
path when not using --retain. This comes with the disadvantage of
bloating the disk on repeated errors, but this last bit would not
really be a huge problem, I guess, as it could be more useful to keep
the error information around.

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached.

I was thinking more along the lines of %m to make it (more) human readable, but
I'm certainly not wedded to any format.

Neither am I. I would not map exactly to %m as it uses whitespaces,
but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
be fine? If there are other ideas for the format, just let me know.

As a user I would expect the logs from this current invocation to be removed
without --retain, and any other older log entries be kept. I think we should
remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
after .logdir is removed.

Okay, however I think you mean log_opts.basedir rather than logdir?
That's simple enough to switch around as pg_check_dir() does this
job.

The logic in charge of cleaning up the logs has been moved to a single
routine, aka cleanup_logs().

+ cleanup_logs();

Maybe we should register cleanup_logs() as an atexit() handler once we're done
with option processing?

It seems to me that the original intention is to keep the logs around
on failure, hence we should only clean up things on a clean exit().
That's why I didn't add an exit callback for that.

+	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+			 timebuf, LOG_OUTPUTDIR);

While not introduced by this patch, it does make me uneasy that we create paths
without checking for buffer overflows..

I don't mind adding such checks in those code paths. You are right
that they tend to produce longer path strings than others.
--
Michael

Attachments:

upgrade-check-logs-v3.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 6114303b52..ace7387eda 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -210,6 +210,8 @@ report_clusters_compatible(void)
 		pg_log(PG_REPORT, "\n*Clusters are compatible*\n");
 		/* stops new cluster */
 		stop_postmaster(false);
+
+		cleanup_output_dirs();
 		exit(0);
 	}
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index ecb3e1f647..5b8e3e73db 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -58,7 +58,6 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
-static void cleanup(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -204,7 +203,7 @@ main(int argc, char **argv)
 
 	pg_free(deletion_script_file_name);
 
-	cleanup();
+	cleanup_output_dirs();
 
 	return 0;
 }
@@ -221,14 +220,48 @@ make_outputdirs(char *pgdata)
 	char	  **filename;
 	time_t		run_time = time(NULL);
 	char		filename_path[MAXPGPATH];
+	char		timebuf[128];
+	struct timeval time;
+	time_t		tt;
+	int			len;
 
+	log_opts.rootdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.rootdir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for root directory too small");
+
+	/* BASE_OUTPUTDIR/$unix_timestamp/ */
+	gettimeofday(&time, NULL);
+	tt = (time_t) time.tv_sec;
+	strftime(timebuf, sizeof(timebuf), "%Y%m%d_%H%M%S", localtime(&tt));
+	snprintf(timebuf, sizeof(timebuf), "%s.%03d",
+			 timebuf, (int) (time.tv_usec / 1000));
 	log_opts.basedir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
-	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s", pgdata, DUMP_OUTPUTDIR);
-	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s", pgdata, LOG_OUTPUTDIR);
+	len = snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir,
+				   timebuf);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for base directory too small");
 
+	/* BASE_OUTPUTDIR/$unix_timestamp/dump/ */
+	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+				   timebuf, DUMP_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for dump directory too small");
+
+	/* BASE_OUTPUTDIR/$unix_timestamp/log/ */
+	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+				   timebuf, LOG_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for log directory too small");
+
+	/*
+	 * Ignore the error case where the root path exists, as it is kept the
+	 * same across runs.
+	 */
+	if (mkdir(log_opts.rootdir, pg_dir_create_mode) && errno != EEXIST)
+		pg_fatal("could not create directory \"%s\": %m\n", log_opts.rootdir);
 	if (mkdir(log_opts.basedir, pg_dir_create_mode))
 		pg_fatal("could not create directory \"%s\": %m\n", log_opts.basedir);
 	if (mkdir(log_opts.dumpdir, pg_dir_create_mode))
@@ -745,14 +778,3 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
-
-
-static void
-cleanup(void)
-{
-	fclose(log_opts.internal);
-
-	/* Remove dump and log files? */
-	if (!log_opts.retain)
-		(void) rmtree(log_opts.basedir, true);
-}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 86d3dc46fa..895137e732 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -30,12 +30,14 @@
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
 
 /*
- * Base directories that include all the files generated internally,
- * from the root path of the new cluster.
+ * Base directories that include all the files generated internally, from the
+ * root path of the new cluster.  The paths are dynamically built as of
+ * BASE_OUTPUTDIR/$unix_timestamp/{LOG_OUTPUTDIR,DUMP_OUTPUTDIR} to ensure
+ * their uniqueness in each run.
  */
 #define BASE_OUTPUTDIR		"pg_upgrade_output.d"
-#define LOG_OUTPUTDIR		BASE_OUTPUTDIR "/log"
-#define DUMP_OUTPUTDIR		BASE_OUTPUTDIR "/dump"
+#define LOG_OUTPUTDIR		 "log"
+#define DUMP_OUTPUTDIR		 "dump"
 
 #define DB_DUMP_LOG_FILE_MASK	"pg_upgrade_dump_%u.log"
 #define SERVER_LOG_FILE		"pg_upgrade_server.log"
@@ -276,7 +278,8 @@ typedef struct
 	bool		verbose;		/* true -> be verbose in messages */
 	bool		retain;			/* retain log files on success */
 	/* Set of internal directories for output files */
-	char	   *basedir;		/* Base output directory */
+	char	   *rootdir;		/* Root directory, aka pg_upgrade_output.d */
+	char	   *basedir;		/* Base output directory, with timestamp */
 	char	   *dumpdir;		/* Dumps */
 	char	   *logdir;			/* Log files */
 	bool		isatty;			/* is stdout a tty */
@@ -432,6 +435,7 @@ void		report_status(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3
 void		pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
 void		pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn();
 void		end_progress_output(void);
+void		cleanup_output_dirs(void);
 void		prep_status(const char *fmt,...) pg_attribute_printf(1, 2);
 void		prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2);
 unsigned int str2uint(const char *str);
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 55c7354ba2..f4c400a93e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -5,6 +5,7 @@ use warnings;
 use Cwd qw(abs_path);
 use File::Basename qw(dirname);
 use File::Compare;
+use File::Path qw(rmtree);
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
@@ -213,6 +214,39 @@ chdir ${PostgreSQL::Test::Utils::tmp_check};
 
 # Upgrade the instance.
 $oldnode->stop;
+
+# Cause a failure at the start of pg_upgrade, this should create the logging
+# directory pg_upgrade_output.d but leave it around.  Keep --check for an
+# early exit.
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $oldnode->data_dir,
+		'-D',         $newnode->data_dir,
+		'-b',         $oldbindir . '/does/not/exist/',
+		'-B',         $newbindir,
+		'-p',         $oldnode->port,
+		'-P',         $newnode->port,
+		'--check'
+	],
+	'run of pg_upgrade --check for new instance with incorrect binary path');
+ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+
+# --check command works here, cleans up pg_upgrade_output.d.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
+		'-D',         $newnode->data_dir, '-b', $oldbindir,
+		'-B',         $newbindir,         '-p', $oldnode->port,
+		'-P',         $newnode->port,     '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+
+# Actual run, pg_upgrade_output.d is removed at the end.
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
@@ -221,6 +255,9 @@ command_ok(
 		'-P',         $newnode->port
 	],
 	'run of pg_upgrade for new instance');
+ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 $newnode->start;
 
 # Check if there are any logs coming from pg_upgrade, that would only be
diff --git a/src/bin/pg_upgrade/util.c b/src/bin/pg_upgrade/util.c
index 1a328b4270..f25e14c321 100644
--- a/src/bin/pg_upgrade/util.c
+++ b/src/bin/pg_upgrade/util.c
@@ -55,6 +55,48 @@ end_progress_output(void)
 		pg_log(PG_REPORT, "%-*s", MESSAGE_WIDTH, "");
 }
 
+/*
+ * Remove any logs generated internally.  To be used once when exiting.
+ */
+void
+cleanup_output_dirs(void)
+{
+	fclose(log_opts.internal);
+
+	/* Remove dump and log files? */
+	if (log_opts.retain)
+		return;
+
+	(void) rmtree(log_opts.basedir, true);
+
+	/* Remove pg_upgrade_output.d only if empty */
+	switch (pg_check_dir(log_opts.rootdir))
+	{
+		case 0:					/* non-existent */
+		case 3:					/* exists and contains a mount point */
+			Assert(false);
+			break;
+
+		case 1:					/* exists and empty */
+		case 2:					/* exists and contains only dot files */
+			(void) rmtree(log_opts.rootdir, true);
+			break;
+
+		case 4:					/* exists */
+
+			/*
+			 * Keep the root directory as this includes some past log
+			 * activity.
+			 */
+			break;
+
+		default:
+			/* different failure, just report it */
+			pg_log(PG_WARNING, "could not access directory \"%s\": %m",
+				   log_opts.rootdir);
+			break;
+	}
+}
 
 /*
  * prep_status
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 8cda8d16d1..5b6d92c2f8 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -768,7 +768,9 @@ psql --username=postgres --file=script.sql postgres
   <para>
    <application>pg_upgrade</application> creates various working files, such
    as schema dumps, stored within <literal>pg_upgrade_output.d</literal> in
-   the directory of the new cluster.
+   the directory of the new cluster. Each run creates a new subdirectory named
+   with a timestamp, as of <literal>%Y%m%d_%H%M%S.{milli_seconds}</literal>
+   where all the generated files are stored.
   </para>
 
   <para>
#15Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#14)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 6 Jun 2022, at 06:17, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:

On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached.

I was thinking more along the lines of %m to make it (more) human readable, but
I'm certainly not wedded to any format.

Neither am I. I would not map exactly to %m as it uses whitespaces,
but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
be fine? If there are other ideas for the format, just let me know.

I think this makes more sense from an end-user perspective.

As a user I would expect the logs from this current invocation to be removed
without --retain, and any other older log entries be kept. I think we should
remove log_opts.logdir and only remove log_opts.rootdir if it is left empty
after .logdir is removed.

Okay, however I think you mean log_opts.basedir rather than logdir?
That's simple enough to switch around as pg_check_dir() does this
job.

Correct, I mistyped. The cleanup in this version of the patch looks sane to
me.

--
Daniel Gustafsson https://vmware.com/

#16Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#15)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Mon, Jun 06, 2022 at 07:43:53PM +0200, Daniel Gustafsson wrote:

On 6 Jun 2022, at 06:17, Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Jun 06, 2022 at 02:38:03AM +0200, Daniel Gustafsson wrote:

On 5 Jun 2022, at 11:19, Michael Paquier <michael@paquier.xyz> wrote:

I have been toying with the idea of a sub-directory named with a
timestamp (Unix time, like log_line_prefix's %n but this could be
any format) under pg_upgrade_output.d/ and finished with the
attached.

I was thinking more along the lines of %m to make it (more) human readable, but
I'm certainly not wedded to any format.

It seems important to use a format in most-significant-parts-first which sorts
nicely by filename, but otherwise anything could be okay.

Neither am I. I would not map exactly to %m as it uses whitespaces,
but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would
be fine? If there are other ideas for the format, just let me know.

I think this makes more sense from an end-user perspective.

Is it better to use "T" instead of "_" ?

Apparently, that's ISO 8601, which can optionally use separators
(YYYY-MM-DDTHH:MM:SS).

https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations

I was thinking this would not include fractional seconds. Maybe that would
mean that the TAP tests would need to sleep(1) at some points...

--
Justin

#17Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#16)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Mon, Jun 06, 2022 at 01:53:35PM -0500, Justin Pryzby wrote:

It seems important to use a format in most-significant-parts-first which sorts
nicely by filename, but otherwise anything could be okay.

Agreed.

Apparently, that's ISO 8601, which can optionally use separators
(YYYY-MM-DDTHH:MM:SS).

OK, let's use a T, with the basic format and a minimal number of
separators then, we get 20220603T082255.

I was thinking this would not include fractional seconds. Maybe that would
mean that the TAP tests would need to sleep(1) at some points...

If we don't split by the millisecond, we would come back to the
problems of the original report. On my laptop, the --check phase
that passes takes more than 1s, but the one that fails takes 0.1s, so
a follow-up run would complain with the path conflicts. So at the end
I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use
a logic that checks for conflicts and appends an extra number if
needed, though the addition of the extra ms is a bit shorter).
--
Michael

#18Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#17)
1 attachment(s)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Tue, Jun 07, 2022 at 08:30:47AM +0900, Michael Paquier wrote:

If we don't split by the millisecond, we would come back to the
problems of the original report. On my laptop, the --check phase
that passes takes more than 1s, but the one that fails takes 0.1s, so
a follow-up run would complain with the path conflicts. So at the end
I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use
a logic that checks for conflicts and appends an extra number if
needed, though the addition of the extra ms is a bit shorter).

So, attached is the patch I would like to apply for all that (commit
message included). One issue I missed previously is that the TAP test
missed the log files on failure, so I had to tweak that with a find
routine. I have fixed a few comments, and improved the docs to
describe the directory structure.

We are still need a refresh of the buildfarm client for the case where
pg_upgrade is tested without TAP, like that I guess:
--- a/PGBuild/Modules/TestUpgrade.pm
+++ b/PGBuild/Modules/TestUpgrade.pm
@@ -140,6 +140,7 @@ sub check
          $self->{pgsql}/src/bin/pg_upgrade/log/*
          $self->{pgsql}/src/bin/pg_upgrade/tmp_check/*/*.diffs
          $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/log/*
+         $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/*/log/*
          $self->{pgsql}/src/test/regress/*.diffs"
 	);
 	$log->add_log($_) foreach (@logfiles);
--
Michael

Attachments:

v3-0001-Restructure-pg_upgrade-output-directories-for-bet.patchtext/x-diff; charset=us-asciiDownload
From a865e736951814d0b7aeee2b93ac4e87d355af0f Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 7 Jun 2022 11:29:31 +0900
Subject: [PATCH v3] Restructure pg_upgrade output directories for better
 idempotence

38bfae3 has moved the contents written to files by pg_upgrade under a
new directory called pg_upgrade_output.d/ located in the new cluster's
data folder, and it used a simple structure made of two subdirectories
leading to a fixed structure: log/ and dump/.  This design has made
weaker pg_upgrade on repeated calls, as we could get failures when
creating one or more of those directories, while potentially losing the
logs of a previous run (logs are retained automatically on failure, and
cleaned up on success unless --retain is specified).  So a user would
need to clean up pg_upgrade_output.d/ as an extra step for any repeated
calls of pg_upgrade.  The most common scenario here is --check followed
by the actual upgrade, but one could see the failure when specifying an
incorrect argument value.  Removing entirely the logs would have the
disadvantage of removing all the past information, even if --retain was
specified at some past step.

This result is annoying for a lot of users and automated upgrade flows.
So, tather than requiring a manual removal of pg_upgrade_output.d/, this
redesigns the set of output directories in a more dynamic way, based on
a suggestion from Tom Lane and Daniel Gustafsson.  pg_upgrade_output.d/
is still the base path, but a second directory level is added, mostly
named after an ISO-8601-formatted timestamp (in short human-readable,
with milliseconds appended to the name to avoid any conflicts).  The
logs and dumps are saved within the same subdirectories as previously,
as of log/ and dump/, but these are located inside the subdirectory
named after the timestamp.

The logs of a given run are removed only after a successful run if
--retain is not used, and pg_upgrade_output.d/ is kept if there are any
logs from a previous run.  Note that previously, pg_upgrade would have
kept the logs even after a successful --check but that was inconsistent
compared to the case without --check.  The code is charge of the removal
of the output directories is now refactored into a single routine.

Two TAP tests are added with a --check command (one failure and one
success), to look after the case fixed here.  Note that the test had to
be tweaked a bit to fit with the new directory structure so as it can
find any logs generated on failure.  This is still going to require a
change in the buildfarm client for the case where pg_upgrade is tested
without the TAP test, though.

Reported-by: Tushar Ahuja
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson, Justin Pryzby
Discussion: https://postgr.es/m/77e6ecaa-2785-97aa-f229-4b6e047cbd2b@enterprisedb.com
---
 src/bin/pg_upgrade/check.c             |  2 +
 src/bin/pg_upgrade/pg_upgrade.c        | 65 +++++++++++++++++---------
 src/bin/pg_upgrade/pg_upgrade.h        | 14 ++++--
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 49 ++++++++++++++++++-
 src/bin/pg_upgrade/util.c              | 42 +++++++++++++++++
 doc/src/sgml/ref/pgupgrade.sgml        |  5 +-
 6 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 6114303b52..ace7387eda 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -210,6 +210,8 @@ report_clusters_compatible(void)
 		pg_log(PG_REPORT, "\n*Clusters are compatible*\n");
 		/* stops new cluster */
 		stop_postmaster(false);
+
+		cleanup_output_dirs();
 		exit(0);
 	}
 
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index ecb3e1f647..254c46d27a 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -58,7 +58,6 @@ static void copy_xact_xlog_xid(void);
 static void set_frozenxids(bool minmxid_only);
 static void make_outputdirs(char *pgdata);
 static void setup(char *argv0, bool *live_check);
-static void cleanup(void);
 
 ClusterInfo old_cluster,
 			new_cluster;
@@ -204,7 +203,7 @@ main(int argc, char **argv)
 
 	pg_free(deletion_script_file_name);
 
-	cleanup();
+	cleanup_output_dirs();
 
 	return 0;
 }
@@ -221,19 +220,54 @@ make_outputdirs(char *pgdata)
 	char	  **filename;
 	time_t		run_time = time(NULL);
 	char		filename_path[MAXPGPATH];
+	char		timebuf[128];
+	struct timeval time;
+	time_t		tt;
+	int			len;
 
+	log_opts.rootdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.rootdir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for root directory too small");
+
+	/* BASE_OUTPUTDIR/$timestamp/ */
+	gettimeofday(&time, NULL);
+	tt = (time_t) time.tv_sec;
+	strftime(timebuf, sizeof(timebuf), "%Y%m%dT%H%M%S", localtime(&tt));
+	/* append milliseconds */
+	snprintf(timebuf, sizeof(timebuf), "%s_%03d",
+			 timebuf, (int) (time.tv_usec / 1000));
 	log_opts.basedir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", pgdata, BASE_OUTPUTDIR);
-	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s", pgdata, DUMP_OUTPUTDIR);
-	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
-	snprintf(log_opts.logdir, MAXPGPATH, "%s/%s", pgdata, LOG_OUTPUTDIR);
+	len = snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir,
+				   timebuf);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for base directory too small");
 
-	if (mkdir(log_opts.basedir, pg_dir_create_mode))
+	/* BASE_OUTPUTDIR/$timestamp/dump/ */
+	log_opts.dumpdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.dumpdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+				   timebuf, DUMP_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for dump directory too small");
+
+	/* BASE_OUTPUTDIR/$timestamp/log/ */
+	log_opts.logdir = (char *) pg_malloc(MAXPGPATH);
+	len = snprintf(log_opts.logdir, MAXPGPATH, "%s/%s/%s", log_opts.rootdir,
+				   timebuf, LOG_OUTPUTDIR);
+	if (len >= MAXPGPATH)
+		pg_fatal("buffer for log directory too small");
+
+	/*
+	 * Ignore the error case where the root path exists, as it is kept the
+	 * same across runs.
+	 */
+	if (mkdir(log_opts.rootdir, pg_dir_create_mode) < 0 && errno != EEXIST)
+		pg_fatal("could not create directory \"%s\": %m\n", log_opts.rootdir);
+	if (mkdir(log_opts.basedir, pg_dir_create_mode) < 0)
 		pg_fatal("could not create directory \"%s\": %m\n", log_opts.basedir);
-	if (mkdir(log_opts.dumpdir, pg_dir_create_mode))
+	if (mkdir(log_opts.dumpdir, pg_dir_create_mode) < 0)
 		pg_fatal("could not create directory \"%s\": %m\n", log_opts.dumpdir);
-	if (mkdir(log_opts.logdir, pg_dir_create_mode))
+	if (mkdir(log_opts.logdir, pg_dir_create_mode) < 0)
 		pg_fatal("could not create directory \"%s\": %m\n", log_opts.logdir);
 
 	snprintf(filename_path, sizeof(filename_path), "%s/%s", log_opts.logdir,
@@ -745,14 +779,3 @@ set_frozenxids(bool minmxid_only)
 
 	check_ok();
 }
-
-
-static void
-cleanup(void)
-{
-	fclose(log_opts.internal);
-
-	/* Remove dump and log files? */
-	if (!log_opts.retain)
-		(void) rmtree(log_opts.basedir, true);
-}
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 86d3dc46fa..a3df419a1d 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -30,12 +30,14 @@
 #define DB_DUMP_FILE_MASK	"pg_upgrade_dump_%u.custom"
 
 /*
- * Base directories that include all the files generated internally,
- * from the root path of the new cluster.
+ * Base directories that include all the files generated internally, from the
+ * root path of the new cluster.  The paths are dynamically built as of
+ * BASE_OUTPUTDIR/$timestamp/{LOG_OUTPUTDIR,DUMP_OUTPUTDIR} to ensure thier
+ * uniqueness in each run.
  */
 #define BASE_OUTPUTDIR		"pg_upgrade_output.d"
-#define LOG_OUTPUTDIR		BASE_OUTPUTDIR "/log"
-#define DUMP_OUTPUTDIR		BASE_OUTPUTDIR "/dump"
+#define LOG_OUTPUTDIR		 "log"
+#define DUMP_OUTPUTDIR		 "dump"
 
 #define DB_DUMP_LOG_FILE_MASK	"pg_upgrade_dump_%u.log"
 #define SERVER_LOG_FILE		"pg_upgrade_server.log"
@@ -276,7 +278,8 @@ typedef struct
 	bool		verbose;		/* true -> be verbose in messages */
 	bool		retain;			/* retain log files on success */
 	/* Set of internal directories for output files */
-	char	   *basedir;		/* Base output directory */
+	char	   *rootdir;		/* Root directory, aka pg_upgrade_output.d */
+	char	   *basedir;		/* Base output directory, with timestamp */
 	char	   *dumpdir;		/* Dumps */
 	char	   *logdir;			/* Log files */
 	bool		isatty;			/* is stdout a tty */
@@ -432,6 +435,7 @@ void		report_status(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3
 void		pg_log(eLogType type, const char *fmt,...) pg_attribute_printf(2, 3);
 void		pg_fatal(const char *fmt,...) pg_attribute_printf(1, 2) pg_attribute_noreturn();
 void		end_progress_output(void);
+void		cleanup_output_dirs(void);
 void		prep_status(const char *fmt,...) pg_attribute_printf(1, 2);
 void		prep_status_progress(const char *fmt,...) pg_attribute_printf(1, 2);
 unsigned int str2uint(const char *str);
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 55c7354ba2..3f11540e18 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -5,6 +5,8 @@ use warnings;
 use Cwd qw(abs_path);
 use File::Basename qw(dirname);
 use File::Compare;
+use File::Find qw(find);
+use File::Path qw(rmtree);
 
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
@@ -213,6 +215,39 @@ chdir ${PostgreSQL::Test::Utils::tmp_check};
 
 # Upgrade the instance.
 $oldnode->stop;
+
+# Cause a failure at the start of pg_upgrade, this should create the logging
+# directory pg_upgrade_output.d but leave it around.  Keep --check for an
+# early exit.
+command_fails(
+	[
+		'pg_upgrade', '--no-sync',
+		'-d',         $oldnode->data_dir,
+		'-D',         $newnode->data_dir,
+		'-b',         $oldbindir . '/does/not/exist/',
+		'-B',         $newbindir,
+		'-p',         $oldnode->port,
+		'-P',         $newnode->port,
+		'--check'
+	],
+	'run of pg_upgrade --check for new instance with incorrect binary path');
+ok(-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade failure");
+rmtree($newnode->data_dir . "/pg_upgrade_output.d");
+
+# --check command works here, cleans up pg_upgrade_output.d.
+command_ok(
+	[
+		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
+		'-D',         $newnode->data_dir, '-b', $oldbindir,
+		'-B',         $newbindir,         '-p', $oldnode->port,
+		'-P',         $newnode->port,     '--check'
+	],
+	'run of pg_upgrade --check for new instance');
+ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+
+# Actual run, pg_upgrade_output.d is removed at the end.
 command_ok(
 	[
 		'pg_upgrade', '--no-sync',        '-d', $oldnode->data_dir,
@@ -221,14 +256,24 @@ command_ok(
 		'-P',         $newnode->port
 	],
 	'run of pg_upgrade for new instance');
+ok( !-d $newnode->data_dir . "/pg_upgrade_output.d",
+	"pg_upgrade_output.d/ removed after pg_upgrade success");
+
 $newnode->start;
 
 # Check if there are any logs coming from pg_upgrade, that would only be
 # retained on failure.
-my $log_path = $newnode->data_dir . "/pg_upgrade_output.d/log";
+my $log_path = $newnode->data_dir . "/pg_upgrade_output.d";
 if (-d $log_path)
 {
-	foreach my $log (glob("$log_path/*"))
+	my @log_files;
+	find(
+		sub {
+			push @log_files, $File::Find::name
+			  if $File::Find::name =~ m/.*\.log/;
+		},
+		$newnode->data_dir . "/pg_upgrade_output.d");
+	foreach my $log (@log_files)
 	{
 		note "=== contents of $log ===\n";
 		print slurp_file($log);
diff --git a/src/bin/pg_upgrade/util.c b/src/bin/pg_upgrade/util.c
index 1a328b4270..f25e14c321 100644
--- a/src/bin/pg_upgrade/util.c
+++ b/src/bin/pg_upgrade/util.c
@@ -55,6 +55,48 @@ end_progress_output(void)
 		pg_log(PG_REPORT, "%-*s", MESSAGE_WIDTH, "");
 }
 
+/*
+ * Remove any logs generated internally.  To be used once when exiting.
+ */
+void
+cleanup_output_dirs(void)
+{
+	fclose(log_opts.internal);
+
+	/* Remove dump and log files? */
+	if (log_opts.retain)
+		return;
+
+	(void) rmtree(log_opts.basedir, true);
+
+	/* Remove pg_upgrade_output.d only if empty */
+	switch (pg_check_dir(log_opts.rootdir))
+	{
+		case 0:					/* non-existent */
+		case 3:					/* exists and contains a mount point */
+			Assert(false);
+			break;
+
+		case 1:					/* exists and empty */
+		case 2:					/* exists and contains only dot files */
+			(void) rmtree(log_opts.rootdir, true);
+			break;
+
+		case 4:					/* exists */
+
+			/*
+			 * Keep the root directory as this includes some past log
+			 * activity.
+			 */
+			break;
+
+		default:
+			/* different failure, just report it */
+			pg_log(PG_WARNING, "could not access directory \"%s\": %m",
+				   log_opts.rootdir);
+			break;
+	}
+}
 
 /*
  * prep_status
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 8cda8d16d1..c60fdf9d67 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -768,7 +768,10 @@ psql --username=postgres --file=script.sql postgres
   <para>
    <application>pg_upgrade</application> creates various working files, such
    as schema dumps, stored within <literal>pg_upgrade_output.d</literal> in
-   the directory of the new cluster.
+   the directory of the new cluster. Each run creates a new subdirectory named
+   with a timestamp formatted as per ISO 8601
+   (<literal>%Y%m%dT%H%M%S</literal>) appended by an underscore and
+   the timestamp's milliseconds, where all the generated files are stored.
   </para>
 
   <para>
-- 
2.36.1

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#18)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

tather => rather
is charge => in charge

On Mon, Jun 06, 2022 at 01:17:52PM +0900, Michael Paquier wrote:

but something like %Y%m%d_%H%M%S.%03d (3-digit ms for last part) would

On Tue, Jun 07, 2022 at 08:30:47AM +0900, Michael Paquier wrote:

I would reduce the format to be YYYYMMDDTHHMMSS_ms (we could also use

I think it's better with a dot (HHMMSS.ms) rather than underscore (HHMMSS_ms).

On Tue, Jun 07, 2022 at 11:42:37AM +0900, Michael Paquier wrote:

+	/* append milliseconds */
+	snprintf(timebuf, sizeof(timebuf), "%s_%03d",
+			 timebuf, (int) (time.tv_usec / 1000));
+   with a timestamp formatted as per ISO 8601
+   (<literal>%Y%m%dT%H%M%S</literal>) appended by an underscore and
+   the timestamp's milliseconds, where all the generated files are stored.

The ISO timestamp can include milliseconds (or apparently fractional parts of
the "lowest-order" unit), so the "appended by" part doesn't need to be
explained here.

+       snprintf(timebuf, sizeof(timebuf), "%s_%03d",
+                        timebuf, (int) (time.tv_usec / 1000));

Is it really allowed to sprintf a buffer onto itself ?
I can't find any existing cases doing that.

It seems useless in any case - you could instead
snprintf(timebuf+strlen(timebuf), or increment len+=snprintf()...

Or append the milliseconds here:

+       len = snprintf(log_opts.basedir, MAXPGPATH, "%s/%s", log_opts.rootdir,
+                                  timebuf);

--
Justin

#20Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#19)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Mon, Jun 06, 2022 at 10:11:48PM -0500, Justin Pryzby wrote:

tather => rather
is charge => in charge

Thanks for the extra read. Fixed. There was an extra one in the
comments, as of s/thier/their/.

I think it's better with a dot (HHMMSS.ms) rather than underscore (HHMMSS_ms).

The ISO timestamp can include milliseconds (or apparently fractional parts of
the "lowest-order" unit), so the "appended by" part doesn't need to be
explained here.

+       snprintf(timebuf, sizeof(timebuf), "%s_%03d",
+                        timebuf, (int) (time.tv_usec / 1000));

Is it really allowed to sprintf a buffer onto itself ?
I can't find any existing cases doing that.

Yes, there is no need to do that, so I have just appended the ms
digits to the end of the string.

And applied, to take care of this open item.
--
Michael

#21Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#20)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:

And applied, to take care of this open item.

Shouldn't this wait for the buildfarm to be updated again ?

--
Justin

#22Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#21)
1 attachment(s)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Wed, Jun 08, 2022 at 04:13:37PM -0500, Justin Pryzby wrote:

On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:

And applied, to take care of this open item.

Shouldn't this wait for the buildfarm to be updated again ?

The TAP logic is able to find any logs by itself on failure, so what
would be impacted is the case of the tests running pg_upgrade via the
past route in TestUpgrade.pm (it had better not run in the buildfarm
client for 15~ and I am wondering if it would be worth backpatching
the TAP test once it brews a bit more). Anyway, seeing my time sheet
for the next couple of days coupled with a potential beta2 in the very
short term and with the broken upgrade workflow, I have given priority
to fix the issue because that's what impacts directly people looking
at 15 and testing their upgrades, which is what Tushar did.

Saying that, I have already sent a pull request to the buildfarm repo
to refresh the set of logs, as of the patch attached. This updates
the logic so as this would work for any changes in the structure of
pg_upgrade_output.d/, fetching any files prefixed by ".log".
--
Michael

Attachments:

0001-Update-TestUpgrade.pm-to-grab-for-any-existing-log-f.patchtext/x-diff; charset=us-asciiDownload
From c5cf14647c7acc5c4f704bb788a89afa0f5c5732 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 9 Jun 2022 09:36:18 +0900
Subject: [PATCH] Update TestUpgrade.pm to grab for any existing log files with
 Postgres 15~

Upstream has changed the location of the log files generated by
pg_upgrade to be in a directory located within the target cluster, named
pg_upgrade_output.d.  Its structure has changed to use more
subdirectories, and glob() is not really able to cope with that.  This
modifies the logic grabbing the log files to use "find" and grab all the
files within pg_upgrade_output.d that are prefixed with ".log".
---
 PGBuild/Modules/TestUpgrade.pm | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/PGBuild/Modules/TestUpgrade.pm b/PGBuild/Modules/TestUpgrade.pm
index f193251..1e0a1e3 100644
--- a/PGBuild/Modules/TestUpgrade.pm
+++ b/PGBuild/Modules/TestUpgrade.pm
@@ -18,6 +18,7 @@ use PGBuild::SCM;
 use PGBuild::Utils qw(:DEFAULT $steps_completed);
 
 use File::Basename;
+use File::Find;
 
 use strict;
 use warnings;
@@ -139,9 +140,18 @@ sub check
          $self->{pgsql}/src/bin/pg_upgrade/*.log
          $self->{pgsql}/src/bin/pg_upgrade/log/*
          $self->{pgsql}/src/bin/pg_upgrade/tmp_check/*/*.diffs
-         $self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/log/*
          $self->{pgsql}/src/test/regress/*.diffs"
 	);
+
+	# Extra location of logs, changed as per Postgres 15~ with multiple
+	# levels of subdirectories.
+	find(
+		sub {
+			push @logfiles, $File::Find::name
+				if $File::Find::name =~ m/.*\.log/;
+		},
+		"$self->{pgsql}/src/bin/pg_upgrade/tmp_check/data/pg_upgrade_output.d/");
+
 	$log->add_log($_) foreach (@logfiles);
 
 	my $status = $? >> 8;
-- 
2.36.1

#23Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#22)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 2022-06-08 We 20:53, Michael Paquier wrote:

On Wed, Jun 08, 2022 at 04:13:37PM -0500, Justin Pryzby wrote:

On Wed, Jun 08, 2022 at 10:55:29AM +0900, Michael Paquier wrote:

And applied, to take care of this open item.

Shouldn't this wait for the buildfarm to be updated again ?

The TAP logic is able to find any logs by itself on failure, so what
would be impacted is the case of the tests running pg_upgrade via the
past route in TestUpgrade.pm (it had better not run in the buildfarm
client for 15~ and I am wondering if it would be worth backpatching
the TAP test once it brews a bit more). Anyway, seeing my time sheet
for the next couple of days coupled with a potential beta2 in the very
short term and with the broken upgrade workflow, I have given priority
to fix the issue because that's what impacts directly people looking
at 15 and testing their upgrades, which is what Tushar did.

Saying that, I have already sent a pull request to the buildfarm repo
to refresh the set of logs, as of the patch attached. This updates
the logic so as this would work for any changes in the structure of
pg_upgrade_output.d/, fetching any files prefixed by ".log".

The module is already a noop if there's a TAP test for pg_upgrade. So I
don't understand the point of the PR at all.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#24Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#23)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On Fri, Jun 10, 2022 at 05:45:11PM -0400, Andrew Dunstan wrote:

The module is already a noop if there's a TAP test for pg_upgrade. So I
don't understand the point of the PR at all.

Oh. I thought that the old path was still taken as long as
--enable-tap-tests was not used. I was wrong, then. I'll go and
remove the pull request.
--
Michael

#25Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#24)
Re: [v15 beta] pg_upgrade failed if earlier executed with -c switch

On 2022-06-13 Mo 22:50, Michael Paquier wrote:

On Fri, Jun 10, 2022 at 05:45:11PM -0400, Andrew Dunstan wrote:

The module is already a noop if there's a TAP test for pg_upgrade. So I
don't understand the point of the PR at all.

Oh. I thought that the old path was still taken as long as
--enable-tap-tests was not used. I was wrong, then. I'll go and
remove the pull request.

It did that from 2018 (826d450), but from 2021(691e649) all it does is
look for the TAP test subdirectory. The old logic is still there
redundantly, so I'll remove it to clean up confusion.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com