pg_upgrade should truncate/remove its logs before running

Started by Justin Pryzbyover 4 years ago53 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

I have seen this numerous times but had not dug into it, until now.

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

I think it should either truncate the logfiles, or error early if any of the
files exist. Or it could put all its output files into a newly-created
subdirectory. Or this message could be output to the per-db logfiles, and not
just the static ones:
| "pg_upgrade run on %s".

For the per-db logfiels with OIDs in their name, changing open() from "append"
mode to truncate mode doesn't work, since they're written to in parallel.
They have to be removed/truncated in advance.

This is one possible fix. You can test its effect by deliberately breaking one
of the calls to exec_progs(), like this.

-  "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+  "\"%s/pg_restore\" %s %s --exit-on-error --verboose "

Attachments:

0001-pg_upgrade-fail-if-logfiles-exist.patchtext/x-diff; charset=us-asciiDownload+22-1
#2Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#1)
Re: pg_upgrade should truncate/remove its logs before running

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

I have seen this numerous times but had not dug into it, until now.

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

I think it should either truncate the logfiles, or error early if any of the
files exist. Or it could put all its output files into a newly-created
subdirectory. Or this message could be output to the per-db logfiles, and not
just the static ones:
| "pg_upgrade run on %s".

For the per-db logfiels with OIDs in their name, changing open() from "append"
mode to truncate mode doesn't work, since they're written to in parallel.
They have to be removed/truncated in advance.

This is one possible fix. You can test its effect by deliberately breaking one
of the calls to exec_progs(), like this.

-  "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+  "\"%s/pg_restore\" %s %s --exit-on-error --verboose "

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Bruce Momjian (#2)
Re: pg_upgrade should truncate/remove its logs before running

On Wed, Dec 15, 2021 at 04:09:16PM -0500, Bruce Momjian wrote:

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

I have seen this numerous times but had not dug into it, until now.

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

I think it should either truncate the logfiles, or error early if any of the
files exist. Or it could put all its output files into a newly-created
subdirectory. Or this message could be output to the per-db logfiles, and not
just the static ones:
| "pg_upgrade run on %s".

For the per-db logfiels with OIDs in their name, changing open() from "append"
mode to truncate mode doesn't work, since they're written to in parallel.
They have to be removed/truncated in advance.

This is one possible fix. You can test its effect by deliberately breaking one
of the calls to exec_progs(), like this.

-  "\"%s/pg_restore\" %s %s --exit-on-error --verbose "
+  "\"%s/pg_restore\" %s %s --exit-on-error --verboose "

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

To avoid the presence of irrelevant errors from the previous invocation of
pg_upgrade.

Maybe you would prefer one of my other ideas , like "put all its output files
into a newly-created subdirectory" ?

--
Justin

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: pg_upgrade should truncate/remove its logs before running

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

The server emits enough information so that it's not confusing:
there are timestamps, and there's an identifiable startup line.
pg_upgrade does neither. If you don't want to truncate as
Justin suggests, you should do that instead.

Personally I like the idea of making a timestamped subdirectory
and dropping all the files in that, because the thing that most
annoys *me* about pg_upgrade is the litter it leaves behind in
$CWD. A subdirectory would make it far easier to mop up the mess.

regards, tom lane

#5Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: pg_upgrade should truncate/remove its logs before running

On Wed, Dec 15, 2021 at 04:17:23PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

The server emits enough information so that it's not confusing:
there are timestamps, and there's an identifiable startup line.
pg_upgrade does neither. If you don't want to truncate as
Justin suggests, you should do that instead.

Personally I like the idea of making a timestamped subdirectory
and dropping all the files in that, because the thing that most
annoys *me* about pg_upgrade is the litter it leaves behind in
$CWD. A subdirectory would make it far easier to mop up the mess.

Yes, lot of litter. Putting it in a subdirectory makes a lot of sense.
Justin, do you want to work on that patch, since you had an earlier
version to fix this?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#5)
Re: pg_upgrade should truncate/remove its logs before running

On 12/15/21 16:23, Bruce Momjian wrote:

On Wed, Dec 15, 2021 at 04:17:23PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

The server emits enough information so that it's not confusing:
there are timestamps, and there's an identifiable startup line.
pg_upgrade does neither. If you don't want to truncate as
Justin suggests, you should do that instead.

Personally I like the idea of making a timestamped subdirectory
and dropping all the files in that, because the thing that most
annoys *me* about pg_upgrade is the litter it leaves behind in
$CWD. A subdirectory would make it far easier to mop up the mess.

Yes, lot of litter. Putting it in a subdirectory makes a lot of sense.
Justin, do you want to work on that patch, since you had an earlier
version to fix this?

The directory name needs to be predictable somehow, or maybe optionally
set as a parameter. Having just a timestamped directory name would make
life annoying for a poor buildfarm maintainer. Also, please don't change
anything before I have a chance to adjust the buildfarm code to what is
going to be done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Andrew Dunstan (#6)
Re: pg_upgrade should truncate/remove its logs before running

On Wed, Dec 15, 2021 at 05:04:54PM -0500, Andrew Dunstan wrote:

On 12/15/21 16:23, Bruce Momjian wrote:

On Wed, Dec 15, 2021 at 04:17:23PM -0500, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Dec 11, 2021 at 08:50:17PM -0600, Justin Pryzby wrote:

If pg_upgrade fails and is re-run, it appends to its logfiles, which is
confusing since, if it fails again, it then looks like the original error
recurred and wasn't fixed. The "append" behavior dates back to 717f6d608.

Uh, the database server doesn't erase its logs on crash/failure, so why
should pg_upgrade do that?

The server emits enough information so that it's not confusing:
there are timestamps, and there's an identifiable startup line.
pg_upgrade does neither. If you don't want to truncate as
Justin suggests, you should do that instead.

Personally I like the idea of making a timestamped subdirectory
and dropping all the files in that, because the thing that most
annoys *me* about pg_upgrade is the litter it leaves behind in
$CWD. A subdirectory would make it far easier to mop up the mess.

Yes, lot of litter. Putting it in a subdirectory makes a lot of sense.
Justin, do you want to work on that patch, since you had an earlier
version to fix this?

The directory name needs to be predictable somehow, or maybe optionally
set as a parameter. Having just a timestamped directory name would make
life annoying for a poor buildfarm maintainer. Also, please don't change
anything before I have a chance to adjust the buildfarm code to what is
going to be done.

Feel free to suggest the desirable behavior.
It could write to pg_upgrade.log/* and refuse to run if the dir already exists.

--
Justin

#8Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#7)
Re: pg_upgrade should truncate/remove its logs before running

On Wed, Dec 15, 2021 at 04:13:10PM -0600, Justin Pryzby wrote:

On Wed, Dec 15, 2021 at 05:04:54PM -0500, Andrew Dunstan wrote:

The directory name needs to be predictable somehow, or maybe optionally
set as a parameter. Having just a timestamped directory name would make
life annoying for a poor buildfarm maintainer. Also, please don't change
anything before I have a chance to adjust the buildfarm code to what is
going to be done.

Feel free to suggest the desirable behavior.
It could write to pg_upgrade.log/* and refuse to run if the dir already exists.

Andrew's point looks rather sensible to me. So, this stuff should
have a predictable name (pg_upgrade.log, pg_upgrade_log or upgrade_log
would be fine). But I would also add an option to be able to define a
custom log path. The latter would be useful for the regression tests
so as everything gets could get redirected to a path already filtered
out.
--
Michael

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#8)
Re: pg_upgrade should truncate/remove its logs before running

On 16.12.21 02:39, Michael Paquier wrote:

On Wed, Dec 15, 2021 at 04:13:10PM -0600, Justin Pryzby wrote:

On Wed, Dec 15, 2021 at 05:04:54PM -0500, Andrew Dunstan wrote:

The directory name needs to be predictable somehow, or maybe optionally
set as a parameter. Having just a timestamped directory name would make
life annoying for a poor buildfarm maintainer. Also, please don't change
anything before I have a chance to adjust the buildfarm code to what is
going to be done.

Feel free to suggest the desirable behavior.
It could write to pg_upgrade.log/* and refuse to run if the dir already exists.

Andrew's point looks rather sensible to me. So, this stuff should
have a predictable name (pg_upgrade.log, pg_upgrade_log or upgrade_log
would be fine). But I would also add an option to be able to define a
custom log path. The latter would be useful for the regression tests
so as everything gets could get redirected to a path already filtered
out.

Could we make it write just one log file? Is having multiple log files
better?

#10Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#9)
Re: pg_upgrade should truncate/remove its logs before running

On 16 Dec 2021, at 12:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

Could we make it write just one log file? Is having multiple log files better?

Having individual <checkname>.txt files from checks with additional information
on how to handle the error are quite convenient when writing wrappers around
pg_upgrade (speaking from experience of having written multiple pg_upgraade
frontends). Parsing a single logfile is more work, and will break existing
scripts.

I'm in favor of a predictable by default logpath, with a parameter to override,
as mentioned upthread.

--
Daniel Gustafsson https://vmware.com/

#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#10)
Re: pg_upgrade should truncate/remove its logs before running

On Thu, Dec 16, 2021 at 12:23:08PM +0100, Daniel Gustafsson wrote:

On 16 Dec 2021, at 12:11, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

Could we make it write just one log file? Is having multiple log files better?

Having individual <checkname>.txt files from checks with additional information
on how to handle the error are quite convenient when writing wrappers around
pg_upgrade (speaking from experience of having written multiple pg_upgraade
frontends). Parsing a single logfile is more work, and will break existing
scripts.

I'm in favor of a predictable by default logpath, with a parameter to override,
as mentioned upthread.

I put this together in the simplest way, prefixing all the filenames with the
configured path..

Another options is to chdir() into the given path. But, pg_upgrade takes (and
requires) a bunch of other paths, like -d -D -b -B, and those are traditionally
interpretted relative to CWD. I could getcwd() and prefix all the -[dDbB] with
that, but prefixing a handful of binary/data paths is hardly better than
prefixing a handful of dump/logfile paths. I suppose that openat() isn't
portable. I don't think this it's worth prohibiting relative paths, so I can't
think of any less-naive way to do this.

I didn't move the delete-old-cluster.sh, since that's intended to stay around
even after a successful upgrade, as opposed to the other logs, which are
typically removed at that point.

--
Justin

Attachments:

0001-pg_upgrade-write-logfiles-and-dumps-in-subdir.patchtext/x-diff; charset=us-asciiDownload+77-41
#12Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#11)
Re: pg_upgrade should truncate/remove its logs before running

On Fri, Dec 17, 2021 at 11:21:13AM -0600, Justin Pryzby wrote:

I put this together in the simplest way, prefixing all the filenames with the
configured path..

Well, why not.

Another options is to chdir() into the given path. But, pg_upgrade takes (and
requires) a bunch of other paths, like -d -D -b -B, and those are traditionally
interpretted relative to CWD. I could getcwd() and prefix all the -[dDbB] with
that, but prefixing a handful of binary/data paths is hardly better than
prefixing a handful of dump/logfile paths. I suppose that openat() isn't
portable. I don't think this it's worth prohibiting relative paths, so I can't
think of any less-naive way to do this.

If we add a new file, .gitignore would find about it quickly and
inform about a not-so-clean tree. I would tend to prefer your
approach, here. Relative paths can be useful.

I didn't move the delete-old-cluster.sh, since that's intended to stay around
even after a successful upgrade, as opposed to the other logs, which are
typically removed at that point.

Makes sense to me.

+       log_opts.basedir = getenv("PG_UPGRADE_LOGDIR");
+       if (log_opts.basedir != NULL)
+               log_opts.basedir = strdup(log_opts.basedir);
+       else
+               log_opts.basedir = "pg_upgrade_log.d";
Why is this controlled with an environment variable?  It seems to me
that an option switch would be much better, no?  While tuning things,
we could choose something simpler for the default, like
"pg_upgrade_log".  I don't have a good history in naming new things,
though :)

.gitignore should be updated, I guess? Besides, this patch has no
documentation.
--
Michael

#13Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#12)
Re: pg_upgrade should truncate/remove its logs before running

On Mon, Dec 20, 2021 at 08:21:51PM +0900, Michael Paquier wrote:

On Fri, Dec 17, 2021 at 11:21:13AM -0600, Justin Pryzby wrote:

+ log_opts.basedir = "pg_upgrade_log.d";

we could choose something simpler for the default, like
"pg_upgrade_log". I don't have a good history in naming new things,
though :)

I specifically called it .d to made it obvious that it's a dir - nearly
everything that ends in "log" is a file, so people are likely to run "rm" and
"less" on it - including myself.

.gitignore should be updated, I guess?

Are you suggesting to remove these ?
-/pg_upgrade_internal.log
-/reindex_hash.sql
-/loadable_libraries.txt

Besides, this patch has no documentation.

TBH I'm not even sure if the dir needs to be configurable ?

Attachments:

0001-pg_upgrade-write-log-files-and-dumps-in-subdir.patchtext/x-diff; charset=us-asciiDownload+92-44
#14Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#13)
Re: pg_upgrade should truncate/remove its logs before running

On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:

On Mon, Dec 20, 2021 at 08:21:51PM +0900, Michael Paquier wrote:

we could choose something simpler for the default, like
"pg_upgrade_log". I don't have a good history in naming new things,
though :)

I specifically called it .d to made it obvious that it's a dir - nearly
everything that ends in "log" is a file, so people are likely to run "rm" and
"less" on it - including myself.

Okay.

.gitignore should be updated, I guess?

Are you suggesting to remove these ?
-/pg_upgrade_internal.log
-/loadable_libraries.txt

Yep, it looks so as these are part of the logs, the second one being a
failure state.

-/reindex_hash.sql

But this one is not, no?

Besides, this patch has no documentation.

TBH I'm not even sure if the dir needs to be configurable ?

I'd think it is better to have some control on that. Not sure what
the opinion of others is on this specific point, though.
--
Michael

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#14)
Re: pg_upgrade should truncate/remove its logs before running

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:

Are you suggesting to remove these ?
-/pg_upgrade_internal.log
-/loadable_libraries.txt

Yep, it looks so as these are part of the logs, the second one being a
failure state.

-/reindex_hash.sql

But this one is not, no?

I'd like to get to a state where there's just one thing to "rm -rf"
to clean up after any pg_upgrade run. If we continue to leave the
we-suggest-you-run-these scripts loose in $CWD then we've not really
improved things much.

Perhaps there'd be merit in putting log files into an additional
subdirectory of that output directory, like
pg_upgrade_output.d/logs/foo.log, so that the more-ignorable
output files would be separated from the less-ignorable ones.
Or perhaps that's just gilding the lily.

regards, tom lane

#16Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#15)
Re: pg_upgrade should truncate/remove its logs before running

On Wed, Dec 22, 2021 at 09:52:26AM -0500, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

On Mon, Dec 20, 2021 at 09:39:26PM -0600, Justin Pryzby wrote:

Are you suggesting to remove these ?
-/pg_upgrade_internal.log
-/loadable_libraries.txt

Yep, it looks so as these are part of the logs, the second one being a
failure state.

-/reindex_hash.sql

But this one is not, no?

I'd like to get to a state where there's just one thing to "rm -rf"
to clean up after any pg_upgrade run. If we continue to leave the
we-suggest-you-run-these scripts loose in $CWD then we've not really
improved things much.

My patch moves reindex_hash.sql, and I'm having trouble seeing why it shouldn't
be handled in .gitignore the same way as other stuff that's moved.

But delete-old-cluster.sh is not moved, and I'm not sure how to improve on
that.

Perhaps there'd be merit in putting log files into an additional
subdirectory of that output directory, like
pg_upgrade_output.d/logs/foo.log, so that the more-ignorable
output files would be separated from the less-ignorable ones.
Or perhaps that's just gilding the lily.

In the case it's successful, everything is removed - except for the delete
script. I can see the case for separating the dumps (which are essentially
internal and of which there may be many) and the logs (same), from
the .txt error files like loadable_libraries.txt (which are user-facing).

It could also be divided with each DB having its own subdir, with a dumpfile
and a logfile.

Should the unix socket be created underneath the "output dir" ?

Should it be possible to set the output dir to "." ? That would give the
pre-existing behavior, but only if we don't use subdirs for log/ and dump/.

Attachments:

0001-pg_upgrade-write-log-files-and-dumps-in-subdir.patchtext/x-diff; charset=us-asciiDownload+119-42
#17Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#16)
Re: pg_upgrade should truncate/remove its logs before running

The cfbot was failing under windows:
| [22:07:02.159] could not create directory "pg_upgrade_output.d": File exists

It's because parseCommandLine() was called before get_restricted_token(), which
re-executes the process, and runs parseCommandLine again.

parseCommandLine already does stuff like opening logfiles, so that's where my
mkdir() is. It fails when re-run, since the re-exec doesn't call the cleanup()
path.

I fixed it by calling get_restricted_token() before parseCommandLine().
There's precedent for that in pg_regress (but the 3 other callers do it
differently).

It seems more ideal to always call get_restricted_token sooner than later, but
for now I only changed pg_upgrade. It's probably also better if
parseCommandLine() only parses the commandline, but for now I added on to the
logfile stuff that's already there.

BTW the CI integration is pretty swell. I added a few lines of debugging code
to figure out what was happening here. check world on 4 OSes is faster than
check world run locally. I rearranged cirrus.yaml to make windows run its
upgrade check first to save a few minutes.

Maybe the commandline argument should be callled something other than "logdir"
since it also outputs dumps there. But the dumps are more or less not
user-facing. But -d and -o are already used. Maybe it shouldn't be
configurable at all?

--
Justin

Attachments:

0001-pg_upgrade-write-log-files-and-dumps-in-subdir.patchtext/x-diff; charset=us-asciiDownload+121-46
#18Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#17)
Re: pg_upgrade should truncate/remove its logs before running

On Sat, Jan 08, 2022 at 12:48:57PM -0600, Justin Pryzby wrote:

I fixed it by calling get_restricted_token() before parseCommandLine().
There's precedent for that in pg_regress (but the 3 other callers do it
differently).

It seems more ideal to always call get_restricted_token sooner than later, but
for now I only changed pg_upgrade. It's probably also better if
parseCommandLine() only parses the commandline, but for now I added on to the
logfile stuff that's already there.

Well, the routine does a bit more than just parsing the options as it
creates the directory infrastructure as well. As you say, I think
that it would be better to have the option parsing and the
loading-into-structure portions in one routine, and the creation of
the paths in a second one. So, the new contents of the patch could
just be moved in a new routine, after getting the restricted token.
Moving get_restricted_token() before or after the option parsing as
you do is not a big deal, but your patch is introducing in the
existing routine more than what's currently done there as of HEAD.

Maybe the commandline argument should be callled something other than "logdir"
since it also outputs dumps there. But the dumps are more or less not
user-facing. But -d and -o are already used. Maybe it shouldn't be
configurable at all?

If the choice of a short option becomes confusing, I'd be fine with
just a long option, but -l is fine IMO. Including the internal dumps
in the directory is fine to me, and using a subdir, as you do, makes
things more organized.

-             "--binary-upgrade %s -f %s",
+             "--binary-upgrade %s -f %s/dump/%s",
Some quotes seem to be missing here.
static void
cleanup(void)
{
+   int         dbnum;
+   char      **filename;
+   char        filename_path[MAXPGPATH];
[...] 
+   if (rmdir(filename_path))
+       pg_log(PG_WARNING, "failed to rmdir: %s: %m\n",
filename_path);
+
+   if (rmdir(log_opts.basedir))
+       pg_log(PG_WARNING, "failed to rmdir: %s: %m\n", log_opts.basedir);

Is it intentional to not use rmtree() here? If you put all the data
in the same directory, cleanup() gets simpler.
--
Michael

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#18)
Re: pg_upgrade should truncate/remove its logs before running

On Tue, Jan 11, 2022 at 04:41:58PM +0900, Michael Paquier wrote:

On Sat, Jan 08, 2022 at 12:48:57PM -0600, Justin Pryzby wrote:

I fixed it by calling get_restricted_token() before parseCommandLine().
There's precedent for that in pg_regress (but the 3 other callers do it
differently).

It seems more ideal to always call get_restricted_token sooner than later, but
for now I only changed pg_upgrade. It's probably also better if
parseCommandLine() only parses the commandline, but for now I added on to the
logfile stuff that's already there.

Well, the routine does a bit more than just parsing the options as it
creates the directory infrastructure as well. As you say, I think
that it would be better to have the option parsing and the
loading-into-structure portions in one routine, and the creation of
the paths in a second one. So, the new contents of the patch could
just be moved in a new routine, after getting the restricted token.
Moving get_restricted_token() before or after the option parsing as
you do is not a big deal, but your patch is introducing in the
existing routine more than what's currently done there as of HEAD.

I added mkdir() before the other stuff that messes with logfiles, because it
needs to happen before that.

Are you suggesting to change the pre-existing behavior of when logfiles are
created, like 0002 ?

Maybe the commandline argument should be callled something other than "logdir"
since it also outputs dumps there. But the dumps are more or less not
user-facing. But -d and -o are already used. Maybe it shouldn't be
configurable at all?

If the choice of a short option becomes confusing, I'd be fine with
just a long option, but -l is fine IMO. Including the internal dumps
in the directory is fine to me, and using a subdir, as you do, makes
things more organized.

-             "--binary-upgrade %s -f %s",
+             "--binary-upgrade %s -f %s/dump/%s",
Some quotes seem to be missing here.

Yes, good catch

Is it intentional to not use rmtree() here? If you put all the data
in the same directory, cleanup() gets simpler.

There's no reason not to. We created the dir, and the user didn't specify to
preserve it. It'd be their fault if they put something valuable there after
starting pg_upgrade.

--
Justin

Attachments:

0001-pg_upgrade-write-log-files-and-dumps-in-subdir.patchtext/x-diff; charset=us-asciiDownload+74-51
0002-f-move-the-mkdir-and-logfile-stuff-to-a-new-function.patchtext/x-diff; charset=us-asciiDownload+47-39
#20Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#19)
Re: pg_upgrade should truncate/remove its logs before running

On Tue, Jan 11, 2022 at 02:03:07PM -0600, Justin Pryzby wrote:

I added mkdir() before the other stuff that messes with logfiles, because it
needs to happen before that.

Are you suggesting to change the pre-existing behavior of when logfiles are
created, like 0002 ?

Yes, something like that.

There's no reason not to. We created the dir, and the user didn't specify to
preserve it. It'd be their fault if they put something valuable there after
starting pg_upgrade.

This is a path for the data internal to pg_upgrade. My take is that
the code simplifications the new option brings are more valuable than
this assumption, which I guess would unlikely happen. I may be wrong,
of course. By the way, while thinking about that, should we worry
about --logdir="."?
--
Michael

#21Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#20)
#22Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#21)
#23Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#22)
#24Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#22)
#25Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#24)
#26Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#26)
#28Peter Eisentraut
peter_e@gmx.net
In reply to: Michael Paquier (#22)
#29Michael Paquier
michael@paquier.xyz
In reply to: Peter Eisentraut (#28)
#30Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#29)
#31Bruce Momjian
bruce@momjian.us
In reply to: Michael Paquier (#30)
#32Justin Pryzby
pryzby@telsasoft.com
In reply to: Bruce Momjian (#31)
#33Bruce Momjian
bruce@momjian.us
In reply to: Justin Pryzby (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Bruce Momjian (#33)
#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#30)
#36Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#35)
#37Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#36)
#38Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#37)
#39Andrew Dunstan
andrew@dunslane.net
In reply to: Michael Paquier (#38)
#40Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#40)
#42Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#41)
#43Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#41)
#45Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#44)
#46Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#44)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#46)
#48Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#47)
#49Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#49)
#51Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#50)
#52Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#50)
#53Julien Rouhaud
rjuju123@gmail.com
In reply to: Andrew Dunstan (#52)