pg_upgrade: Pass -j down to vacuumdb

Started by Jesper Pedersenabout 7 years ago44 messages
#1Jesper Pedersen
jesper.pedersen@redhat.com
1 attachment(s)

Hi Hackers,

Here is a patch that passes the -j option from pg_upgrade down to
vacuumdb if supported.

I'll add it to the January 'Fest.

Thanks for considering !

Best regards,
Jesper

Attachments:

v1_0001-Pass-the-j-option-down-to-vacuumdb-if-supported.patchtext/x-patch; name=v1_0001-Pass-the-j-option-down-to-vacuumdb-if-supported.patchDownload
From ea941f942830589469281e0d5c17740469c6aebc Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Pass the -j option down to vacuumdb if supported.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 src/bin/pg_upgrade/check.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index f2215b7095..102221a201 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,26 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+		/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	else
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+		/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
-			new_cluster.bindir, user_specification.data);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+		fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+				new_cluster.bindir, user_specification.data);
+	else
+		fprintf(script, "\"%s/vacuumdb\" %s-j %d --all --analyze-in-stages\n",
+				new_cluster.bindir, user_specification.data, user_opts.jobs);
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#2Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jesper Pedersen (#1)
Re: pg_upgrade: Pass -j down to vacuumdb

On 21/12/2018 11:12, Jesper Pedersen wrote:

Here is a patch that passes the -j option from pg_upgrade down to
vacuumdb if supported.

It's not clear to me that that is what is usually wanted.

The point of the post-upgrade analyze script is specifically to do the
analyze in a gentle fashion. Mixing in parallelism would seem to defeat
that a bit.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#2)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 12/29/18 10:03 AM, Peter Eisentraut wrote:

On 21/12/2018 11:12, Jesper Pedersen wrote:

Here is a patch that passes the -j option from pg_upgrade down to
vacuumdb if supported.

It's not clear to me that that is what is usually wanted.

The point of the post-upgrade analyze script is specifically to do the
analyze in a gentle fashion. Mixing in parallelism would seem to defeat
that a bit.

Well, that really depends. The user passed -j to pg_upgrade in order for
the upgrade to happen faster, so maybe they would expect, as I would,
that the ANALYZE phase would happen in parallel too.

At least there should be a "notice" about what the script will do in
this case.

Best regards,
Jesper

#4Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jesper Pedersen (#3)
Re: pg_upgrade: Pass -j down to vacuumdb

On 02/01/2019 20:47, Jesper Pedersen wrote:

Well, that really depends. The user passed -j to pg_upgrade in order for
the upgrade to happen faster, so maybe they would expect, as I would,
that the ANALYZE phase would happen in parallel too.

pg_upgrade -j reduces the *downtime* caused by pg_upgrade during the
upgrade process. Also, during said downtime, nothing else is happening,
so you can use all the resources of the machine.

Once the system is back up, you don't necessarily want to use all the
resources. The analyze script is specifically written to run while
production traffic is active. If you just want to run the analyze as
fast as possible, you can just run vacuumdb -j directly, without using
the script.

Moreover, it's not clear that pg_upgrade and vacuumdb are bound the same
way, so it's not a given that the same -j number should be used.

Perhaps more documentation would be useful here.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Jerry Sievers
gsievers19@comcast.net
In reply to: Peter Eisentraut (#4)
Re: pg_upgrade: Pass -j down to vacuumdb

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 02/01/2019 20:47, Jesper Pedersen wrote:

Well, that really depends. The user passed -j to pg_upgrade in order for
the upgrade to happen faster, so maybe they would expect, as I would,
that the ANALYZE phase would happen in parallel too.

pg_upgrade -j reduces the *downtime* caused by pg_upgrade during the
upgrade process. Also, during said downtime, nothing else is happening,
so you can use all the resources of the machine.

Once the system is back up, you don't necessarily want to use all the
resources. The analyze script is specifically written to run while
production traffic is active. If you just want to run the analyze as
fast as possible, you can just run vacuumdb -j directly, without using
the script.

Peter, I'm skeptical here.

I might permit a connection to a just pg_upgraded DB prior to any
analyze being known finished only for the most trivial case. At my site
however, *trivial* systems are a small minority.

In fact, our automated upgrade workflow uses our home-built parallel
analyzer which predates vacuumdb -j. Apps are not allowed into the DB
until a fast 1st pass has been done.

We run it in 2 phases...

$all preceeding upgrade steps w/system locked out
analyze-lite (reduced stats target)
open DB for application traffic
analyze-full

Of course we are increasing downtime by disallowing app traffic till
finish of analyze-lite however the assumption is that many queries would
be too slow to attempt without full analyzer coverage, albiet at a
reduced stats target.

IMO this is certainly a case of no 1-size-fits-all solution so perhaps a
--analyze-jobs option :-)

FWIW

Thanks

Moreover, it's not clear that pg_upgrade and vacuumdb are bound the same
way, so it's not a given that the same -j number should be used.

Perhaps more documentation would be useful here.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net

#6Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#4)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi Peter,

On 1/3/19 7:03 AM, Peter Eisentraut wrote:

Perhaps more documentation would be useful here.

Here is v2 that just notes that the -j option isn't passed down.

Best regards,
Jesper

Attachments:

v2_0001-Highlight-that-the-j-option-isn-t-passed-down-to-vac.patchtext/x-patch; name=v2_0001-Highlight-that-the-j-option-isn-t-passed-down-to-vac.patchDownload
From fe0be6c1f5cbcaeb2981ff4dcfceae2e2cb286b7 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the -j option isn't passed down to vacuumdb by
 default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 doc/src/sgml/ref/pgupgrade.sgml |  3 ++-
 src/bin/pg_upgrade/check.c      | 21 ++++++++++++++++-----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..937e95688d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,8 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note, that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..7bae43a0b1 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,11 +495,22 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
 	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
-- 
2.17.2

#7Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#6)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi,

According to CF app, this patch needs review so I took a look at it.
Currently, your patch applies and builds cleanly, and all tests are also successful
based from the CF bot patch tester.

I'm not sure if I have understood the argument raised by Peter correctly.
Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, so it's not a given that the same -j number should be used."
I think it's whether the # jobs for pg_upgrade is used the same way for parallel vacuum.

According to the official docs, the recommended setting for pg_upgrade --j option is the maximum of the number of CPU cores and tablespaces. [1]https://www.postgresql.org/docs/current/pgupgrade.html
As for vacuumdb, parallel vacuum's (-j) recommended setting is based from your max_connections [2]https://www.postgresql.org/docs/current/app-vacuumdb.html, which is the max # of concurrent connections to your db server.

[1]: https://www.postgresql.org/docs/current/pgupgrade.html
[2]: https://www.postgresql.org/docs/current/app-vacuumdb.html

Regards,
Kirk Jamison

#8Michael Paquier
michael@paquier.xyz
In reply to: Jamison, Kirk (#7)
Re: pg_upgrade: Pass -j down to vacuumdb

On Fri, Jan 25, 2019 at 02:31:57AM +0000, Jamison, Kirk wrote:

I'm not sure if I have understood the argument raised by Peter
correctly. Quoting Peter, "it's not clear that pg_upgrade and
vacuumdb are bound the same way, so it's not a given that the same
-j number should be used." I think it's whether the # jobs for
pg_upgrade is used the same way for parallel vacuum.

vacuumdb and pg_upgrade are designed for different purposes and have
different properties, hence using a value of -j for one does not
necessarily mean that the same value should be used for the other.
An ANALYZE is nice because it is non-intrusive for a live production
system, and if you begin to pass down a -j argument you may finish by
eating more resources that would have been necessary for the same
job. For some users perhaps that matters, for some it does not. So
based on that linking the value used by pg_upgrade and vacuumdb is a
bad concept in my opinion, and the patch should be rejected. More
documentation on pg_upgrade side to explain that a bit better could be
a good idea though, as it is perfectly possible to use your own
post-upgrade script or rewrite partially the generated one.
--
Michael

#9Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Michael Paquier (#8)
Re: pg_upgrade: Pass -j down to vacuumdb

On 25/01/2019 11:28, Michael Paquier wrote:

based on that linking the value used by pg_upgrade and vacuumdb is a
bad concept in my opinion, and the patch should be rejected. More
documentation on pg_upgrade side to explain that a bit better could be
a good idea though, as it is perfectly possible to use your own
post-upgrade script or rewrite partially the generated one.

Right. pg_upgrade doesn't actually call vacuumdb. It creates a script
that you may use. The script itself contains a comment that says, if
you want to do this as fast as possible, don't use this script. That
comment could be enhanced to suggest the use of the -j option.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jamison, Kirk (#7)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi Kirk,

On 1/24/19 9:31 PM, Jamison, Kirk wrote:

According to CF app, this patch needs review so I took a look at it.
Currently, your patch applies and builds cleanly, and all tests are also successful
based from the CF bot patch tester.

I'm not sure if I have understood the argument raised by Peter correctly.
Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, so it's not a given that the same -j number should be used."
I think it's whether the # jobs for pg_upgrade is used the same way for parallel vacuum.

According to the official docs, the recommended setting for pg_upgrade --j option is the maximum of the number of CPU cores and tablespaces. [1]
As for vacuumdb, parallel vacuum's (-j) recommended setting is based from your max_connections [2], which is the max # of concurrent connections to your db server.

Thanks for your feedback !

As per Peter's comments I have changed the patch (v2) to not pass down
the -j option to vacuumdb.

Only an update to the documentation and console output is made in order
to make it more clear.

Best regards,
Jesper

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#9)
Re: pg_upgrade: Pass -j down to vacuumdb

On 2019-Jan-25, Peter Eisentraut wrote:

On 25/01/2019 11:28, Michael Paquier wrote:

based on that linking the value used by pg_upgrade and vacuumdb is a
bad concept in my opinion, and the patch should be rejected. More
documentation on pg_upgrade side to explain that a bit better could be
a good idea though, as it is perfectly possible to use your own
post-upgrade script or rewrite partially the generated one.

Right. pg_upgrade doesn't actually call vacuumdb. It creates a script
that you may use. The script itself contains a comment that says, if
you want to do this as fast as possible, don't use this script. That
comment could be enhanced to suggest the use of the -j option.

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option. This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the two
vacuumdb lines.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#11)
Re: pg_upgrade: Pass -j down to vacuumdb

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option. This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the two
vacuumdb lines.

Even better, allow the script to absorb a value from the environment,
so that it needn't be edited at all.

regards, tom lane

#13Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#12)
Re: pg_upgrade: Pass -j down to vacuumdb

On Fri, Jan 25, 2019 at 12:16:49PM -0500, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option. This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the two
vacuumdb lines.

Even better, allow the script to absorb a value from the environment,
so that it needn't be edited at all.

+1.
--
Michael
#14Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#10)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi Jesper,

On Friday, January 25, 2019, Jesper Pedersen <jesper.pedersen@redhat.com>

Thanks for your feedback !

As per Peter's comments I have changed the patch (v2) to not pass down the -j option to vacuumdb.

Only an update to the documentation and console output is made in order to make it more clear.

Yeah, I understood from the references that the parallel option use
for vacuumdb and pg_upgrade are different. I was also getting
clarification whether pg_upgrade gets to pass it down to vacuumdb.
Based from other Michael's and Peter's replies, I now understand it.

There were also helpful comments from the developers above,
pointing it to the right direction.
In addition to Peter's comment, quoting "...if you want to do this
as fast as possible, don't use this script. That comment could be
enhanced to suggest the use of the -j option.", so I think you
should also update the following script in create_script_for_cluster_analyze():
fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
ECHO_QUOTE, ECHO_QUOTE);

There were also advice from Tom and Alvaro that you can incorporate
further in your next patch.

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option. This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the
two vacuumdb lines.

Tom Lane wrote:

Even better, allow the script to absorb a value from the environment, so that it needn't be edited at all.

Regards,
Kirk Jamison

#15Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jamison, Kirk (#14)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 1/27/19 7:50 PM, Jamison, Kirk wrote:

There were also helpful comments from the developers above,
pointing it to the right direction.
In addition to Peter's comment, quoting "...if you want to do this
as fast as possible, don't use this script. That comment could be
enhanced to suggest the use of the -j option.", so I think you
should also update the following script in create_script_for_cluster_analyze():
fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
ECHO_QUOTE, ECHO_QUOTE);

Yes, it will echo the -j option if passed, and the server supports it.

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option. This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the
two vacuumdb lines.

Tom Lane wrote:

Even better, allow the script to absorb a value from the environment, so that it needn't be edited at all.

Done.

Attached is v3.

Best regards,
Jesper

Attachments:

v3_0001-Highlight-that-the-j-option-isn-t-passed-down-to-vac.patchtext/x-patch; name=v3_0001-Highlight-that-the-j-option-isn-t-passed-down-to-vac.patchDownload
From 1cba8299ae328942a4bf623d841a80b1b6043d3a Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the -j option isn't passed down to vacuumdb by
 default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +++++-
 src/bin/pg_upgrade/check.c      | 28 ++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..b1da0cc7a2 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note, that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution such as <option>--jobs</option>.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..5114df01b5 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s-j %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#16Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: Jesper Pedersen (#15)
Re: pg_upgrade: Pass -j down to vacuumdb

On Mon, Jan 28, 2019 at 1:15 PM Jesper Pedersen <jesper.pedersen@redhat.com>
wrote:

Done.

Attached is v3.

IMHO you should use long option name '--jobs' like others.

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#17Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Fabrízio de Royes Mello (#16)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:

IMHO you should use long option name '--jobs' like others.

Thanks for your feedback !

Done, and v4 attached.

Best regards,
Jesper

Attachments:

v4_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchtext/x-patch; name=v4_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchDownload
From 142b4723f433f39d0eed2ced4beccc3721451103 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +++++-
 src/bin/pg_upgrade/check.c      | 28 ++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..b1da0cc7a2 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note, that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution such as <option>--jobs</option>.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..3b478ef95f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#18Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jesper Pedersen (#17)
Re: pg_upgrade: Pass -j down to vacuumdb

On 28/01/2019 17:47, Jesper Pedersen wrote:

On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:

IMHO you should use long option name '--jobs' like others.

Thanks for your feedback !

Done, and v4 attached.

I would drop the changes in pgupgrade.sgml. We don't need to explain
what doesn't happen, when nobody before now ever thought that it would
happen.

Also, we don't need the versioning stuff. The new cluster in pg_upgrade
is always of the current version, and we know what that one supports.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Peter Eisentraut (#18)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

On 1/28/19 3:50 PM, Peter Eisentraut wrote:

Done, and v4 attached.

I would drop the changes in pgupgrade.sgml. We don't need to explain
what doesn't happen, when nobody before now ever thought that it would
happen.

Also, we don't need the versioning stuff. The new cluster in pg_upgrade
is always of the current version, and we know what that one supports.

Done.

Best regards,
Jesper

Attachments:

v5_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchtext/x-patch; name=v5_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchDownload
From 5b51f927dff808b4a8516b424c2ef790dbe96da6 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 src/bin/pg_upgrade/check.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..847e604088 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#20Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#19)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi Jesper,

Thanks for updating your patches quickly.

On 1/28/19 3:50 PM, Peter Eisentraut wrote:

Done, and v4 attached.

I would drop the changes in pgupgrade.sgml. We don't need to explain
what doesn't happen, when nobody before now ever thought that it would
happen.

Also, we don't need the versioning stuff. The new cluster in
pg_upgrade is always of the current version, and we know what that one supports.

Done.

I just checked the patch.
As per advice, you removed the versioning and specified --jobs.
The patch still applies, builds and passed the tests successfully.

I'd argue though that it's helpful to find a short documentation to clarify
that it's not pass down by to vacuumdb default. I think the initial doc 
change from the previous patch would be good.
+     machine. Note, that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
Just remove the comma symbol from "Note, that..."

Regards,
Kirk Jamison

#21Michael Paquier
michael@paquier.xyz
In reply to: Jamison, Kirk (#20)
Re: pg_upgrade: Pass -j down to vacuumdb

On Tue, Jan 29, 2019 at 12:35:30AM +0000, Jamison, Kirk wrote:

I just checked the patch.
As per advice, you removed the versioning and specified --jobs.
The patch still applies, builds and passed the tests successfully.

I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no? What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses. So there is no actual need for the if/else
complication business.
2) Perhaps we need to worry about the second vacuumdb --all command,
which may want custom options, which are not necessarily the same as
the options of the first command? I don't think we need to care as it
applies only to an upgraded cluster using something older 8.4, just
wondering..
--
Michael

#22Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Michael Paquier (#21)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 1/29/19 12:08 AM, Michael Paquier wrote:

On Tue, Jan 29, 2019 at 12:35:30AM +0000, Jamison, Kirk wrote:

I just checked the patch.
As per advice, you removed the versioning and specified --jobs.
The patch still applies, builds and passed the tests successfully.

I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.

I added most of the documentation back, as requested by Kirk.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no? What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses. So there is no actual need for the if/else
complication business.

I think it is ok for the echo command to highlight to the user that
running --analyze-only using the same amount of jobs will give a faster
result.

2) Perhaps we need to worry about the second vacuumdb --all command,
which may want custom options, which are not necessarily the same as
the options of the first command? I don't think we need to care as it
applies only to an upgraded cluster using something older 8.4, just
wondering..

I think that --all --analyze-in-stages is what most people want. And
with the $VACUUMDB_OPTS variable people have an option to put in more,
such as -j X.

Best regards,
Jesper

Attachments:

v6_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchtext/x-patch; name=v6_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchDownload
From b02e1f414e309361595ab3fe84fb6f66bcf63265 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +++++-
 src/bin/pg_upgrade/check.c      | 28 ++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..84a7ccb6f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..847e604088 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#23Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#22)
RE: pg_upgrade: Pass -j down to vacuumdb

On January 29, 2019 8:19 PM +0000, Jesper Pedersen wrote:

On 1/29/19 12:08 AM, Michael Paquier wrote:

I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.

..

I added most of the documentation back, as requested by Kirk.

Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of experience, so I think it's alright with me not to include the doc part if they finally say so.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no? What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses. So there is no actual need for the if/else
complication business.

I think it is ok for the echo command to highlight to the user that running --analyze-only using the same amount of jobs will give a faster result.

I missed that part.
IIUC, using the $VACUUMDB_OPTS variable would simplify and correct the usage of jobs for vacuumdb. (pg_upgrade jobs is not equal to vacuumdb jobs) Plus, it might simplify and reduce the number of additional lines.
Tom Lane also suggested above to make the script absorb the value from env.
Would that address your concern of getting a faster result, yes?

Regards,
Kirk Jamison

#24Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Jamison, Kirk (#23)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 1/30/19 7:59 PM, Jamison, Kirk wrote:

I added most of the documentation back, as requested by Kirk.

Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of experience, so I think it's alright with me not to include the doc part if they finally say so.

I think we need to leave it to the Committer to decide, as both Peter
and Michael are committers; provided the patch reaches RfC.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no? What has been suggested by Alvaro is to add a
comment so as one can use VACUUM_OPTS with -j optionally, instead of
suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses. So there is no actual need for the if/else
complication business.

I think it is ok for the echo command to highlight to the user that running --analyze-only using the same amount of jobs will give a faster result.

I missed that part.
IIUC, using the $VACUUMDB_OPTS variable would simplify and correct the usage of jobs for vacuumdb. (pg_upgrade jobs is not equal to vacuumdb jobs) Plus, it might simplify and reduce the number of additional lines.
Tom Lane also suggested above to make the script absorb the value from env.
Would that address your concern of getting a faster result, yes?

The actual line in the script executed is using $VACUUMDB_OPTS at the
moment, so could you expand on the above ? The 'if' could be removed if
we assume that pg_upgrade would only be used with server > 9.5, but to
be safer I would leave it in, as it is only console output.

Best regards,
Jesper

#25Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#24)
RE: pg_upgrade: Pass -j down to vacuumdb

On January 31, 2019, 9:29PM +0000, Jesper Pedersen wrote:

I added most of the documentation back, as requested by Kirk.

Actually, I find it useful to be documented. However, major contributors have higher opinions in terms of experience, so I think it's alright with me not to include the doc part if they finally say so.

I think we need to leave it to the Committer to decide, as both Peter and Michael are committers; provided the patch reaches RfC.

Agreed.

1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed that both things are separate
concepts upthread, no? What has been suggested by Alvaro is to add
a comment so as one can use VACUUM_OPTS with -j optionally, instead
of suggesting a full-fledged vacuumdb command which depends on what
pg_upgrade uses. So there is no actual need for the if/else
complication business.

I think it is ok for the echo command to highlight to the user that
running --analyze-only using the same amount of jobs will give a faster result.

Since you used user_opts.jobs (which is coming from pg_upgrade, is it not?),
could you clarify more the statement above? Or did you mean somehow that
it won't be a problem for vacuumdb to use the same?
Though correctness-wise is arguable, if the committers can let it pass
from your answer, then I think it's alright.

I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and just based from user-specified option $VACUUMDB_OPTS.
By which it would not utilize the amount of jobs used for pg_upgrade.
To address your need of needing a parallel, the script would just then
suggest if the user wants a --jobs option. The if/else for number of jobs is
not needed in that case, maybe only for ifndef WIN32 else case.

Regards,
Kirk Jamison

#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jamison, Kirk (#25)
Re: pg_upgrade: Pass -j down to vacuumdb

On 2019-Feb-01, Jamison, Kirk wrote:

I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and just based from user-specified option $VACUUMDB_OPTS.
By which it would not utilize the amount of jobs used for pg_upgrade.
To address your need of needing a parallel, the script would just then
suggest if the user wants a --jobs option. The if/else for number of jobs is
not needed in that case, maybe only for ifndef WIN32 else case.

No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed. I do think some
extra whitespace in the format string is needed.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#27Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Alvaro Herrera (#26)
2 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 2/1/19 4:58 AM, Alvaro Herrera wrote:

On 2019-Feb-01, Jamison, Kirk wrote:

I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I thought what
the other developers suggested is to utilize it so that --jobs for vacuumdb
would be optional and just based from user-specified option $VACUUMDB_OPTS.
By which it would not utilize the amount of jobs used for pg_upgrade.
To address your need of needing a parallel, the script would just then
suggest if the user wants a --jobs option. The if/else for number of jobs is
not needed in that case, maybe only for ifndef WIN32 else case.

No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed. I do think some
extra whitespace in the format string is needed.

I'm confused by this.

The point of the patch is to highlight to the user that even though
he/she does "pg_upgrade -j 8" the "-j 8" option isn't passed down to
vacuumdb.

Default value is 1, so in that case the echo command doesn't highlight
that fact, otherwise it is. The user can then cancel the script and use
the suggested command line.

The script then executes the vaccumdb command with the $VACUUMDB_OPTS
argument as noted in the documentation.

Sample script attached as well.

I added extra space in the --analyze-in-stages part.

Kirk, can you provide a delta patch to match what you are thinking ?

Best regards,
Jesper

Attachments:

v7_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchtext/x-patch; name=v7_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchDownload
From 4b5856725af5d971a26a2ba150c1067ee21bb242 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen <jesper.pedersen@redhat.com>
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +++++-
 src/bin/pg_upgrade/check.c      | 28 ++++++++++++++++++++++------
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..84a7ccb6f1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..570b7c9ae7 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s    \"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+				new_cluster.bindir, user_specification.data, user_opts.jobs,
+				/* Did we copy the free space files? */
+				(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+				"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s $VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

analyze_new_cluster.shapplication/x-shellscript; name=analyze_new_cluster.shDownload
#28Andres Freund
andres@anarazel.de
In reply to: Jesper Pedersen (#27)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

On 2019-02-01 14:14:27 -0500, Jesper Pedersen wrote:

From 4b5856725af5d971a26a2ba150c1067ee21bb242 Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
vacuumdb by default.

This is currently marked as waiting for author, which doesn't seem
accurate. I've set it to needs review, and moved it to the next
commitfest - please revert if that's not accurate.

- Andres

#29Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#27)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi,

On February 1, 2019 8:14 PM +0000, Jesper Pedersen wrote:

On 2/1/19 4:58 AM, Alvaro Herrera wrote:

On 2019-Feb-01, Jamison, Kirk wrote:

I'm not sure if misunderstood the purpose of $VACUUMDB_OPTS. I
thought what the other developers suggested is to utilize it so that
--jobs for vacuumdb would be optional and just based from user-specified option $VACUUMDB_OPTS.
By which it would not utilize the amount of jobs used for pg_upgrade.
To address your need of needing a parallel, the script would just
then suggest if the user wants a --jobs option. The if/else for
number of jobs is not needed in that case, maybe only for ifndef WIN32 else case.

No Kirk, you got it right -- (some of) those ifdefs are not needed, as
adding the --jobs in the command line is not needed. I do think some
extra whitespace in the format string is needed.

The point of the patch is to highlight to the user that even though he/she does "pg_upgrade -j 8" the "-j 8" option isn't passed down to vacuumdb.
Default value is 1, so in that case the echo command doesn't highlight that fact, otherwise it is. The user can then cancel the script and use the suggested command line.
The script then executes the vaccumdb command with the $VACUUMDB_OPTS argument as noted in the documentation.
Sample script attached as well.

Sorry I think I might have understood now where you are coming from,
where your script clarifies that it's not necessarily passed down.
If committers can let it pass, maybe it's necessary to highlight in the script,
Something like:
"If you would like default statistics as quickly as possible (e.g. run in parallel)..."

The difference is still perhaps your use of --jobs option
as somehow "explicitly" embedded in the code, compared to the suggestion of
making it optional by using the $VACUUMDB_OPTS variable, so that
jobs need not to be explicitly written, unless the user wants to.
(which I also think it's a better improvement because it's optional)

Some quick code check. sorry, don't have much time to write a patch for it today.
Perhaps you could write it similar to what you did in the --analyze-in-stages part.
Maybe something like (not sure if correct):
+	fprintf(script, "echo %s    \"%s/vacuumdb\" %s $VACUUMDB_OPTS --all %s%s\n", ECHO_QUOTE,
+			new_cluster.bindir, user_specification.data,
+			/* Did we copy the free space files? */
+			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+			"--analyze-only" : "--analyze", ECHO_QUOTE);

I'll continue to review though from time to time.

Regards,
Kirk Jamison

#30Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#27)
2 attachment(s)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi Jesper,

Sorry I almost completely forgot to get back to you on this.
Actually your patch works when I tested it before,
and I understand the intention.
.
Although a point was raised by other developers by making
--jobs optional in the suggested line by using the env variable instead.

Kirk, can you provide a delta patch to match what you are thinking ?

I was thinking of something like the attached,
but the user needs to edit the variable value though.
I am not sure how to implement the suggestion to allow the
script to absorb a value from the environment,
so that it doesn't need to be edited at all.

Regards,
Kirk Jamison

Attachments:

v8_0001-Use-vacuumdb-opt-env-variable-to-pass-vacuumdb-options.patchapplication/octet-stream; name=v8_0001-Use-vacuumdb-opt-env-variable-to-pass-vacuumdb-options.patchDownload
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf..84a7ccb 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application by default.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa70..e4356f0 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -493,17 +493,32 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 
 	fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %sthis script and run:%s\n",
+	fprintf(script, "echo %sthis script. You may add optional vacuumdb options (e.g. --jobs)%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+	fprintf(script, "echo %sby editing $VACUUMDB_OPTS, then run the command below.%s\n",
+			ECHO_QUOTE, ECHO_QUOTE);
+#ifndef WIN32
+	fprintf(script, "echo %s    \"%s/vacuumdb\" %s $VACUUMDB_OPTS --all %s%s\n", ECHO_QUOTE,
+			new_cluster.bindir, user_specification.data,
+			/* Did we copy the free space files? */
+			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+			"--analyze-only" : "--analyze", ECHO_QUOTE);
+#else
+	fprintf(script, "echo %s    \"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all %s%s\n", ECHO_QUOTE,
 			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
+			/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze", ECHO_QUOTE);
+#endif
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s $VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
analyze_new_cluster.shapplication/octet-stream; name=analyze_new_cluster.shDownload
#31Robert Haas
robertmhaas@gmail.com
In reply to: Jamison, Kirk (#30)
Re: pg_upgrade: Pass -j down to vacuumdb

On Mon, Mar 11, 2019 at 10:05 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:

I was thinking of something like the attached,

+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application by default.

The words 'by default' should be removed here, because there is also
no non-default way to get that behavior, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32Jesper Pedersen
jesper.pedersen@redhat.com
In reply to: Robert Haas (#31)
1 attachment(s)
Re: pg_upgrade: Pass -j down to vacuumdb

Hi,

Thanks Kirk !

On 3/12/19 2:20 PM, Robert Haas wrote:

The words 'by default' should be removed here, because there is also
no non-default way to get that behavior, either.

Here is v9 based on Kirk's and your input.

Best regards,
Jesper

Attachments:

v9_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchtext/x-patch; name=v9_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patchDownload
From 5b879f79300412638705e32aa3ed51aff3cbe75c Mon Sep 17 00:00:00 2001
From: jesperpedersen <jesper.pedersen@redhat.com>
Date: Tue, 12 Mar 2019 16:02:27 -0400
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Author: Jesper Pedersen and Kirk Jamison
Reviewed-by: Peter Eisentraut, Jerry Sievers, Michael Paquier, Tom Lane, Fabrízio de Royes Mello, Álvaro Herrera and Robert Haas
Discussion: https://www.postgresql.org/message-id/flat/11b6b108-6ac0-79a6-785a-f13dfe60ab92@redhat.com
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +++++-
 src/bin/pg_upgrade/check.c      | 23 +++++++++++++++++++----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..5f2a94918d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
      in parallel;  a good place to start is the maximum of the number of
      CPU cores and tablespaces.  This option can dramatically reduce the
      time to upgrade a multi-database server running on a multiprocessor
-     machine.
+     machine. Note that this option isn't passed to the
+     <application>vacuumdb</application> application.
+     The system environment variable <literal>VACUUMDB_OPTS</literal> can be
+     used to pass extra options to the <application>vacuumdb</application>
+     application during the script execution.
     </para>
 
     <para>
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..e4356f011c 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -493,17 +493,32 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 
 	fprintf(script, "echo %sIf you would like default statistics as quickly as possible, cancel%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %sthis script and run:%s\n",
+	fprintf(script, "echo %sthis script. You may add optional vacuumdb options (e.g. --jobs)%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+	fprintf(script, "echo %sby editing $VACUUMDB_OPTS, then run the command below.%s\n",
+			ECHO_QUOTE, ECHO_QUOTE);
+#ifndef WIN32
+	fprintf(script, "echo %s    \"%s/vacuumdb\" %s $VACUUMDB_OPTS --all %s%s\n", ECHO_QUOTE,
 			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
+			/* Did we copy the free space files? */
 			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
 			"--analyze-only" : "--analyze", ECHO_QUOTE);
+#else
+	fprintf(script, "echo %s    \"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all %s%s\n", ECHO_QUOTE,
+			new_cluster.bindir, user_specification.data,
+			/* Did we copy the free space files? */
+			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+			"--analyze-only" : "--analyze", ECHO_QUOTE);
+#endif
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s $VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s %%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2

#33Jamison, Kirk
k.jamison@jp.fujitsu.com
In reply to: Jesper Pedersen (#32)
RE: pg_upgrade: Pass -j down to vacuumdb

Hi Jesper,

Thanks Kirk !

On 3/12/19 2:20 PM, Robert Haas wrote:
The words 'by default' should be removed here, because there is also
no non-default way to get that behavior, either.

Here is v9 based on Kirk's and your input.

Thanks! Although there were trailing whitespaces.
After fixing that, it works as intended (built & passed tests successfully).
That aside, the v9 patch looks good and comment was addressed.

Regards,
Kirk Jamison

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jesper Pedersen (#32)
Re: pg_upgrade: Pass -j down to vacuumdb

Jesper Pedersen <jesper.pedersen@redhat.com> writes:

[ v9_0001-Highlight-that-the-jobs-option-isn-t-passed-down-to-.patch ]

Now that I've looked at this, it seems like it's fairly confused about
what the proposed VACUUMDB_OPTS variable would actually do for you.
If you're going to run vacuumdb directly, it hardly seems like you'd
bother with setting such a variable; you'd just enter the options you
want directly on the command line.

Conversely, if what you plan to do is set VACUUMDB_OPTS and then
run this script, the fact that --analyze-in-stages is hard-wired
into the script's invocation doesn't seem right either: you probably
don't want that if your goal is to get done as fast as possible.

In short, I'm not convinced that most of this patch is an improvement
on the status quo. I think we'd be best off to just take the idea
of explicitly mentioning adding --jobs to a manual run, ie roughly

	fprintf(script, "echo %sthis script and run:%s\n",
			ECHO_QUOTE, ECHO_QUOTE);
	fprintf(script, "echo %s    \"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
			new_cluster.bindir, user_specification.data,
	/* Did we copy the free space files? */
			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
+			ECHO_QUOTE, ECHO_QUOTE);
	fprintf(script, "echo%s\n\n", ECHO_BLANK);

fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",

regards, tom lane

#35Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#34)
Re: pg_upgrade: Pass -j down to vacuumdb

On Mon, Mar 25, 2019 at 05:57:50PM -0400, Tom Lane wrote:

In short, I'm not convinced that most of this patch is an improvement
on the status quo. I think we'd be best off to just take the idea
of explicitly mentioning adding --jobs to a manual run, ie roughly

Yeah, no objections from here to keep that stuff the simpler the
better. So I am on board with your suggestion.
--
Michael

#36Daniel Gustafsson
daniel@yesql.se
In reply to: Michael Paquier (#35)
Re: pg_upgrade: Pass -j down to vacuumdb

On Tuesday, March 26, 2019 3:20 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Mar 25, 2019 at 05:57:50PM -0400, Tom Lane wrote:

In short, I'm not convinced that most of this patch is an improvement
on the status quo. I think we'd be best off to just take the idea
of explicitly mentioning adding --jobs to a manual run, ie roughly

Yeah, no objections from here to keep that stuff the simpler the
better. So I am on board with your suggestion.

+1, I'd rather have that as "documentation" after an upgrade.

cheers ./daniel

#37Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Tom Lane (#34)
Re: pg_upgrade: Pass -j down to vacuumdb

On 2019-03-25 22:57, Tom Lane wrote:

+	fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
+			ECHO_QUOTE, ECHO_QUOTE);

But then you get that information after you have already started the script.

I don't find any information about this analyze business on the
pg_upgrade reference page. Maybe a discussion there could explain the
different paths better than making the output script extra complicated.

Essentially: If you want a slow and gentle analyze, use the supplied
script. If you want a fast analyze, use vacuumdb, perhaps with an
appropriate --jobs option. Note that pg_upgrade --jobs and vacuumdb
--jobs are resource-bound in different ways, so the same value might not
be appropriate for both.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#37)
Re: pg_upgrade: Pass -j down to vacuumdb

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

On 2019-03-25 22:57, Tom Lane wrote:

+	fprintf(script, "echo %sYou may wish to add --jobs=N for parallel analyzing.%s\n",
+			ECHO_QUOTE, ECHO_QUOTE);

But then you get that information after you have already started the script.

Yes, but that's already true of all the info that this script prints out.

I don't find any information about this analyze business on the
pg_upgrade reference page. Maybe a discussion there could explain the
different paths better than making the output script extra complicated.

Essentially: If you want a slow and gentle analyze, use the supplied
script. If you want a fast analyze, use vacuumdb, perhaps with an
appropriate --jobs option. Note that pg_upgrade --jobs and vacuumdb
--jobs are resource-bound in different ways, so the same value might not
be appropriate for both.

I have no objection to handling it that way (i.e., just as a doc fix).
Or we could do both.

regards, tom lane

#39Jeff Janes
jeff.janes@gmail.com
In reply to: Peter Eisentraut (#37)
Re: pg_upgrade: Pass -j down to vacuumdb

On Tue, Mar 26, 2019 at 7:28 AM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 2019-03-25 22:57, Tom Lane wrote:

+ fprintf(script, "echo %sYou may wish to add --jobs=N for parallel

analyzing.%s\n",

+ ECHO_QUOTE, ECHO_QUOTE);

But then you get that information after you have already started the
script.

True, but the same goes for all the other information there, and it sleeps
to let you break out of it. And I make it a habit to glance through any
scripts someone suggests that I run, so would notice the embedded advice
without running it at all.

I don't find any information about this analyze business on the

pg_upgrade reference page. Maybe a discussion there could explain the
different paths better than making the output script extra complicated.

Essentially: If you want a slow and gentle analyze, use the supplied
script. If you want a fast analyze, use vacuumdb, perhaps with an
appropriate --jobs option. Note that pg_upgrade --jobs and vacuumdb
--jobs are resource-bound in different ways, so the same value might not
be appropriate for both.

To me, analyze-in-stages is not about gentleness at all. For example, it
does nothing to move vacuum_cost_delay away from its default of 0. Rather,
it is about slamming the bare minimum statistics in there as fast as
possible, so your database doesn't keel over from horrible query plans on
even simple queries as soon as you reopen. I want the database to survive
long enough for the more complete statistics to be gathered. If you have
quickly accumulated max_connection processes all running horrible query
plans that never finish, your database might as well still be closed for
all the good it does the users. And all the load generated by those is
going to make the critical ANALYZE all that much slower.

At first blush I thought it was obvious that you would not want to run
analyze-in-stages in parallel. But after thinking about it some more and
reflecting on experience doing some troublesome upgrades, I would reverse
that and say it is now obvious you do want at least the first stage of
analyze-in-stages, and probably the first two, to run in parallel. That is
not currently an option it supports, so we can't really recommend it in the
script or the docs.

But we could at least adopt the more straightforward patch, suggest that if
they don't want analyze-in-stages they should consider doing the big-bang
analyze in parallel.

Cheers,

Jeff

#40Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jeff Janes (#39)
Re: pg_upgrade: Pass -j down to vacuumdb

On 2019-03-28 02:43, Jeff Janes wrote:

At first blush I thought it was obvious that you would not want to run
analyze-in-stages in parallel.  But after thinking about it some more
and reflecting on experience doing some troublesome upgrades, I would
reverse that and say it is now obvious you do want at least the first
stage of analyze-in-stages, and probably the first two, to run in
parallel.  That is not currently an option it supports, so we can't
really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#41Jeff Janes
jeff.janes@gmail.com
In reply to: Peter Eisentraut (#40)
Re: pg_upgrade: Pass -j down to vacuumdb

On Fri, Mar 29, 2019 at 5:58 AM Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 2019-03-28 02:43, Jeff Janes wrote:

At first blush I thought it was obvious that you would not want to run
analyze-in-stages in parallel. But after thinking about it some more
and reflecting on experience doing some troublesome upgrades, I would
reverse that and say it is now obvious you do want at least the first
stage of analyze-in-stages, and probably the first two, to run in
parallel. That is not currently an option it supports, so we can't
really recommend it in the script or the docs.

So do you think we should copy down the -j option from pg_upgrade, or
make some other arrangement?

For 12, I think we should not pass it down so that it runs automatically,
and just go with a doc-only patch instead. Where the text written into
the analyze_new_cluster.sh recommending to hit Ctrl-C and do something else
is counted as documentation.

I agree with you that the value of -j that was used for pg_ugrade might not
be suitable for vacuumdb, but anyone who tests things thoroughly enough to
know exactly what value to use for each is not going to be the target
audience of analyze_new_cluster.sh anyway. So I think the -j used in
pg_upgrade can just be written directly into the suggestion, and that would
be good enough for the intended use.

Ideally I think the analyze-in-stages should have the ability to
parallelize the first stage and not the last one, but that is not v12
material.

Even more ideally it should only have two stages, not three. One stage
runs to generate minimally-tolerable stats before the database is opened to
other users, and one runs after it is open (but hopefully before the big
end-of-month reports get run, or whatever the big statistics-sensitive
queries are on your system). But we don't really have a concept of "open
to other users" in the first place, and doing it yourself by juggling
pg_hba files is annoying and error prone. So maybe the first stage could
be run by pg_upgrade itself, while the new server is still running on a
linux socket in a private directory.

The defense of the current three-stage method for analyze-in-stages is that
very few people are likely to know just what the level of
"minimally-tolerable stats" for their system actually are, because upgrades
are rare enough not to learn by experience, and realistic load-generators
are rare enough not to learn from test/QA environments. If the
default_statistics_target=1 stats are enough, then you got them quickly.
If they aren't, at least you didn't waste too much time collecting them
before moving on to the second-stage default_statistics_target=10 and then
the final stage. So the three stages are amortizing over our ignorance.

Cheers,

Jeff

#42Justin Pryzby
pryzby@telsasoft.com
In reply to: Jeff Janes (#41)
Re: pg_upgrade: Pass -j down to vacuumdb

On Wed, Apr 03, 2019 at 04:42:14PM -0400, Jeff Janes wrote:

So maybe the first stage could
be run by pg_upgrade itself, while the new server is still running on a
linux socket in a private directory.

I think that would take too long. It would be less of an issue if there was
feedback/progress from pg_upgrade during the analyze.

For our upgrades (which typically take ~15min but several customers take up to
~60min), I only analyze base tables (essentially, those which are neither
parents nor children), then start services, then ANALYZE with default stats
target. I would want to avoid delaying services restart for more than another
(say) 5 minutes, and I would want to avoid even that unless there was a
progress report indicating that it's projected to take only a few more minutes.

I just did a test on one of our large-but-not-huge customers. With
stats_target=1, analyzing a 145GB partitioned table looks like it'll take
perhaps an hour; they have ~1TB data, so delaying services during ANALYZE would
nullify the utility of pg_upgrade. I can restore the essential tables from
backup in 15-30 minutes.

It might be fine if pg_upgrade took an option which enabled analyze, perhaps
instead of outputting analyze_new_cluster.sh. But actually, a problem with
*that* is that currently pg_upgrade avoids starting the new cluster. That
seems to be deliberate, since, with --link, that's an irreversible operation:
it's unsafe to start the old cluster afterwards.

Tangent: I have a queued mail from ~15 months ago wherein I proposed adding to
pg_upgrade an option to remove the old data dir (or probably only the files
associated with known relations). I realized at the time that would be pretty
scary without having first verified that the new cluster at least starts. I'm
not sure how good an idea that is, but --startnewcluster would be needed there,
too.

Justin

#43Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Justin Pryzby (#42)
Re: pg_upgrade: Pass -j down to vacuumdb

On 2019-04-03 23:24, Justin Pryzby wrote:

I just did a test on one of our large-but-not-huge customers. With
stats_target=1, analyzing a 145GB partitioned table looks like it'll take
perhaps an hour; they have ~1TB data, so delaying services during ANALYZE would
nullify the utility of pg_upgrade. I can restore the essential tables from
backup in 15-30 minutes.

I think the real fix is to have pg_upgrade copy over the statistics.
They are right there after all, just take them.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#43)
Re: pg_upgrade: Pass -j down to vacuumdb

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

I think the real fix is to have pg_upgrade copy over the statistics.
They are right there after all, just take them.

Well, it's not "just take them", we need to develop some infrastructure
that will permit coping with cross-version differences in the stats
storage. This isn't insoluble, but it will take some work. We had
a prior discussion that trailed off without much result:

/messages/by-id/724322880.K8vzik8zPz@abook

regards, tom lane