add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

Started by Nathan Bossartover 1 year ago8 messages
#1Nathan Bossart
nathandbossart@gmail.com
1 attachment(s)

This is likely small potatoes compared to some of the other
pg_upgrade-related improvements I've proposed [0]/messages/by-id/20240418041712.GA3441570@nathanxps13 [1]/messages/by-id/20240503025140.GA1227404@nathanxps13 or plan to propose,
but this is easy enough, and I already wrote the patch, so here it is.
AFAICT there's no reason to bother syncing these dump files to disk. If
someone pulls the plug during pg_upgrade, it's not like you can resume
pg_upgrade from where it left off. Also, I think we skipped syncing before
v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
added the code to sync dump files, too.

[0]: /messages/by-id/20240418041712.GA3441570@nathanxps13
[1]: /messages/by-id/20240503025140.GA1227404@nathanxps13

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-add-no-sync-to-pg_upgrade-s-calls-to-pg_dump-all.patchtext/x-diff; charset=us-asciiDownload
From 998613046fb5ff5c6618964c821109b573bfa89b Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 3 May 2024 10:35:21 -0500
Subject: [PATCH v1 1/1] add --no-sync to pg_upgrade's calls to pg_dump[all]

---
 src/bin/pg_upgrade/dump.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 29fb45b928..f052a4985c 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -22,9 +22,10 @@ generate_old_dump(void)
 	/* run new pg_dumpall binary for globals */
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
-			  "--binary-upgrade %s -f \"%s/%s\"",
+			  "--binary-upgrade %s %s -f \"%s/%s\"",
 			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
 			  log_opts.verbose ? "--verbose" : "",
+			  GET_MAJOR_VERSION(old_cluster.major_version) >= 1000 ? "--no-sync" : "",
 			  log_opts.dumpdir,
 			  GLOBALS_DUMP_FILE);
 	check_ok();
@@ -53,9 +54,10 @@ generate_old_dump(void)
 
 		parallel_exec_prog(log_file_name, NULL,
 						   "\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
-						   "--binary-upgrade --format=custom %s --file=\"%s/%s\" %s",
+						   "--binary-upgrade --format=custom %s %s --file=\"%s/%s\" %s",
 						   new_cluster.bindir, cluster_conn_opts(&old_cluster),
 						   log_opts.verbose ? "--verbose" : "",
+						   GET_MAJOR_VERSION(old_cluster.major_version) >= 1000 ? "--no-sync" : "",
 						   log_opts.dumpdir,
 						   sql_file_name, escaped_connstr.data);
 
-- 
2.25.1

#2Peter Eisentraut
peter@eisentraut.org
In reply to: Nathan Bossart (#1)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

On 03.05.24 19:13, Nathan Bossart wrote:

This is likely small potatoes compared to some of the other
pg_upgrade-related improvements I've proposed [0] [1] or plan to propose,
but this is easy enough, and I already wrote the patch, so here it is.
AFAICT there's no reason to bother syncing these dump files to disk. If
someone pulls the plug during pg_upgrade, it's not like you can resume
pg_upgrade from where it left off. Also, I think we skipped syncing before
v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
added the code to sync dump files, too.

Looks good to me.

#3Nathan Bossart
nathandbossart@gmail.com
In reply to: Peter Eisentraut (#2)
1 attachment(s)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

On Wed, May 08, 2024 at 10:09:46AM +0200, Peter Eisentraut wrote:

On 03.05.24 19:13, Nathan Bossart wrote:

This is likely small potatoes compared to some of the other
pg_upgrade-related improvements I've proposed [0] [1] or plan to propose,
but this is easy enough, and I already wrote the patch, so here it is.
AFAICT there's no reason to bother syncing these dump files to disk. If
someone pulls the plug during pg_upgrade, it's not like you can resume
pg_upgrade from where it left off. Also, I think we skipped syncing before
v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
added the code to sync dump files, too.

Looks good to me.

Thanks for looking. I noticed that the version check is unnecessary since
we always use the new binary's pg_dump[all], so I removed that in v2.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-add-no-sync-to-pg_upgrade-s-calls-to-pg_dump-all.patchtext/x-diff; charset=us-asciiDownload
From 265a999ed65bf56491f76ae013f705ab64491486 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathan@postgresql.org>
Date: Fri, 3 May 2024 10:35:21 -0500
Subject: [PATCH v2 1/1] add --no-sync to pg_upgrade's calls to pg_dump[all]

---
 src/bin/pg_upgrade/dump.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
index 29fb45b928..8345f55be8 100644
--- a/src/bin/pg_upgrade/dump.c
+++ b/src/bin/pg_upgrade/dump.c
@@ -22,7 +22,7 @@ generate_old_dump(void)
 	/* run new pg_dumpall binary for globals */
 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
 			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
-			  "--binary-upgrade %s -f \"%s/%s\"",
+			  "--binary-upgrade %s --no-sync -f \"%s/%s\"",
 			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
 			  log_opts.verbose ? "--verbose" : "",
 			  log_opts.dumpdir,
@@ -53,7 +53,7 @@ generate_old_dump(void)
 
 		parallel_exec_prog(log_file_name, NULL,
 						   "\"%s/pg_dump\" %s --schema-only --quote-all-identifiers "
-						   "--binary-upgrade --format=custom %s --file=\"%s/%s\" %s",
+						   "--binary-upgrade --format=custom %s --no-sync --file=\"%s/%s\" %s",
 						   new_cluster.bindir, cluster_conn_opts(&old_cluster),
 						   log_opts.verbose ? "--verbose" : "",
 						   log_opts.dumpdir,
-- 
2.25.1

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nathan Bossart (#3)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

Nathan Bossart <nathandbossart@gmail.com> writes:

Thanks for looking. I noticed that the version check is unnecessary since
we always use the new binary's pg_dump[all], so I removed that in v2.

+1

regards, tom lane

#5Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#4)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

On Wed, May 08, 2024 at 02:49:58PM -0400, Tom Lane wrote:

Nathan Bossart <nathandbossart@gmail.com> writes:

Thanks for looking. I noticed that the version check is unnecessary since
we always use the new binary's pg_dump[all], so I removed that in v2.

+1

+1.  Could there be an argument in favor of a backpatch?  This is a
performance improvement, but one could also side that the addition of
sync support in pg_dump[all] has made that a regression that we'd
better fix because the flushes don't matter in this context.  They
also bring costs for no gain.
--
Michael
#6Nathan Bossart
nathandbossart@gmail.com
In reply to: Michael Paquier (#5)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

On Thu, May 09, 2024 at 09:03:56AM +0900, Michael Paquier wrote:

+1. Could there be an argument in favor of a backpatch? This is a
performance improvement, but one could also side that the addition of
sync support in pg_dump[all] has made that a regression that we'd
better fix because the flushes don't matter in this context. They
also bring costs for no gain.

I don't see a strong need to back-patch this, if for no other reason than
it seems to have gone unnoticed for 7 major versions. Plus, based on my
admittedly limited testing, this is unlikely to provide significant
improvements.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#7Daniel Gustafsson
daniel@yesql.se
In reply to: Nathan Bossart (#6)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

On 9 May 2024, at 21:34, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Thu, May 09, 2024 at 09:03:56AM +0900, Michael Paquier wrote:

+1. Could there be an argument in favor of a backpatch? This is a
performance improvement, but one could also side that the addition of
sync support in pg_dump[all] has made that a regression that we'd
better fix because the flushes don't matter in this context. They
also bring costs for no gain.

I don't see a strong need to back-patch this, if for no other reason than
it seems to have gone unnoticed for 7 major versions. Plus, based on my
admittedly limited testing, this is unlikely to provide significant
improvements.

Agreed, this is a nice little improvement but it's unlikely to be enough of a
speedup to warrant changing the backbranches.

--
Daniel Gustafsson

#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Daniel Gustafsson (#7)
Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

Committed.

--
nathan