More performance improvements for pg_dump in binary upgrade mode

Started by Daniel Gustafssonover 1 year ago4 messages

daniel@yesql.se

over 1 year ago

1 attachment(s)

Prompted by an off-list bugreport of pg_upgrade hanging (which turned out to be
slow enough to be perceived to hang) for large schemas I had a look at pg_dump
performance during --binary-upgrade mode today. My initial take was to write
more or less exactly what Nathan did in [0]https://commitfest.postgresql.org/48/4936/, only to realize that it was a)
already proposed and b) I had even reviewed it. Doh.

The next attempt was to reduce more per-object queries from binary upgrade, and
the typarray lookup binary_upgrade_set_type_oids_by_type_oid seemed like a good
candidate for a cache lookup. Since already cache type TypeInfo objects, if we
add typarray to TypeInfo we can use the existing lookup code.

As a baseline, pg_dump dumps a synthetic workload of 10,000 (empty) relations
with a width of 1-10 columns:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --format=custom \
--file a postgres > /dev/null

real 0m1.256s
user 0m0.273s
sys 0m0.059s

The same dump in binary upgrade mode runs significantly slower:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null

real 1m9.921s
user 0m0.782s
sys 0m0.436s

With the typarray caching from the patch attached here added:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file b postgres > /dev/null

real 0m45.210s
user 0m0.655s
sys 0m0.299s

With the typarray caching from the patch attached here added *and* Nathan's
patch from [0]https://commitfest.postgresql.org/48/4936/ added:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null

real 0m1.566s
user 0m0.309s
sys 0m0.080s

The combination of these patches thus puts binary uphrade mode almost on par
with a plain dump, which has the potential to make upgrades of large schemas
faster. Parallel-parking this patch with Nathan's in the July CF, just wanted
to type it up while it was fresh in my mind.

--
Daniel Gustafsson

[0]: https://commitfest.postgresql.org/48/4936/

Attachments:

0001-Cache-typarray-for-fast-lookups-in-binary-upgrade-mo.patchapplication/octet-stream; name=0001-Cache-typarray-for-fast-lookups-in-binary-upgrade-mo.patch; x-unix-mode=0644Download

From 8dbb3c1caaa0adefafab8483dd7827ce45157a20 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 15 May 2024 22:03:22 +0200
Subject: [PATCH] Cache typarray for fast lookups in binary upgrade mode

When upgrading a large schema it adds significant overhead to perform
individual catalog lookups per relation in order to retrieve Oid for
preserving Oid calls. This instead adds the typarray to the TypeInfo
cache which then allows for fast lookups using the existing API.
---
 src/bin/pg_dump/pg_dump.c | 20 ++++++++------------
 src/bin/pg_dump/pg_dump.h |  1 +
 2 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ac920f64c7..5fb4453712 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5304,23 +5304,15 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
 	Oid			pg_type_array_oid;
 	Oid			pg_type_multirange_oid;
 	Oid			pg_type_multirange_array_oid;
+	TypeInfo   *tinfo;
 
 	appendPQExpBufferStr(upgrade_buffer, "\n-- For binary upgrade, must preserve pg_type oid\n");
 	appendPQExpBuffer(upgrade_buffer,
 					  "SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('%u'::pg_catalog.oid);\n\n",
 					  pg_type_oid);
 
-	appendPQExpBuffer(upgrade_query,
-					  "SELECT typarray "
-					  "FROM pg_catalog.pg_type "
-					  "WHERE oid = '%u'::pg_catalog.oid;",
-					  pg_type_oid);
-
-	res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
-
-	pg_type_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
-
-	PQclear(res);
+	tinfo = findTypeByOid(pg_type_oid);
+	pg_type_array_oid = tinfo->typarray;
 
 	if (!OidIsValid(pg_type_array_oid) && force_array_type)
 		pg_type_array_oid = get_next_possible_free_pg_type_oid(fout, upgrade_query);
@@ -5785,6 +5777,7 @@ getTypes(Archive *fout, int *numTypes)
 	int			i_typtype;
 	int			i_typisdefined;
 	int			i_isarray;
+	int			i_typarray;
 
 	/*
 	 * we include even the built-in types because those may be used as array
@@ -5805,7 +5798,7 @@ getTypes(Archive *fout, int *numTypes)
 						 "typnamespace, typacl, "
 						 "acldefault('T', typowner) AS acldefault, "
 						 "typowner, "
-						 "typelem, typrelid, "
+						 "typelem, typrelid, typarray, "
 						 "CASE WHEN typrelid = 0 THEN ' '::\"char\" "
 						 "ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END AS typrelkind, "
 						 "typtype, typisdefined, "
@@ -5832,6 +5825,7 @@ getTypes(Archive *fout, int *numTypes)
 	i_typtype = PQfnumber(res, "typtype");
 	i_typisdefined = PQfnumber(res, "typisdefined");
 	i_isarray = PQfnumber(res, "isarray");
+	i_typarray = PQfnumber(res, "typarray");
 
 	for (i = 0; i < ntups; i++)
 	{
@@ -5864,6 +5858,8 @@ getTypes(Archive *fout, int *numTypes)
 		else
 			tyinfo[i].isArray = false;
 
+		tyinfo[i].typarray = atooid(PQgetvalue(res, i, i_typarray));
+
 		if (tyinfo[i].typtype == TYPTYPE_MULTIRANGE)
 			tyinfo[i].isMultirange = true;
 		else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f518a1e6d2..135219bf4a 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -200,6 +200,7 @@ typedef struct _typeInfo
 	const char *rolname;
 	Oid			typelem;
 	Oid			typrelid;
+	Oid			typarray;
 	char		typrelkind;		/* 'r', 'v', 'c', etc */
 	char		typtype;		/* 'b', 'c', etc */
 	bool		isArray;		/* true if auto-generated array type */
-- 
2.39.3 (Apple Git-146)

Nathan Bossart

nathandbossart@gmail.com

over 1 year ago

In reply to: Daniel Gustafsson (#1)

Re: More performance improvements for pg_dump in binary upgrade mode

On Wed, May 15, 2024 at 10:15:13PM +0200, Daniel Gustafsson wrote:

With the typarray caching from the patch attached here added *and* Nathan's
patch from [0] added:

$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null

real 0m1.566s
user 0m0.309s
sys 0m0.080s

The combination of these patches thus puts binary uphrade mode almost on par
with a plain dump, which has the potential to make upgrades of large schemas
faster. Parallel-parking this patch with Nathan's in the July CF, just wanted
to type it up while it was fresh in my mind.

Nice! I'll plan on taking a closer look at this one. I have a couple
other ideas in-flight (e.g., parallelizing the once-in-each-database
operations with libpq's asynchronous APIs) that I'm hoping to post soon,
too. v18 should have a lot of good stuff for pg_upgrade...

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Nathan Bossart

nathandbossart@gmail.com

over 1 year ago

In reply to: Nathan Bossart (#2)

Re: More performance improvements for pg_dump in binary upgrade mode

On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:

Nice! I'll plan on taking a closer look at this one.

LGTM. I've marked the commitfest entry as ready-for-committer.

--
nathan

Daniel Gustafsson

daniel@yesql.se

over 1 year ago

In reply to: Nathan Bossart (#3)

Re: More performance improvements for pg_dump in binary upgrade mode

On 5 Jun 2024, at 04:39, Nathan Bossart <nathandbossart@gmail.com> wrote:

On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:

Nice! I'll plan on taking a closer look at this one.

LGTM. I've marked the commitfest entry as ready-for-committer.

Thanks for review, committed.

--
Daniel Gustafsson