More performance improvements for pg_dump in binary upgrade mode
Prompted by an off-list bug report of pg_upgrade hanging for large schemas
(which turned out to be a dump running slowly enough to be perceived as
hanging), I had a look at pg_dump performance in --binary-upgrade mode today.
My initial take was to write more or less exactly what Nathan did in [0], only
to realize that it was a) already proposed and b) I had even reviewed it. Doh.
The next attempt was to remove more per-object queries from binary upgrade
mode, and the typarray lookup in binary_upgrade_set_type_oids_by_type_oid
seemed like a good candidate for a cache lookup. Since we already cache
TypeInfo objects for all types, adding typarray to TypeInfo lets us use the
existing lookup code.
As a baseline, pg_dump dumps a synthetic workload of 10,000 (empty) relations
with a width of 1-10 columns:
$ time ./bin/pg_dump --schema-only --quote-all-identifiers --format=custom \
--file a postgres > /dev/null
real 0m1.256s
user 0m0.273s
sys 0m0.059s
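(The exact script used to create this schema isn't included here; a
hypothetical libpq-based generator along these lines would produce an
equivalent workload, with made-up table and column names.)

/*
 * Hypothetical workload generator (not the actual script used above):
 * creates 10,000 empty relations with 1-10 integer columns each.
 */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}

	for (int i = 0; i < 10000; i++)
	{
		char		sql[1024];
		int			ncols = 1 + (i % 10);	/* width between 1 and 10 columns */
		int			len;

		len = snprintf(sql, sizeof(sql), "CREATE TABLE t_%d (c0 int", i);
		for (int c = 1; c < ncols; c++)
			len += snprintf(sql + len, sizeof(sql) - len, ", c%d int", c);
		snprintf(sql + len, sizeof(sql) - len, ")");

		PGresult   *res = PQexec(conn, sql);

		if (PQresultStatus(res) != PGRES_COMMAND_OK)
			fprintf(stderr, "%s", PQerrorMessage(conn));
		PQclear(res);
	}

	PQfinish(conn);
	return 0;
}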
The same dump in binary upgrade mode runs significantly slower:
$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null
real 1m9.921s
user 0m0.782s
sys 0m0.436s
With the typarray caching from the patch attached here added:
$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file b postgres > /dev/null
real 0m45.210s
user 0m0.655s
sys 0m0.299s
With the typarray caching from the patch attached here added *and* Nathan's
patch from [0] added:
$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null
real 0m1.566s
user 0m0.309s
sys 0m0.080s
The combination of these patches thus puts binary upgrade mode almost on par
with a plain dump, which has the potential to make upgrades of large schemas
faster. Parallel-parking this patch with Nathan's in the July CF, just wanted
to type it up while it was fresh in my mind.

[0]: https://commitfest.postgresql.org/48/4936/
--
Daniel Gustafsson
Attachments:
0001-Cache-typarray-for-fast-lookups-in-binary-upgrade-mo.patch
From 8dbb3c1caaa0adefafab8483dd7827ce45157a20 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Wed, 15 May 2024 22:03:22 +0200
Subject: [PATCH] Cache typarray for fast lookups in binary upgrade mode
When upgrading a large schema it adds significant overhead to perform an
individual catalog lookup per relation in order to retrieve the array type
Oid used in the Oid preservation calls. This instead adds the typarray Oid
to the TypeInfo cache, which allows fast lookups using the existing API.
---
src/bin/pg_dump/pg_dump.c | 20 ++++++++------------
src/bin/pg_dump/pg_dump.h | 1 +
2 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ac920f64c7..5fb4453712 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -5304,23 +5304,15 @@ binary_upgrade_set_type_oids_by_type_oid(Archive *fout,
Oid pg_type_array_oid;
Oid pg_type_multirange_oid;
Oid pg_type_multirange_array_oid;
+ TypeInfo *tinfo;
appendPQExpBufferStr(upgrade_buffer, "\n-- For binary upgrade, must preserve pg_type oid\n");
appendPQExpBuffer(upgrade_buffer,
"SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('%u'::pg_catalog.oid);\n\n",
pg_type_oid);
- appendPQExpBuffer(upgrade_query,
- "SELECT typarray "
- "FROM pg_catalog.pg_type "
- "WHERE oid = '%u'::pg_catalog.oid;",
- pg_type_oid);
-
- res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
-
- pg_type_array_oid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typarray")));
-
- PQclear(res);
+ tinfo = findTypeByOid(pg_type_oid);
+ pg_type_array_oid = tinfo->typarray;
if (!OidIsValid(pg_type_array_oid) && force_array_type)
pg_type_array_oid = get_next_possible_free_pg_type_oid(fout, upgrade_query);
@@ -5785,6 +5777,7 @@ getTypes(Archive *fout, int *numTypes)
int i_typtype;
int i_typisdefined;
int i_isarray;
+ int i_typarray;
/*
* we include even the built-in types because those may be used as array
@@ -5805,7 +5798,7 @@ getTypes(Archive *fout, int *numTypes)
"typnamespace, typacl, "
"acldefault('T', typowner) AS acldefault, "
"typowner, "
- "typelem, typrelid, "
+ "typelem, typrelid, typarray, "
"CASE WHEN typrelid = 0 THEN ' '::\"char\" "
"ELSE (SELECT relkind FROM pg_class WHERE oid = typrelid) END AS typrelkind, "
"typtype, typisdefined, "
@@ -5832,6 +5825,7 @@ getTypes(Archive *fout, int *numTypes)
i_typtype = PQfnumber(res, "typtype");
i_typisdefined = PQfnumber(res, "typisdefined");
i_isarray = PQfnumber(res, "isarray");
+ i_typarray = PQfnumber(res, "typarray");
for (i = 0; i < ntups; i++)
{
@@ -5864,6 +5858,8 @@ getTypes(Archive *fout, int *numTypes)
else
tyinfo[i].isArray = false;
+ tyinfo[i].typarray = atooid(PQgetvalue(res, i, i_typarray));
+
if (tyinfo[i].typtype == TYPTYPE_MULTIRANGE)
tyinfo[i].isMultirange = true;
else
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index f518a1e6d2..135219bf4a 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -200,6 +200,7 @@ typedef struct _typeInfo
const char *rolname;
Oid typelem;
Oid typrelid;
+ Oid typarray;
char typrelkind; /* 'r', 'v', 'c', etc */
char typtype; /* 'b', 'c', etc */
bool isArray; /* true if auto-generated array type */
--
2.39.3 (Apple Git-146)
On Wed, May 15, 2024 at 10:15:13PM +0200, Daniel Gustafsson wrote:
With the typarray caching from the patch attached here added *and* Nathan's
patch from [0] added:
$ time ./bin/pg_dump --schema-only --quote-all-identifiers --binary-upgrade \
--format=custom --file a postgres > /dev/null
real 0m1.566s
user 0m0.309s
sys 0m0.080s
The combination of these patches thus puts binary upgrade mode almost on par
with a plain dump, which has the potential to make upgrades of large schemas
faster. Parallel-parking this patch with Nathan's in the July CF, just wanted
to type it up while it was fresh in my mind.
Nice! I'll plan on taking a closer look at this one. I have a couple
other ideas in-flight (e.g., parallelizing the once-in-each-database
operations with libpq's asynchronous APIs) that I'm hoping to post soon,
too. v18 should have a lot of good stuff for pg_upgrade...
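As a rough illustration (a hypothetical sketch, not the actual patch), the
shape of that asynchronous pattern is to dispatch the query on every
per-database connection first and only then collect the results, so the
databases are processed concurrently instead of one at a time:

/*
 * Hypothetical sketch of the asynchronous pattern: send the query to
 * every database connection first, then collect results, so the
 * per-database work overlaps instead of running serially.
 */
#include <stdio.h>
#include <libpq-fe.h>

#define NCONNS 2

int
main(void)
{
	const char *conninfo[NCONNS] = {"dbname=postgres", "dbname=template1"};
	PGconn	   *conns[NCONNS];

	/* Connect and dispatch on every connection before waiting on any. */
	for (int i = 0; i < NCONNS; i++)
	{
		conns[i] = PQconnectdb(conninfo[i]);
		if (PQstatus(conns[i]) != CONNECTION_OK ||
			!PQsendQuery(conns[i], "SELECT count(*) FROM pg_class"))
		{
			fprintf(stderr, "%s", PQerrorMessage(conns[i]));
			return 1;
		}
	}

	/* Now collect results; the queries have been running concurrently. */
	for (int i = 0; i < NCONNS; i++)
	{
		PGresult   *res;

		while ((res = PQgetResult(conns[i])) != NULL)
		{
			if (PQresultStatus(res) == PGRES_TUPLES_OK)
				printf("%s: %s relations\n",
					   PQdb(conns[i]), PQgetvalue(res, 0, 0));
			PQclear(res);
		}
		PQfinish(conns[i]);
	}
	return 0;
}

A real implementation would multiplex the sockets with PQsocket()/select()
and nonblocking PQconsumeInput()/PQisBusy() rather than blocking in
PQgetResult(), but the dispatch-then-collect structure is the same.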
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:
Nice! I'll plan on taking a closer look at this one.
LGTM. I've marked the commitfest entry as ready-for-committer.
--
nathan
On 5 Jun 2024, at 04:39, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, May 15, 2024 at 03:21:36PM -0500, Nathan Bossart wrote:
Nice! I'll plan on taking a closer look at this one.
LGTM. I've marked the commitfest entry as ready-for-committer.
Thanks for review, committed.
--
Daniel Gustafsson