speed up pg_upgrade with a large number of tables
Hello postgres hackers:
I have recently been working on speeding up pg_upgrade for databases with
over a million tables, and would like to share some (possibly) optimizable
or otherwise interesting findings.
1: Skip Compatibility Check In "pg_upgrade"
=============================================
Concisely, we've got several databases, each with a million-plus tables.
Running the compatibility check before pg_dump can take around half an hour.
If I have already performed an online check before the actual upgrade,
repeating it seems unnecessary and, in many situations, just adds to the
downtime. So I'm thinking: why not add a "--skip-check" option to pg_upgrade
to skip it?
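To make this concrete, here is a simplified sketch of what the patch does in
"check_and_dump_old_cluster" (the actual patch implements it with a goto
label instead of an if block):
```
/* Simplified sketch of --skip-check in check_and_dump_old_cluster();
 * see the attached patch for the goto-based version. */
get_loadable_libraries();

if (!user_opts.skip_check)
{
    /* the usual compatibility checks run here, e.g. */
    check_for_prepared_transactions(&old_cluster);
    /* ... and the rest of the version-specific checks ... */
}

/* dump generation always runs; only the checks are skipped */
generate_old_dump();
```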
See "1-Skip_Compatibility_Check_v1.patch".
2: Accelerate "FastPathTransferRelationLocks"
===============================================
In this scenario, pg_restore takes much more time than pg_dump. Profiling
the "postgres" backend with perf, I found that much of the time is spent in
"LWLockAcquire" and "LWLockRelease". Digging deeper, I think I found the
reason:
When we create an index (very common during pg_restore), a "ShareLock" on
the relation must be held first. That is a "strong" lock, so before we can
change the global lock hash table, we must traverse every proc and transfer
its fast-path relation locks. The issue arises here (in
"FastPathTransferRelationLocks"): we acquire "fpInfoLock" before reading
"proc->databaseId", so we must acquire and release the lock "MaxBackends"
times for each index. The reason is recorded in this comment:
```
/*
* proc->databaseId is set at backend startup time and never changes
* thereafter, so it might be safe to perform this test before
* acquiring &proc->fpInfoLock. In particular, it's certainly safe to
* assume that if the target backend holds any fast-path locks, it
* must have performed a memory-fencing operation (in particular, an
* LWLock acquisition) since setting proc->databaseId. However, it's
* less clear that our backend is certain to have performed a memory
* fencing operation since the other backend set proc->databaseId. So
* for now, we test it after acquiring the LWLock just to be safe.
*/
```
I agree with the reasoning, but it seems OK to replace the LWLockAcquire
with a memory barrier before reading "proc->databaseId", and this can save
some time.
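In essence, the attached patch reorders the per-proc loop body like this:
```
/* Before: fpInfoLock is taken for every proc just to read databaseId. */
LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
if (proc->databaseId != locktag->locktag_field1)
{
    LWLockRelease(&proc->fpInfoLock);
    continue;
}

/* After: a read barrier is enough to read proc->databaseId safely, so
 * fpInfoLock is only taken for procs attached to the same database. */
pg_read_barrier();
if (proc->databaseId != locktag->locktag_field1)
    continue;
LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
```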
See "2-Accelerate_FastPathTransferRelationLocks_v1.patch".
3: Optimize Toast Index Creation
====================================
While tracing the cause mentioned in point "2", I noticed an interesting
behavior when creating a toast index. In the function "create_toast_table":
```
/* ShareLock is not really needed here, but take it anyway */
toast_rel = table_open(toast_relid, ShareLock);
/* some operation */
index_create(xxxx)
```
Indeed, ShareLock is not really needed here, since this is the only
transaction to which the toast relation is visible. But by design (in
"relation_open"), NoLock is only supposed to be used when the caller
confirms that it already holds an appropriate lock. So I wonder whether it
is still OK to use NoLock in a scenario where the relation was created by
the current transaction.
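Concretely, the attached patch relaxes the assertion in "relation_open" and
then opens the toast relation with NoLock:
```
/* relation_open(): also accept NoLock when the relation was created
 * in the current transaction (rd_createSubid is valid). */
Assert(lockmode != NoLock ||
       IsBootstrapProcessingMode() ||
       CheckRelationLockedByMe(r, AccessShareLock, true) ||
       OidIsValid(r->rd_createSubid));

/* create_toast_table(): the toast relation is visible only to this
 * transaction, so no lock is needed. */
toast_rel = table_open(toast_relid, NoLock);
```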
See "3-Optimize_Toast_Index_Creating_v1.patch".
That's what I've got. Any response is appreciated.
Best regards,
Yang Boyu
Attachments:
1-Skip_Compatibility_Check_v1.patch
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 27924159d6..570500824d 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -590,6 +590,9 @@ check_and_dump_old_cluster(bool live_check)
get_loadable_libraries();
+ /* skip the checks below */
+ if (user_opts.skip_check)
+ goto generate_old_dump;
/*
* Check for various failure cases
@@ -660,6 +663,8 @@ check_and_dump_old_cluster(bool live_check)
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 905)
check_for_pg_role_prefix(&old_cluster);
+generate_old_dump:
+
/*
* While not a check option, we do this now because this is the only time
* the old server is running.
diff --git a/src/bin/pg_upgrade/option.c b/src/bin/pg_upgrade/option.c
index 548ea4e623..a85b6a96fd 100644
--- a/src/bin/pg_upgrade/option.c
+++ b/src/bin/pg_upgrade/option.c
@@ -60,6 +60,7 @@ parseCommandLine(int argc, char *argv[])
{"copy", no_argument, NULL, 2},
{"copy-file-range", no_argument, NULL, 3},
{"sync-method", required_argument, NULL, 4},
+ {"skip-check", no_argument, NULL, 5},
{NULL, 0, NULL, 0}
};
@@ -211,6 +212,9 @@ parseCommandLine(int argc, char *argv[])
exit(1);
user_opts.sync_method = pg_strdup(optarg);
break;
+ case 5:
+ user_opts.skip_check = true;
+ break;
default:
fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
@@ -289,6 +293,7 @@ usage(void)
printf(_(" -B, --new-bindir=BINDIR new cluster executable directory (default\n"
" same directory as pg_upgrade)\n"));
printf(_(" -c, --check check clusters only, don't change any data\n"));
+ printf(_(" --skip-check skip the check in actual upgrade \n"));
printf(_(" -d, --old-datadir=DATADIR old cluster data directory\n"));
printf(_(" -D, --new-datadir=DATADIR new cluster data directory\n"));
printf(_(" -j, --jobs=NUM number of simultaneous processes or threads to use\n"));
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index af370768b6..e0071d7725 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -125,6 +125,12 @@ main(int argc, char **argv)
setup(argv[0], &live_check);
+ if (live_check && user_opts.skip_check)
+ {
+ pg_fatal("skip-check can't be used in check mode. Please turn off "
+ "skip-check if you want to perform a live check .\n");
+ }
+
output_check_banner(live_check);
check_cluster_versions();
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 8afe240bdf..adf4ab7e1c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -327,6 +327,9 @@ typedef struct
int jobs; /* number of processes/threads to use */
char *socketdir; /* directory to use for Unix sockets */
char *sync_method;
+ bool skip_check; /* true -> skip check in actual upgrade,
+ * assuming the various failure cases are
+ * impossible. */
} UserOpts;
typedef struct
2-Accelerate_FastPathTransferRelationLocks_v1.patch
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a50777..8a1427cd9b 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2717,28 +2717,25 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
PGPROC *proc = &ProcGlobal->allProcs[i];
uint32 f;
- LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
-
/*
* If the target backend isn't referencing the same database as the
* lock, then we needn't examine the individual relation IDs at all;
* none of them can be relevant.
*
* proc->databaseId is set at backend startup time and never changes
- * thereafter, so it might be safe to perform this test before
- * acquiring &proc->fpInfoLock. In particular, it's certainly safe to
+ * thereafter, so it is safe to perform this test before acquiring
+ * &proc->fpInfoLock. In particular, it's certainly safe to
* assume that if the target backend holds any fast-path locks, it
* must have performed a memory-fencing operation (in particular, an
- * LWLock acquisition) since setting proc->databaseId. However, it's
- * less clear that our backend is certain to have performed a memory
- * fencing operation since the other backend set proc->databaseId. So
- * for now, we test it after acquiring the LWLock just to be safe.
+ * LWLock acquisition) since setting proc->databaseId. So once a read
+ * barrier is performed, we can see the true proc->databaseId if the
+ * target backend has acquired its fpInfoLock.
*/
+ pg_read_barrier();
if (proc->databaseId != locktag->locktag_field1)
- {
- LWLockRelease(&proc->fpInfoLock);
continue;
- }
+
+ LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
{
@@ -2964,21 +2961,19 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
if (proc == MyProc)
continue;
- LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
-
/*
* If the target backend isn't referencing the same database as
* the lock, then we needn't examine the individual relation IDs
* at all; none of them can be relevant.
*
* See FastPathTransferRelationLocks() for discussion of why we do
- * this test after acquiring the lock.
+ * pg_read_barrier here.
*/
+ pg_read_barrier();
if (proc->databaseId != locktag->locktag_field1)
- {
- LWLockRelease(&proc->fpInfoLock);
continue;
- }
+
+ LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
{
3-Optimize_Toast_Index_Creating_v1.patch
diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index d8a313a2c9..d5b41b3887 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -35,7 +35,7 @@
* If lockmode is not "NoLock", the specified kind of lock is
* obtained on the relation. (Generally, NoLock should only be
* used if the caller knows it has some appropriate lock on the
- * relation already.)
+ * relation already or the relation is only visible to the caller.)
*
* An error is raised if the relation does not exist.
*
@@ -61,12 +61,14 @@ relation_open(Oid relationId, LOCKMODE lockmode)
elog(ERROR, "could not open relation with OID %u", relationId);
/*
- * If we didn't get the lock ourselves, assert that caller holds one,
- * except in bootstrap mode where no locks are used.
+ * If we didn't get the lock ourselves, assert that the caller holds
+ * one or the relation was created in the current transaction, except
+ * in bootstrap mode where no locks are used.
*/
Assert(lockmode != NoLock ||
IsBootstrapProcessingMode() ||
- CheckRelationLockedByMe(r, AccessShareLock, true));
+ CheckRelationLockedByMe(r, AccessShareLock, true) ||
+ OidIsValid(r->rd_createSubid));
/* Make note that we've accessed a temporary relation */
if (RelationUsesLocalBuffers(r))
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 738bc46ae8..37f73e98bb 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -269,8 +269,11 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
/* make the toast relation visible, else table_open will fail */
CommandCounterIncrement();
- /* ShareLock is not really needed here, but take it anyway */
- toast_rel = table_open(toast_relid, ShareLock);
+ /*
+ * ShareLock is not really needed here, since this is the only transaction
+ * the relation is visible to.
+ */
+ toast_rel = table_open(toast_relid, NoLock);
/*
* Create unique index on chunk_id, chunk_seq.
On 5 Jul 2024, at 09:12, 杨伯宇(长堂) <yangboyu.yby@alibaba-inc.com> wrote:
> 1: Skip Compatibility Check In "pg_upgrade"
> =============================================
> Concisely, we've got several databases, each with a million-plus tables.
> Running the compatibility check before pg_dump can take around half an hour.
> If I have already performed an online check before the actual upgrade,
> repeating it seems unnecessary and, in many situations, just adds to the
> downtime. So I'm thinking: why not add a "--skip-check" option to pg_upgrade
> to skip it?
> See "1-Skip_Compatibility_Check_v1.patch".
How would a user know that nothing has changed in the cluster between running
the check and running the upgrade with a skipped check? Considering how
complicated it is to understand exactly what pg_upgrade does, it seems like
quite a large-caliber footgun.
I would be much more interested in making the check phase go faster, and indeed
there is ongoing work in this area. Since it sounds like you have a dev and
test environment with a big workload, testing those patches would be helpful.
https://commitfest.postgresql.org/48/4995/ is one that comes to mind.
--
Daniel Gustafsson
>> So I'm thinking: why not add a "--skip-check" option to pg_upgrade to skip it?
>> See "1-Skip_Compatibility_Check_v1.patch".
> How would a user know that nothing has changed in the cluster between running
> the check and running the upgrade with a skipped check? Considering how
> complicated it is to understand exactly what pg_upgrade does, it seems like
> quite a large-caliber footgun.
Indeed, it's not feasible to execute a conclusive check ensuring that nothing
has changed. However, in many cases a cluster consistently executes the
same SQL commands, so if we verified that the cluster was compatible
30 minutes earlier, there is a strong likelihood that it remains compatible
now. Therefore, adding such a 'trust-me' option may still be beneficial.
> I would be much more interested in making the check phase go faster, and indeed
> there is ongoing work in this area. Since it sounds like you have a dev and
> test environment with a big workload, testing those patches would be helpful.
> https://commitfest.postgresql.org/48/4995/ is one that comes to mind.
Very meaningful work! I will try it.
--
Best regards,
Yang Boyu
On Fri, Jul 05, 2024 at 05:24:42PM +0800, 杨伯宇(长堂) wrote:
>>> So I'm thinking: why not add a "--skip-check" option to pg_upgrade to skip it?
>>> See "1-Skip_Compatibility_Check_v1.patch".
>> How would a user know that nothing has changed in the cluster between running
>> the check and running the upgrade with a skipped check? Considering how
>> complicated it is to understand exactly what pg_upgrade does, it seems like
>> quite a large-caliber footgun.
I am also -1 on this one for the same reasons as Daniel.
>> I would be much more interested in making the check phase go faster, and indeed
>> there is ongoing work in this area. Since it sounds like you have a dev and
>> test environment with a big workload, testing those patches would be helpful.
>> https://commitfest.postgresql.org/48/4995/ is one that comes to mind.
> Very meaningful work! I will try it.
Thanks! Since you mentioned that you have multiple databases with 1M+
tables, you might also be interested in commit 2329cad. That should
speed up the pg_dump step quite a bit.
--
nathan
> Thanks! Since you mentioned that you have multiple databases with 1M+
> tables, you might also be interested in commit 2329cad. That should
> speed up the pg_dump step quite a bit.
Wow, I noticed this commit (2329cad) when it appeared in the commitfest. It
has doubled the speed of pg_dump in this scenario. Thank you for your effort!
Besides, https://commitfest.postgresql.org/48/4995/ seems insufficient for
this situation. Some time-consuming functions like check_for_data_types_usage
cannot yet run in parallel. But these patches could be a great starting
point for a more efficient parallelism implementation. Maybe we can do it
later.
On Mon, Jul 08, 2024 at 03:22:36PM +0800, 杨伯宇(长堂) wrote:
> Besides, https://commitfest.postgresql.org/48/4995/ seems insufficient for
> this situation. Some time-consuming functions like check_for_data_types_usage
> cannot yet run in parallel. But these patches could be a great starting
> point for a more efficient parallelism implementation. Maybe we can do it
> later.
I actually just wrote up the first version of the patch for parallelizing
the data type checks over the weekend. I'll post it shortly.
--
nathan
The restore-side speedups suggested by Yang seem reasonable and can
potentially speed up the process. We could even go a bit further by starting
the new postgres in --binary-upgrade mode and skipping some of these locks
completely.
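For example, something like this at the top of
"FastPathTransferRelationLocks" (a hypothetical, untested sketch;
IsBinaryUpgrade is the existing flag set when pg_upgrade starts the server):
```
/* Hypothetical, untested sketch: when the server was started by
 * pg_upgrade (IsBinaryUpgrade), no other backend is connected, so no
 * fast-path locks can exist and the per-proc transfer loop could
 * arguably be skipped entirely. */
if (IsBinaryUpgrade)
    return true;        /* nothing to transfer */
```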
On Sun, Jan 19, 2025 at 3:43 PM Nathan Bossart <nathandbossart@gmail.com>
wrote:
> On Mon, Jul 08, 2024 at 03:22:36PM +0800, 杨伯宇(长堂) wrote:
>> Besides, https://commitfest.postgresql.org/48/4995/ seems insufficient for
>> this situation. Some time-consuming functions like check_for_data_types_usage
>> cannot yet run in parallel. But these patches could be a great starting
>> point for a more efficient parallelism implementation. Maybe we can do it
>> later.
> I actually just wrote up the first version of the patch for parallelizing
> the data type checks over the weekend. I'll post it shortly.
> --
> nathan
On Sun, Jan 19, 2025 at 03:57:18PM -0800, Hari Krishna Sunder wrote:
> The restore-side speedups suggested by Yang seem reasonable and can
> potentially speed up the process. We could even go a bit further by starting
> the new postgres in --binary-upgrade mode and skipping some of these locks
> completely.
I am curious about what kind of speedup we can expect with these patches.
IMHO some benchmarking results would be useful.
--
nathan