optimizing pg_upgrade's once-in-each-database steps
A number of pg_upgrade steps require connecting to each database and
running a query. When there are many databases, these steps are
particularly time-consuming, especially since this is done sequentially in
a single process. At a quick glance, I see the following such steps:
* create_logical_replication_slots
* check_for_data_types_usage
* check_for_isn_and_int8_passing_mismatch
* check_for_user_defined_postfix_ops
* check_for_incompatible_polymorphics
* check_for_tables_with_oids
* check_for_user_defined_encoding_conversions
* check_old_cluster_subscription_state
* get_loadable_libraries
* get_db_rel_and_slot_infos
* old_9_6_invalidate_hash_indexes
* report_extension_updates
I set out to parallelize these kinds of steps via multiple threads or
processes, but I ended up realizing that we could likely achieve much of
the same gain with libpq's asynchronous APIs. Specifically, both
establishing the connections and running the queries can be done without
blocking, so we can just loop over a handful of slots and advance a simple
state machine for each. The attached is a proof-of-concept grade patch for
doing this for get_db_rel_and_slot_infos(), which yielded the following
results on my laptop for "pg_upgrade --link --sync-method=syncfs --jobs 8"
for a cluster with 10K empty databases.
total pg_upgrade time:
* HEAD: 14m 8s
* patch: 10m 58s
get_db_rel_and_slot_infos() on old cluster:
* HEAD: 2m 45s
* patch: 36s
get_db_rel_and_slot_infos() on new cluster:
* HEAD: 1m 46s
* patch: 29s
I am posting this early to get thoughts on the general approach. If we
proceeded with this strategy, I'd probably create some generic tooling to
which each relevant step would provide a set of callback functions.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-parallel-get-relinfos.patch
On Thu, 2024-05-16 at 16:16 -0500, Nathan Bossart wrote:
I am posting this early to get thoughts on the general approach. If we
proceeded with this strategy, I'd probably create some generic tooling to
which each relevant step would provide a set of callback functions.
The documentation states:
"pg_dump -j uses multiple database connections; it connects to the
database once with the leader process and once again for each worker
job."
That might need to be adjusted.
How much complexity do you avoid by using async instead of multiple
processes?
Also, did you consider connecting once to each database and running
many queries? Most of those seem like just checks.
Regards,
Jeff Davis
On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
How much complexity do you avoid by using async instead of multiple
processes?
If we didn't want to use async, my guess is we'd want to use threads to
avoid complicated IPC. And if we followed pgbench's example for using
threads, it might end up at a comparable level of complexity, although I'd
bet that threading would be the more complex of the two. It's hard to say
definitively without coding it up both ways, which might be worth doing.
Also, did you consider connecting once to each database and running
many queries? Most of those seem like just checks.
This was the idea behind 347758b. It may be possible to do more along
these lines. IMO parallelizing will still be useful even if we do combine
more of the steps.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
I figured I'd post what I have so far since this thread hasn't been updated
in a while. The attached patches are still "proof-of-concept grade," but
they are at least moving in the right direction (IMHO). The variable
naming is still not great, and they are woefully undercommented, among
other things.
0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster. This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries. It uses system() to wait for these
asynchronous tasks to complete. Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query. If multiple queries are
required for each database, users can provide multiple sets of callbacks.
The other patches change several of the existing tasks to use this new API.
With these patches applied, I see the following differences in the output
of 'pg_upgrade | ts -i' for a cluster with 1k empty databases:
WITHOUT PATCH
00:00:19 Checking database user is the install user ok
00:00:02 Checking for subscription state ok
00:00:06 Adding ".old" suffix to old global/pg_control ok
00:00:04 Checking for extension updates ok
WITH PATCHES (--jobs 1)
00:00:10 Checking database user is the install user ok
00:00:02 Checking for subscription state ok
00:00:07 Adding ".old" suffix to old global/pg_control ok
00:00:05 Checking for extension updates ok
WITH PATCHES (--jobs 4)
00:00:06 Checking database user is the install user ok
00:00:00 Checking for subscription state ok
00:00:02 Adding ".old" suffix to old global/pg_control ok
00:00:01 Checking for extension updates ok
Note that the "Checking database user is the install user" time also
includes the call to get_db_rel_and_slot_infos() on the old cluster as well
as the call to get_loadable_libraries() on the old cluster. I believe the
improvement with the patches with just one job is due to the consolidation
of the queries into one database connection (presently,
get_db_rel_and_slot_infos() creates 3 connections per database for some
upgrades). Similarly, the "Adding ".old" suffix to old
global/pg_control" time includes the call to get_db_rel_and_slot_infos() on
the new cluster.
There are several remaining places where we could use this new API to speed
up upgrades. For example, I haven't attempted to use it for the data type
checks yet, and that tends to eat up a sizable chunk of time when there are
many databases.
On Thu, May 16, 2024 at 08:24:08PM -0500, Nathan Bossart wrote:
On Thu, May 16, 2024 at 05:09:55PM -0700, Jeff Davis wrote:
Also, did you consider connecting once to each database and running
many queries? Most of those seem like just checks.

This was the idea behind 347758b. It may be possible to do more along
these lines. IMO parallelizing will still be useful even if we do combine
more of the steps.
My current thinking is that any possible further consolidation should
happen as part of a follow-up effort to parallelization. I'm cautiously
optimistic that the parallelization work will make the consolidation easier
since it moves things to rigidly-defined callback functions.
A separate piece of off-list feedback from Michael Paquier is that this new
parallel system might be something we can teach the ParallelSlot code used
by bin/scripts/ to do. I've yet to look too deeply into this, but I
suspect that it will be difficult to combine the two. For example, the
ParallelSlot system doesn't seem well-suited for the kind of
run-once-in-each-database tasks required by pg_upgrade, and the error
handling is probably a little different, too. However, it's still worth a
closer look, and I'm interested in folks' opinions on the subject.
--
nathan
Attachments:
v2-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
v2-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
v2-0003-move-live_check-variable-to-user_opts.patch
v2-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
v2-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
v2-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster. This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries. It uses system() to wait for these
asynchronous tasks to complete. Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query. If multiple queries are
required for each database, users can provide multiple sets of callbacks.
I do really like the idea of using asynchronous communication here. It
should be significantly cheaper than using multiple processes or
threads, and maybe less code, too. But I'm confused about why you
would need or want to use system() to wait for asynchronous tasks to
complete. Wouldn't it be something like select()?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 01, 2024 at 03:58:16PM -0400, Robert Haas wrote:
On Mon, Jul 1, 2024 at 3:47 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
0001 introduces a new API for registering callbacks and running them in
parallel on all databases in the cluster. This new system manages a set of
"slots" that follow a simple state machine to asynchronously establish a
connection and run the queries. It uses system() to wait for these
asynchronous tasks to complete. Users of this API only need to provide two
callbacks: one to return the query that should be run on each database and
another to process the results of that query. If multiple queries are
required for each database, users can provide multiple sets of callbacks.

I do really like the idea of using asynchronous communication here. It
should be significantly cheaper than using multiple processes or
threads, and maybe less code, too. But I'm confused about why you
would need or want to use system() to wait for asynchronous tasks to
complete. Wouldn't it be something like select()?
Whoops, I meant to type "select()" there. Sorry for the confusion.
--
nathan
As I mentioned elsewhere [0], here's a first attempt at parallelizing the
data type checks. I was worried that I might have to refactor Daniel's
work in commit 347758b quite significantly, but I was able to avoid that by
using a set of generic callbacks and providing each task step an index into
the data_types_usage_checks array.
[0]: /messages/by-id/Zov5kHbEyDMuHJI_@nathan
--
nathan
Attachments:
v3-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
v3-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
v3-0003-move-live_check-variable-to-user_opts.patch
v3-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
v3-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
v3-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
v3-0007-parallelize-data-type-checks-in-pg_upgrade.patch
I finished parallelizing everything in pg_upgrade I could easily
parallelize with the proposed async task API. There are a few remaining
places where I could not easily convert to the new API for whatever reason.
AFAICT those remaining places are either not showing up prominently in my
tests, or they only affect versions that are unsupported or will soon be
unsupported when v18 is released. Similarly, I noticed many of the checks
follow a very similar coding pattern, but most (if not all) only affect
older versions, so I'm unsure whether it's worth the trouble to try
consolidating them into one scan through all the databases.
The code is still very rough and nowhere near committable, but this at
least gets the patch set into the editing phase.
--
nathan
Attachments:
v4-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
v4-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
v4-0003-move-live_check-variable-to-user_opts.patch
v4-0004-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
v4-0005-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
v4-0006-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
v4-0007-parallelize-data-type-checks-in-pg_upgrade.patch
v4-0008-parallelize-isn-and-int8-passing-mismatch-check-i.patch
v4-0009-parallelize-user-defined-postfix-ops-check-in-pg_.patch
v4-0010-parallelize-incompatible-polymorphics-check-in-pg.patch
v4-0011-parallelize-tables-with-oids-check-in-pg_upgrade.patch
v4-0012-parallelize-user-defined-encoding-conversions-che.patch
On 9 Jul 2024, at 05:33, Nathan Bossart <nathandbossart@gmail.com> wrote:
The code is still very rough and nowhere near committable, but this at
least gets the patch set into the editing phase.
First reaction after a read-through is that this seems really cool; can't wait
to see how much faster v18 pg_upgrade will be than v17. I will do more testing
and review once back from vacation. Below are some comments from reading, which
is all I had time for at this point:
+static void
+conn_failure(PGconn *conn)
+{
+ pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
+ printf(_("Failure, exiting\n"));
+ exit(1);
+}
Any particular reason this isn't using pg_fatal()?
+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+ pg_free(query);
+}
A minor point, perhaps fueled by me not having played around much with this
patchset. It seems a bit odd that dispatch_query is responsible for freeing
the query from the get_query callback. I would have expected the output from
AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
+static void
+sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
+{
....
+ fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\" relation:\"%s\"\n",
+ PQgetvalue(res, i, 0),
+ dbinfo->db_name,
+ PQgetvalue(res, i, 1),
With the query being in a separate place in the code from the processing it
takes a bit of jumping around to resolve the columns in PQgetvalue calls like
this. Using PQfnumber() calls and descriptive names would make this easier. I
know this case is copying old behavior, but the function splits make it less
useful than before.
+ char *query = pg_malloc(QUERY_ALLOC);
Should we convert this to a PQExpBuffer?
--
Daniel Gustafsson
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
First reaction after a read-through is that this seems really cool, can't wait
to see how much v18 pg_upgrade will be over v17. I will do more testing and
review once back from vacation, below are some comments from reading which is
all I had time for at this point:
Thanks for taking a look!
+static void
+conn_failure(PGconn *conn)
+{
+ pg_log(PG_REPORT, "%s", PQerrorMessage(conn));
+ printf(_("Failure, exiting\n"));
+ exit(1);
+}

Any particular reason this isn't using pg_fatal()?
IIRC this was to match the error in connectToServer(). We could probably
move to pg_fatal().
+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+ pg_free(query);
+}

A minor point, perhaps fueled by me not having played around much with this
patchset. It seems a bit odd that dispatch_query is responsible for freeing
the query from the get_query callback. I would have expected the output from
AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
I don't see any problem with doing it the way you suggest.
Tangentially related, I waffled a bit on whether to require the query
callbacks to put the result in allocated memory. Some queries are the same
no matter what, and some require customization at runtime. As you can see,
I ended up just requiring allocated memory. That makes the code a tad
simpler, and I doubt the extra work is noticeable.
+static void
+sub_process(DbInfo *dbinfo, PGresult *res, void *arg)
+{
....
+ fprintf(state->script, "The table sync state \"%s\" is not allowed for database:\"%s\" subscription:\"%s\" schema:\"%s\" relation:\"%s\"\n",
+ PQgetvalue(res, i, 0),
+ dbinfo->db_name,
+ PQgetvalue(res, i, 1),

With the query being in a separate place in the code from the processing it
takes a bit of jumping around to resolve the columns in PQgetvalue calls like
this. Using PQfnumber() calls and descriptive names would make this easier. I
know this case is copying old behavior, but the function splits make it less
useful than before.
Good point.
+ char *query = pg_malloc(QUERY_ALLOC);
Should we convert this to a PQExpBuffer?
Seems like a good idea. I think I was trying to change as few lines as
possible for my proof-of-concept. :)
--
nathan
On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+ pg_free(query);
+}

A minor point, perhaps fueled by me not having played around much with this
patchset. It seems a bit odd that dispatch_query is responsible for freeing
the query from the get_query callback. I would have expected the output from
AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.

I don't see any problem with doing it the way you suggest.
Tangentially related, I waffled a bit on whether to require the query
callbacks to put the result in allocated memory. Some queries are the same
no matter what, and some require customization at runtime. As you can see,
I ended up just requiring allocated memory. That makes the code a tad
simpler, and I doubt the extra work is noticeable.
I absolutely agree with this.
+ char *query = pg_malloc(QUERY_ALLOC);
Should we convert this to a PQExpBuffer?
Seems like a good idea. I think I was trying to change as few lines as
possible for my proof-of-concept. :)
Yeah, that's a good approach; I just noticed it while reading the hunks. We
can do that separately from this patchset.
In order to trim down the size of the patchset, I think going ahead with 0003
independently of this makes sense; it has value by itself.
--
Daniel Gustafsson
On Thu, Jul 18, 2024 at 09:57:23AM +0200, Daniel Gustafsson wrote:
On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
....
+ pg_free(query);
+}

A minor point, perhaps fueled by me not having played around much with this
patchset. It seems a bit odd that dispatch_query is responsible for freeing
the query from the get_query callback. I would have expected the output from
AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.

I don't see any problem with doing it the way you suggest.
Actually, I do see a problem. If we do it this way, we'll have to store a
string per database somewhere, which seems unnecessary.
However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo. AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0]. That should speed things up a bit and allow us to simplify
the AsyncTask code.
With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use. This would look more like what you suggested in the quoted text.
[0]: /messages/by-id/ZprQJv_TxccN3tkr@nathan
--
nathan
On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo. AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0]. That should speed things up a bit and allow us to simplify
the AsyncTask code.

With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use. This would look more like what you suggested in the quoted text.
Here is a new patch set. I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon. I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too. Otherwise, I've tried to address all feedback
provided thus far.
[0]: https://commitfest.postgresql.org/49/5135/
--
nathan
Attachments:
v5-0001-pg_upgrade-retrieve-subscription-count-more-effic.patch
v5-0002-move-live_check-variable-to-user_opts.patch
v5-0003-introduce-framework-for-parallelizing-pg_upgrade-.patch
v5-0004-use-new-pg_upgrade-async-API-for-subscription-sta.patch
v5-0005-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
v5-0006-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
v5-0007-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
v5-0008-parallelize-data-type-checks-in-pg_upgrade.patch
v5-0009-parallelize-isn-and-int8-passing-mismatch-check-i.patch
v5-0010-parallelize-user-defined-postfix-ops-check-in-pg_.patch
v5-0011-parallelize-incompatible-polymorphics-check-in-pg.patch
v5-0012-parallelize-tables-with-oids-check-in-pg_upgrade.patch
v5-0013-parallelize-user-defined-encoding-conversions-che.patch
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
Here is a new patch set. I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon. I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too. Otherwise, I've tried to address all feedback
provided thus far.
Here is just the live_check patch. I rebased it, gave it a commit message,
and fixed a silly mistake. Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.
--
nathan
Attachments:
v6-0001-pg_upgrade-Move-live_check-variable-to-user_opts.patch
On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
Here is a new patch set. I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon. I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too. Otherwise, I've tried to address all feedback
provided thus far.

Here is just the live_check patch. I rebased it, gave it a commit message,
and fixed a silly mistake. Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.
LGTM
--
Daniel Gustafsson
On Thu, Jul 25, 2024 at 11:42:55AM +0200, Daniel Gustafsson wrote:
On 25 Jul 2024, at 04:58, Nathan Bossart <nathandbossart@gmail.com> wrote:
Here is just the live_check patch. I rebased it, gave it a commit message,
and fixed a silly mistake. Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.

LGTM
Thanks, committed.
--
nathan
On 22.07.2024 21:07, Nathan Bossart wrote:
On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo. AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0]. That should speed things up a bit and allow us to simplify
the AsyncTask code.

With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use. This would look more like what you suggested in the quoted text.

Here is a new patch set. I've included the latest revision of the patch to
fix get_db_subscription_count() from the other thread [0] as 0001 since I
expect that to be committed soon. I've also moved the patch that moves the
"live_check" variable to "user_opts" to 0002 since I plan on committing
that sooner than later, too. Otherwise, I've tried to address all feedback
provided thus far.
Hi,
I like your idea of parallelizing these checks with async libpq API,
thanks for working on it. The patch doesn't apply cleanly on master
anymore, but I've rebased locally and taken it for a quick spin with a
pg16 instance of 1000 empty databases. Didn't see any regressions with
-j 1, there's some speedup with -j 8 (33 sec vs 8 sec for these checks).
One thing that I noticed that could be improved is that we could start a new
connection right away after having run all query callbacks for the
current connection in process_slot, instead of just returning and
establishing the new connection only on the next iteration of the loop
in async_task_run after potentially sleeping on select.
+1 to Jeff's suggestion that perhaps we could reuse connections, but
perhaps that's a separate story.
Regards,
Ilya
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
I like your idea of parallelizing these checks with async libpq API, thanks
for working on it. The patch doesn't apply cleanly on master anymore, but
I've rebased locally and taken it for a quick spin with a pg16 instance of
1000 empty databases. Didn't see any regressions with -j 1, there's some
speedup with -j 8 (33 sec vs 8 sec for these checks).
Thanks for taking a look. I'm hoping to do a round of polishing before
posting a rebased patch set soon.
One thing that I noticed that could be improved is that we could start a new
connection right away after having run all query callbacks for the current
connection in process_slot, instead of just returning and establishing the
new connection only on the next iteration of the loop in async_task_run
after potentially sleeping on select.
Yeah, we could just recursively call process_slot() right after freeing the
slot. That'd at least allow us to avoid the spinning behavior as we run
out of databases to process, if nothing else.
+1 to Jeff's suggestion that perhaps we could reuse connections, but perhaps
that's a separate story.
When I skimmed through the various tasks, I didn't see a ton of
opportunities for further consolidation, or at least opportunities that
would help for upgrades from supported versions. The data type checks are
already consolidated, for example.
--
nathan
On Thu, Aug 01, 2024 at 12:44:35PM -0500, Nathan Bossart wrote:
On Wed, Jul 31, 2024 at 10:55:33PM +0100, Ilya Gladyshev wrote:
I like your idea of parallelizing these checks with async libpq API, thanks
for working on it. The patch doesn't apply cleanly on master anymore, but
I've rebased locally and taken it for a quick spin with a pg16 instance of
1000 empty databases. Didn't see any regressions with -j 1, there's some
speedup with -j 8 (33 sec vs 8 sec for these checks).Thanks for taking a look. I'm hoping to do a round of polishing before
posting a rebased patch set soon.

One thing that I noticed that could be improved is that we could start a new
connection right away after having run all query callbacks for the current
connection in process_slot, instead of just returning and establishing the
new connection only on the next iteration of the loop in async_task_run
after potentially sleeping on select.

Yeah, we could just recursively call process_slot() right after freeing the
slot. That'd at least allow us to avoid the spinning behavior as we run
out of databases to process, if nothing else.
Here is a new patch set. Besides rebasing, I've added the recursive call
to process_slot() mentioned in the quoted text, and I've added quite a bit
of commentary to async.c.
--
nathan
Attachments:
v7-0001-introduce-framework-for-parallelizing-pg_upgrade-.patch
v7-0002-use-new-pg_upgrade-async-API-for-subscription-sta.patch
v7-0003-use-new-pg_upgrade-async-API-for-retrieving-relin.patch
v7-0004-use-new-pg_upgrade-async-API-to-parallelize-getti.patch
v7-0005-use-new-pg_upgrade-async-API-to-parallelize-repor.patch
v7-0006-parallelize-data-type-checks-in-pg_upgrade.patch
v7-0007-parallelize-isn-and-int8-passing-mismatch-check-i.patch
v7-0008-parallelize-user-defined-postfix-ops-check-in-pg_.patch
v7-0009-parallelize-incompatible-polymorphics-check-in-pg.patch
v7-0010-parallelize-tables-with-oids-check-in-pg_upgrade.patch
v7-0011-parallelize-user-defined-encoding-conversions-che.patch
On 01.08.2024 22:41, Nathan Bossart wrote:
Here is a new patch set. Besides rebasing, I've added the recursive call
to process_slot() mentioned in the quoted text, and I've added quite a bit
of commentary to async.c.
That's much better now, thanks! Here's my code review; note that I
haven't tested the patches yet:
+void
+async_task_add_step(AsyncTask *task,
+ AsyncTaskGetQueryCB query_cb,
+ AsyncTaskProcessCB process_cb, bool free_result,
+ void *arg)
Is there any reason to have the query as a callback function instead of a char
*? From what I see right now, it doesn't give any extra flexibility, as
the query has to be static anyway (it can't be customized on a per-database
basis) and will be created once before all the callbacks are run. Meanwhile,
passing in a char * makes the API simpler, excludes any potential error of
making the query dependent on the current database, and removes the
unnecessary malloc/free of the static strings.
+static void
+dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
+ const AsyncTask *task)
+{
+ ...
+ if (!PQsendQuery(slot->conn, cbs->query))
+ conn_failure(slot->conn);
+}
This will print "connection failure: connection pointer is NULL", which
I don't think makes a lot of sense to the end user. I'd prefer something
like pg_fatal("failed to allocate a new connection").
if (found)
- pg_fatal("Data type checks failed: %s", report.data);
+ {
+ pg_fatal("Data type checks failed: %s",
+ data_type_check_report.data);
+ termPQExpBuffer(&data_type_check_report);
+ }
`found` should be removed and replaced with `data_type_check_failed`, as
it's not set anymore. Also the termPQExpBuffer after pg_fatal looks
unnecessary.
+static bool *data_type_check_results;
+static bool data_type_check_failed;
+static PQExpBufferData data_type_check_report;
IMO, it would be nicer to have these as local state that's passed in
as an arg to the AsyncTaskProcessCB, which aligns with how the other
checks do it.
-- End of review --
Regarding keeping the connections, the way I envisioned it entailed
passing a list of connections from one check to the next one (or keeping
a global state with connections?). I didn't concretely look at the code
to verify this, so it's just an abstract idea.