parallel pg_restore
I am working on getting parallel pg_restore working. I'm currently
getting all the scaffolding working, and hope to have a naive prototype
posted within about a week.
The major question is how to choose the restoration order so as to
maximize efficiency both on the server and in reading the archive. My
thoughts are currently running something like this:
* when an item is completed, reduce the dependency count for each
item that depends on it by 1.
* when an item has a dependency count of 0 it is available for
execution, and gets moved to the head of the queue.
* when a new worker spot becomes available, if there is not currently a
data load running then pick the first available data load,
otherwise pick the first available item.
This would mean that loading a table would probably be immediately
followed by creation of its indexes, including PK and UNIQUE
constraints, thus taking possible advantage of synchronised scans, data
in file system buffers, etc.
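The scheduling policy sketched above could be expressed roughly as follows. This is an illustrative Python sketch, not pg_restore's actual C code; the names `Item`, `pick_next` and `on_item_done` are invented for the example:

```python
from collections import deque

class Item:
    def __init__(self, name, is_data_load, deps):
        self.name = name
        self.is_data_load = is_data_load  # True for table-data TOC entries
        self.deps = deps                  # names of items this one depends on
        self.dep_count = len(deps)        # count of not-yet-restored dependencies

def pick_next(ready, data_load_running):
    """Pick the next item for a free worker spot.

    ready holds items whose dep_count has reached 0, with newly ready
    items at the head. If no data load is currently running, prefer the
    first available data load; otherwise take the first available item.
    """
    if not data_load_running:
        for item in ready:
            if item.is_data_load:
                ready.remove(item)
                return item
    return ready.popleft() if ready else None

def on_item_done(done, all_items, ready):
    """Reduce the dependency count of every dependent by 1; items that
    reach 0 go to the head of the queue, so e.g. a table's indexes tend
    to run right after its data load."""
    for item in all_items:
        if done.name in item.deps:
            item.dep_count -= 1
            if item.dep_count == 0:
                ready.appendleft(item)
```

With a table load, a dependent index, and an unrelated object queued, this policy restores the table first and its index immediately afterwards, which is the behavior described above.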
Another question is what we should do if the user supplies an explicit
order with --use-list. I'm inclined to say we should stick strictly with
the supplied order. Or maybe that should be an option.
Thoughts and comments welcome.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
I am working on getting parallel pg_restore working. I'm currently
getting all the scaffolding working, and hope to have a naive prototype
posted within about a week.
The major question is how to choose the restoration order so as to
maximize efficiency both on the server and in reading the archive.
One of the first software design principles I ever learned was to
separate policy from mechanism. ISTM in this first cut you ought to
concentrate on mechanism and let the policy just be something dumb
(but coded separately from the infrastructure). We can refine it after
that.
Another question is what we should do if the user supplies an explicit
order with --use-list. I'm inclined to say we should stick strictly with
the supplied order. Or maybe that should be an option.
Hmm. I think --use-list is used more for selecting a subset of items
to restore than for forcing a nondefault restore order. Forcing the
order used to be a major purpose, but that was years ago before we
had the dependency-driven-restore-order code working. So I'd vote that
the default behavior is to still allow parallel restore when this option
is used, and we should provide an orthogonal option that disables use of
parallel restore.
You'd really want the latter anyway for some cases, ie, when you don't
want the restore trying to hog the machine. Maybe the right form for
the extra option is just a limit on how many connections to use. Set it
to one to force the exact restore order, and to other values to throttle
how much of the machine the restore tries to eat.
One problem here though is that you'd need to be sure you behave sanely
when there is a dependency chain passing through an object that's not to
be restored. The ordering of the rest of the chain still ought to honor
the dependencies I think.
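The behavior Tom describes can be sketched as follows: when an object is excluded from the restore, its dependents inherit its dependencies, so the rest of the chain still honors the ordering. A hypothetical Python sketch (not pg_restore code; the function names are invented):

```python
def effective_deps(item, deps, skipped, seen=None):
    """Resolve an item's dependencies, looking *through* skipped items:
    if B is skipped and C depends on B which depends on A, C ends up
    depending directly on A."""
    seen = set() if seen is None else seen
    out = set()
    for d in deps.get(item, set()):
        if d in skipped:
            if d not in seen:
                seen.add(d)
                out |= effective_deps(d, deps, skipped, seen)
        else:
            out.add(d)
    return out

def splice_out_skipped(deps, skipped):
    """Drop skipped items from the dependency graph while keeping the
    transitive ordering constraints among the items actually restored."""
    return {i: effective_deps(i, deps, skipped)
            for i in deps if i not in skipped}
```

This matches the "fake doing a full restore" idea discussed below in spirit: the skipped item still participates in ordering even though it is never actually restored.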
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I am working on getting parallel pg_restore working. I'm currently
getting all the scaffolding working, and hope to have a naive prototype
posted within about a week.
The major question is how to choose the restoration order so as to
maximize efficiency both on the server and in reading the archive.
One of the first software design principles I ever learned was to
separate policy from mechanism. ISTM in this first cut you ought to
concentrate on mechanism and let the policy just be something dumb
(but coded separately from the infrastructure). We can refine it after
that.
Indeed, that's exactly what I'm doing. However, given that time for the
8.4 window is short, I thought it would be sensible to get people
thinking about what the policy might be, while I get on with the mechanism.
Another question is what we should do if the user supplies an explicit
order with --use-list. I'm inclined to say we should stick strictly with
the supplied order. Or maybe that should be an option.
Hmm. I think --use-list is used more for selecting a subset of items
to restore than for forcing a nondefault restore order. Forcing the
order used to be a major purpose, but that was years ago before we
had the dependency-driven-restore-order code working. So I'd vote that
the default behavior is to still allow parallel restore when this option
is used, and we should provide an orthogonal option that disables use of
parallel restore.
You'd really want the latter anyway for some cases, ie, when you don't
want the restore trying to hog the machine. Maybe the right form for
the extra option is just a limit on how many connections to use. Set it
to one to force the exact restore order, and to other values to throttle
how much of the machine the restore tries to eat.
My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.
One problem here though is that you'd need to be sure you behave sanely
when there is a dependency chain passing through an object that's not to
be restored. The ordering of the rest of the chain still ought to honor
the dependencies I think.
Right. I think we'd need to fake doing a full restore and omit actually
restoring items not on the passed in list. That should be simple enough.
cheers
andrew
On Monday 22 September 2008, Andrew Dunstan wrote:
You'd really want the latter anyway for some cases, ie, when you don't
want the restore trying to hog the machine. Maybe the right form for
the extra option is just a limit on how many connections to use. Set it
to one to force the exact restore order, and to other values to throttle
how much of the machine the restore tries to eat.
My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.
What about the famous make -j option?
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.
Regards,
--
dim
On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.
What about the famous make -j option?
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.
+1
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, 2008-09-21 at 18:15 -0400, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
I am working on getting parallel pg_restore working. I'm currently
getting all the scaffolding working, and hope to have a naive prototype
posted within about a week.
The major question is how to choose the restoration order so as to
maximize efficiency both on the server and in reading the archive.
One of the first software design principles I ever learned was to
separate policy from mechanism. ISTM in this first cut you ought to
concentrate on mechanism and let the policy just be something dumb
(but coded separately from the infrastructure). We can refine it after
that.
Agreed. We mustn't make too many built-in assumptions about the best way
to parallelise the restore.
For example, running all CREATE INDEX at same time may help I/O on the
scan but it may also swamp memory and force additional I/O as a result.
We might need a setting for the total memory available, so pg_restore can
try not to run a combination of tasks that would exceed it. Preferably
this wouldn't be just a pg_restore setting.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.
What about the famous make -j option?
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.
+1
If that's the preferred name I have no problem. I'm not sure about the
default argument part, though.
First, I'm not sure our getopt infrastructure actually provides for
optional arguments, and I am not going to rework it in pg_restore to get
around such a problem, at least for now.
More importantly, I'm not convinced it's a good idea. It seems more like
a footgun that will potentially try to launch thousands of simultaneous
restore connections. I should have thought that optimal performance
would be reached at some small multiple (say maybe 2?) of the number of
CPUs on the server. You could achieve unlimited parallelism by saying
something like --jobs=99999, but I'd rather that were done very
explicitly instead of as the default value of the parameter.
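For illustration, the required-argument behavior argued for here might look like the following. This is a hedged Python/argparse sketch of the policy only (pg_restore itself is C and uses getopt); `parse_jobs` and the cap of 64 are invented for the example, not anything pg_restore defines:

```python
import argparse

MAX_JOBS = 64  # arbitrary safety cap for illustration only

def parse_jobs(argv):
    """Parse a -j/--jobs option that always requires an argument,
    avoiding make's "unlimited when the argument is omitted" footgun."""
    parser = argparse.ArgumentParser(prog="pg_restore_sketch")
    parser.add_argument("-j", "--jobs", type=int, default=1)
    args = parser.parse_args(argv)
    # Reject nonsensical values instead of silently launching thousands
    # of connections; unlimited parallelism must be asked for explicitly.
    if not 1 <= args.jobs <= MAX_JOBS:
        parser.error("--jobs must be between 1 and %d" % MAX_JOBS)
    return args.jobs
```

With this shape, `-j` with no argument is simply a usage error, and the default remains a single connection.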
cheers
andrew
On Mon, 2008-09-22 at 11:38 -0400, Andrew Dunstan wrote:
Simon Riggs wrote:
On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.
What about the famous make -j option?
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.
+1
If that's the preferred name I have no problem. I'm not sure about the
default argument part, though.
First, I'm not sure our getopt infrastructure actually provides for
optional arguments, and I am not going to rework it in pg_restore to get
around such a problem, at least for now.
More importantly, I'm not convinced it's a good idea. It seems more like
a footgun that will potentially try to launch thousands of simultaneous
restore connections. I should have thought that optimal performance
would be reached at some small multiple (say maybe 2?) of the number of
CPUs on the server. You could achieve unlimited parallelism by saying
something like --jobs=99999, but I'd rather that were done very
explicitly instead of as the default value of the parameter.
OK, sounds best.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 22 Sep 2008 17:24:28 +0100
Simon Riggs <simon@2ndQuadrant.com> wrote:
More importantly, I'm not convinced it's a good idea. It seems more
like a footgun that will potentially try to launch thousands of
simultaneous restore connections. I should have thought that
optimal performance would be reached at some small multiple (say
maybe 2?) of the number of CPUs on the server. You could achieve
unlimited parallelism by saying something like --jobs=99999, but
I'd rather that were done very explicitly instead of as the default
value of the parameter.
OK, sounds best.
I will not argue vehemently here but I will say that "jobs" doesn't
seem correct. The term "workers" seems more appropriate.
Sincerely,
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Monday 22 September 2008, Joshua Drake wrote:
I will not argue vehemently here but I will say that "jobs" doesn't
seem correct. The term "workers" seems more appropriate.
Mmmm, it sounds like it depends on the implementation (and on whether all
workers will share the same serializable transaction or just be independent
jobs), but my point here is more about giving the user a name they are used
to. Like in "oh, pg_restore -j, I see, thanks".
Now, if your argument is that the make concept of a job does not match the
parallel pg_restore concept of workers, I'll simply bow to your choice:
barring other "limits", English not being my native language makes it hard
for me to follow there ;)
Regards,
--
dim
On Mon, 2008-09-22 at 09:30 -0700, Joshua Drake wrote:
On Mon, 22 Sep 2008 17:24:28 +0100
Simon Riggs <simon@2ndQuadrant.com> wrote:
More importantly, I'm not convinced it's a good idea. It seems more
like a footgun that will potentially try to launch thousands of
simultaneous restore connections. I should have thought that
optimal performance would be reached at some small multiple (say
maybe 2?) of the number of CPUs on the server. You could achieve
unlimited parallelism by saying something like --jobs=99999, but
I'd rather that were done very explicitly instead of as the default
value of the parameter.
OK, sounds best.
I will not argue vehemently here but I will say that "jobs" doesn't
seem correct. The term "workers" seems more appropriate.
Agreed, but most utilities have "j" free but not w, p, t or other
letters that might be synonyms.
j is at least used for exactly this purpose in other tools.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
I will not argue vehemently here but I will say that "jobs" doesn't
seem correct. The term "workers" seems more appropriate.
Agreed, but most utilities have "j" free but not w, p, t or other
letters that might be synonyms.
j is at least used for exactly this purpose in other tools.
There are in fact very few letters available, as we've been fairly
profligate in our use of option letters in the pg_dump suite.
j and m happen to be two of those that are available.
I honestly don't have a terribly strong opinion about what it should be
called. I can live with jobs or multi-threads.
cheers
andrew
Andrew Dunstan wrote:
There are in fact very few letters available, as we've been fairly
profligate in our use of option letters in the pg_dump suite.
j and m happen to be two of those that are available.
--max-workers
Max makes sense because the number of workers won't be constant; a
worker may not have a job to do. It is also consistent with
autovacuum_max_workers.
Sincerely,
Joshua D. Drake
Joshua D. Drake wrote:
Andrew Dunstan wrote:
There are in fact very few letters available, as we've been fairly
profligate in our use of option letters in the pg_dump suite.
j and m happen to be two of those that are available.
--max-workers
Perhaps, but please do not use that as justification for using -m.
That would be equally silly as abbreviating "number of workers" to -n.
--
Sincerely,
Stephen R. van den Berg.
Experience is something you don't get until just after you need it.
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
j and m happen to be two of those that are available.
I honestly don't have a terribly strong opinion about what it should be
called. I can live with jobs or multi-threads.
Perhaps we can use -j for jobs and -m for memory, so we can set memory
available across all threads with a single total value.
I can live with jobs or multi-threads also, whichever we decide. Neither
one is confusing to explain.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 23 Sep 2008 09:14:33 +0200
"Stephen R. van den Berg" <srb@cuci.nl> wrote:
Joshua D. Drake wrote:
Andrew Dunstan wrote:
There are in fact very few letters available, as we've been fairly
profligate in our use of option letters in the pg_dump suite.
j and m happen to be two of those that are available.
--max-workers
Perhaps, but please do not use that as justification for using -m.
That would be equally silly as abbreviating "number of workers" to -n.
Actually I came up with it because it coincides with existing
terminology. Autovacuum has the concept of max_workers.
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 23 Sep 2008 08:44:19 +0100
Simon Riggs <simon@2ndQuadrant.com> wrote:
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
j and m happen to be two of those that are available.
I honestly don't have a terribly strong opinion about what it
should be called. I can live with jobs or multi-threads.
Perhaps we can use -j for jobs and -m for memory, so we can set memory
available across all threads with a single total value.
I can live with jobs or multi-threads also, whichever we decide.
Neither one is confusing to explain.
Memory? Where did that come from? Andrew, is that in your spec?
Joshua D. Drake
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
On Tue, 23 Sep 2008 08:44:19 +0100
Simon Riggs <simon@2ndQuadrant.com> wrote:
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
j and m happen to be two of those that are available.
I honestly don't have a terribly strong opinion about what it
should be called. I can live with jobs or multi-threads.
Perhaps we can use -j for jobs and -m for memory, so we can set memory
available across all threads with a single total value.
I can live with jobs or multi-threads also, whichever we decide.
Neither one is confusing to explain.
Memory? Where did that come from? Andrew, is that in your spec?
No, but it's in mine. As I said upthread, no point in making it more
parallel than memory allows. Different operations need more/less memory
than others, so we must think about that also. We can quickly work out
how big a table is, so we can work out how much memory it will need to
perform sorts for index builds and thus how many parallel builds can
sensibly take place.
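Simon's heuristic could be approximated as in the toy sketch below. Everything here is an invented illustration (the function name and the idea that per-build memory is derived from table size and sort memory are assumptions, not anything pg_restore or PostgreSQL defines):

```python
def max_parallel_builds(total_mem_mb, sort_mem_per_build_mb, hard_cap):
    """Estimate how many index builds can sensibly run at once within a
    total memory budget. sort_mem_per_build_mb stands in for the sort
    memory a build of a given table would need; purely illustrative."""
    if sort_mem_per_build_mb <= 0:
        return hard_cap                      # no memory constraint known
    fit = total_mem_mb // sort_mem_per_build_mb
    return max(1, min(hard_cap, fit))        # always allow at least one
```

So a 4 GB budget with builds needing roughly 512 MB of sort memory each would allow up to eight concurrent builds, while a very large table would serialize down to one.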
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
On Tue, 23 Sep 2008 08:44:19 +0100
Simon Riggs <simon@2ndQuadrant.com> wrote:
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
j and m happen to be two of those that are available.
I honestly don't have a terribly strong opinion about what it
should be called. I can live with jobs or multi-threads.
Perhaps we can use -j for jobs and -m for memory, so we can set memory
available across all threads with a single total value.
I can live with jobs or multi-threads also, whichever we decide.
Neither one is confusing to explain.
Memory? Where did that come from? Andrew, is that in your spec?
No, but it's in mine. As I said upthread, no point in making it more
parallel than memory allows. Different operations need more/less memory
than others, so we must think about that also. We can quickly work out
how big a table is, so we can work out how much memory it will need to
perform sorts for index builds and thus how many parallel builds can
sensibly take place.
If that ever happens it will certainly not be in this go round.
In fact, we have some anecdotal evidence that the point of diminishing
returns is not reached until a fairly high degree of parallelism is used
(Joshua's and my client has been using 24 threads, I believe).
In any case, my agenda goes something like this:
* get it working with a basic selection algorithm on Unix (nearly
done - keep your eyes open for a patch soon)
* start testing
* get it working on Windows
* improve the selection algorithm
* harden code
If we get all that done by November we'll have done well. And we know
that in some cases just this much can lead to reductions in restore time
of the order of 80%.
cheers
andrew
On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
If we get all that done by November we'll have done well. And we know
that in some cases just this much can lead to reductions in restore
time
of the order of 80%.
Agreed. Go for it.
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support