Review: Revise parallel pg_restore's scheduling heuristic
Rebased to correct for pgindent changes.
Applies cleanly.
Compiles cleanly.
Passes regression tests.
Comments and format look good.
No documentation changes needed.
No regression test changes needed.
Performance tests to follow in a day or two.
-Kevin
Attachments:
alternate-parallel-restore-2.patch (text/plain, +142/-60)
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests. What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data. (This patch didn't seem likely to make a significant
difference on smaller databases.)
My current plan is to test this on a web server class machine and a
distributed application class machine. Both database types have over
300 tables, with widely ranging row counts, widths, and
index counts.
It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results. Any sense what numbers of parallel jobs I should use for
tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12. My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection.
(This seems most likely to be what we'd be doing in real life.) I
would run each test with the CVS trunk tip with and without the patch
applied. The database is currently 1.1TB.
The application machine would have 2 cores and about 4GB RAM. I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space.
It's 89GB. I'd probably try job counts starting at one and going up by
one until performance starts to drop off. (At one I would use the -1
switch.)
In all cases I was planning on using a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.
Does this sound like a sane approach to testing whether this patch
actually improves performance? Any suggestions before I start this,
to ensure most meaningful results?
-Kevin
On Sat, Jul 18, 2009 at 4:41 PM, Kevin
Grittner <Kevin.Grittner@wicourts.gov> wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests. What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data. (This patch didn't seem likely to make a significant
difference on smaller databases.)
No worries. We have a limited number of people who can test these
kinds of things and who have volunteered to serve as reviewers (two
that I'm aware of, and one of those I haven't heard from lately...).
So we'll be patient. :-)
My current plan is to test this on a web server class machine and a
distributed application class machine. Both database types have over
300 tables, with widely ranging row counts, widths, and
index counts.

It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results. Any sense what numbers of parallel jobs I should use for
tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12. My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection.
(This seems most likely to be what we'd be doing in real life.) I
would run each test with the CVS trunk tip with and without the patch
applied. The database is currently 1.1TB.

The application machine would have 2 cores and about 4GB RAM. I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space.
It's 89GB. I'd probably try job counts starting at one and going up by
one until performance starts to drop off. (At one I would use the -1
switch.)

In all cases I was planning on using a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.

Does this sound like a sane approach to testing whether this patch
actually improves performance? Any suggestions before I start this,
to ensure most meaningful results?
This all sounds reasonable to me, although it might be worth testing
with default settings too.
...Robert
Kevin Grittner wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests. What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data. (This patch didn't seem likely to make a significant
difference on smaller databases.)

My current plan is to test this on a web server class machine and a
distributed application class machine. Both database types have over
300 tables, with widely ranging row counts, widths, and
index counts.

It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results. Any sense what numbers of parallel jobs I should use for
tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12. My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection.
(This seems most likely to be what we'd be doing in real life.) I
would run each test with the CVS trunk tip with and without the patch
applied. The database is currently 1.1TB.
You need to be careful here - in my latest round of benchmarking I
had to test with the workload generator on the same box, because on
fast boxes we can easily achieve >100MB/s total load rate these days.
At these load rates you are very close to, or over, the practical
limits of GigE...
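As a rough sanity check of that ceiling (the overhead percentage here is an assumption, not a measurement):

```shell
# GigE carries 1000 Mbit/s, i.e. 125 MB/s on the wire; Ethernet framing
# plus TCP/IP headers eat a few percent, leaving a practical ceiling in
# the low-to-mid 110s of MB/s -- uncomfortably close to a >100 MB/s
# restore load rate.
raw_mb_per_s=$((1000 / 8))                       # 125 MB/s raw
practical_mb_per_s=$((raw_mb_per_s * 94 / 100))  # assume ~6% protocol overhead
echo "GigE: raw ${raw_mb_per_s} MB/s, practical ~${practical_mb_per_s} MB/s"
```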
Stefan
Robert Haas wrote:
On Sat, Jul 18, 2009 at 4:41 PM, Kevin
Grittner <Kevin.Grittner@wicourts.gov> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
Performance tests to follow in a day or two.
I'm looking to beg another week or so on this to run more tests. What
I can have by the end of today is pretty limited, mostly because I
decided it made the most sense to test this with big complex
databases, and it just takes a fair amount of time to throw around
that much data. (This patch didn't seem likely to make a significant
difference on smaller databases.)

No worries. We have a limited number of people who can test these
kinds of things and who have volunteered to serve as reviewers (two
that I'm aware of, and one of those I haven't heard from lately...).
So we'll be patient. :-)

My current plan is to test this on a web server class machine and a
distributed application class machine. Both database types have over
300 tables, with widely ranging row counts, widths, and
index counts.

It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results. Any sense what numbers of parallel jobs I should use for
tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12. My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection.
(This seems most likely to be what we'd be doing in real life.) I
would run each test with the CVS trunk tip with and without the patch
applied. The database is currently 1.1TB.

The application machine would have 2 cores and about 4GB RAM. I'm
tempted to use Milwaukee County's database there, as it has the most
rows per table, even though some of the counties doing a lot of
document scanning now have bigger databases in terms of disk space.
It's 89GB. I'd probably try job counts starting at one and going up by
one until performance starts to drop off. (At one I would use the -1
switch.)

In all cases I was planning on using a "conversion" postgresql.conf
file, turning off fsync, archiving, statistics, etc.

Does this sound like a sane approach to testing whether this patch
actually improves performance? Any suggestions before I start this,
to ensure most meaningful results?

This all sounds reasonable to me, although it might be worth testing
with default settings too.
To performance test this properly you might need to devise a test that
will use a sufficiently different order of queueing items to show the
difference.
One thing I am particularly interested in seeing is whether queuing FK
items for a table as soon as they become available (most likely when
the referred-to index is created) performs better than doing them all
together, as TOC order would tend to do (assuming they are named with
the table name as a prefix).
cheers
andrew
Kevin,
It would be hard to schedule the requisite time on our biggest web
machines, but I assume an 8 core 64GB machine would give meaningful
results. Any sense what numbers of parallel jobs I should use for
tests? I would be tempted to try 1 (with the -1 switch), 8, 12, and
16 -- maybe keep going if 16 beats 12.
Personally, I wouldn't go over the number of cores. But if you do find
some gain that way, I'd be very interested to know it.
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Andrew Dunstan <andrew@dunslane.net> wrote:
To performance test this properly you might need to devise a test
that will use a sufficiently different order of queueing items to
show the difference.

One thing I am particularly interested in seeing is whether queuing
FK items for a table as soon as they become available (most likely
when the referred-to index is created) performs better than doing
them all together, as TOC order would tend to do (assuming they are
named with the table name as a prefix).
Hmmm.... I'm reevaluating my database choice. The 1.1TB database
does not have foreign key constraints as a matter of policy. It is a
replica of the county databases, and we want to replicate whatever we
can of the county data -- failure for some reason of part of the data
to replicate should not block replication of something else, to
minimize differences. Is there still value in using such a database
at all, or should I focus on databases in the 50GB to 90GB range with
FK constraints defined?
When you suggest devising a test to show a difference, in what way
would it be likely that I would need to modify the real-life database
to get such a test? Our FKs do start with "<TableName>_". We don't
have underscores in our table names, although we use similar naming
for our indexes.
-Kevin
Robert Haas <robertmhaas@gmail.com> wrote:
it might be worth testing with default settings too.
OK. I'll do that too, if time allows.
-Kevin
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
My plan here would be to have
the dump on one machine, and run pg_restore there, and push it to a
database on another machine through the LAN on a 1Gb connection.
(This seems most likely to be what we'd be doing in real life.)
You need to be careful here - in my latest round of benchmarking I
had to test with the workload generator on the same box, because on
fast boxes we can easily achieve >100MB/s total load rate these days.
At these load rates you are very close to, or over, the practical
limits of GigE...
Yeah, I was concerned about that. Problem was, with the 1.1TB
database it is hard to fit a dump and an extra copy of the database
onto the drives along with the production data for which they exist.
We would likely face the same constraints with real data if using the
parallel restore, since it requires an interim backup file (i.e., you
can't stream directly from the source database). There's also the
issue of reading from the same RAID you're targeting with the restore,
which is sometimes not optimal.
If I'm dropping down an order of magnitude or more in the databases I
will use, I could put the backup file on a separate RAID on the same
machine. This leaves a lot of options. I'm not sure which
combinations of configuration, file placement, and job count yield the
most useful results.
-Kevin
Andrew Dunstan <andrew@dunslane.net> wrote:
To performance test this properly you might need to devise a test
that will use a sufficiently different order of queueing items to
show the difference.
It would appear that I need help with devising a proper test. So far,
all tests have shown no difference in performance based on the patch;
I get almost twice the speed as a single job running in one database
transaction either way. Can someone explain what I should try to set
up to get a "best case" and a "worst case" for the patch? Our
production databases don't expose any difference, but I'm willing to
try to use them to "seed" an artificial case which will.
-Kevin
Kevin Grittner wrote:
Andrew Dunstan <andrew@dunslane.net> wrote:
To performance test this properly you might need to devise a test
that will use a sufficiently different order of queueing items to
show the difference.

It would appear that I need help with devising a proper test. So far,
all tests have shown no difference in performance based on the patch;
I get almost twice the speed as a single job running in one database
transaction either way. Can someone explain what I should try to set
up to get a "best case" and a "worst case" for the patch? Our
production databases don't expose any difference, but I'm willing to
try to use them to "seed" an artificial case which will.
Does your test case have lots of foreign keys?
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> wrote:
Does your test case have lots of foreign keys?
488 of them.
There is some variation on individual tests, but the results look to
be "in the noise." When I add them all up, the patch comes out
0.0036% slower -- but that is so far into the noise as to be
considered "no difference" in my book.
-Kevin
I wrote:
So far, all tests have shown no difference in performance based on
the patch;
My testing to that point had been on a "big" machine with 16 CPUs and
128 GB RAM and dozens of spindles. Last night I tried with a dual
core machine with 4 GB RAM and 5 spindles in RAID 5. Still no
difference with the patch.
Any suggestions besides the foreign keys? Should 488 FKs be enough to
matter here? (Barring better suggestions, I'll try the small machine
again tonight with the default configuration, rather than the
optimized one.)
-Kevin
On Tue, Jul 28, 2009 at 10:28 AM, Kevin
Grittner <Kevin.Grittner@wicourts.gov> wrote:
I wrote:
So far, all tests have shown no difference in performance based on
the patch;

My testing to that point had been on a "big" machine with 16 CPUs and
128 GB RAM and dozens of spindles. Last night I tried with a dual
core machine with 4 GB RAM and 5 spindles in RAID 5. Still no
difference with the patch.

Any suggestions besides the foreign keys? Should 488 FKs be enough to
matter here? (Barring better suggestions, I'll try the small machine
again tonight with the default configuration, rather than the
optimized one.)
The other possibility here is that this just doesn't work. :-)
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
The other possibility here is that this just doesn't work. :-)
That's why we wanted to test it ;-).
I don't have time to look right now, but ISTM the original discussion
that led to making that patch had ideas about scenarios where it would
be faster. It'd be worth digging that up and seeing if the current
tests covered the case or not.
regards, tom lane
On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
The other possibility here is that this just doesn't work. :-)
That's why we wanted to test it ;-).
I don't have time to look right now, but ISTM the original discussion
that led to making that patch had ideas about scenarios where it would
be faster. It'd be worth digging that up and seeing if the current
tests covered the case or not.
This is what I've been able to find on a quick look:
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php
Sounds like Kevin may want to try renaming some of his indices to
produce intermingling...
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Jul 28, 2009 at 9:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't have time to look right now, but ISTM the original discussion
that led to making that patch had ideas about scenarios where it would
be faster.
This is what I've been able to find on a quick look:
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php
Sounds like Kevin may want to try renaming some of his indices to
produce intermingling...
Also, the followup to that message points out that the 8.4.0 code has a
potential O(N^2) dependency on the total number of TOC items in the
dump. So it might be interesting to check the behavior with very large
numbers of tables/indexes.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote:
This is what I've been able to find on a quick look:
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00678.php
Sounds like Kevin may want to try renaming some of his indices to
produce intermingling...
Thanks, I'll give that a try. Renaming them is one thing, getting a
new dump is another, though. I probably won't be able to test that
theory until tomorrow night.
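If it helps anyone reproduce this, a hypothetical way to script the renames might look like the following (the prefix scheme and index names are illustrative only, and real mixed-case names would need double quotes in the SQL):

```shell
# Hypothetical rename generator: read index names on stdin and emit
# ALTER INDEX statements with rotating prefixes, so names that used to
# sort next to their table are scattered through the TOC instead.
gen_renames () {
    n=0
    while read -r idx; do
        n=$((n + 1))
        # cycle a few prefixes so formerly adjacent names intermingle
        case $((n % 3)) in
            1) p=aa ;; 2) p=mm ;; 0) p=zz ;;
        esac
        echo "ALTER INDEX ${idx} RENAME TO ${p}_${idx};"
    done
}

printf 'party_pkey\nparty_name1\ncasehist_pkey\n' | gen_renames
```

The emitted statements would then be run against the source database before taking the new dump.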
Last night's test yielded a couple interesting results.
For one thing, while the "optimized" postgresql.conf was 2.5% faster
than the default file for a single job in one database transaction, it
was 10% *slower* than the default for multi-job restores. I'll check
on that more later, to see what might be helping and what is hurting.
For another thing, with the default settings, the patched version ran
an additional 1% faster than the unpatched, although I don't have
enough samples to have a high degree of confidence it wasn't noise.
I'll run another slew of tests tonight with the existing dump file to
confirm or debunk that result, while I create a new dump file to test
with name intermingling on later nights.
For the record, the "default" postgresql.conf:
port = 5678
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
default_text_search_config = 'pg_catalog.english'
escape_string_warning = off
standard_conforming_strings = on
The "optimized" file adds these:
max_connections = 100
shared_buffers = 256MB
work_mem = 50MB
maintenance_work_mem = 500MB
bgwriter_lru_maxpages = 600
bgwriter_lru_multiplier = 10.0
fsync = off
full_page_writes = off
wal_buffers = 4MB
random_page_cost = 2.0
effective_cache_size = 3GB
logging_collector = on
log_line_prefix = '[%m] %p %q<%u %d %r> '
track_counts = off
autovacuum = off
sql_inheritance = off
I'm sure that there is a wealth of opinion on which of these are
slowing things down, but I'm going to withhold any guesses in favor of
testing them. (They all proved themselves neutral or beneficial in
objective testing for the single-job restores under 8.3.)
-Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, the followup to that message points out that the 8.4.0 code
has a potential O(N^2) dependency on the total number of TOC items
in the dump. So it might be interesting to check the behavior with
very large numbers of tables/indexes.
I've got 431 user tables with 578 indexes. How high should I push
this? Can I just create a bunch of randomly named empty tables with
primary keys to provoke this effect?
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, the followup to that message points out that the 8.4.0 code
has a potential O(N^2) dependency on the total number of TOC items
in the dump. So it might be interesting to check the behavior with
very large numbers of tables/indexes.
I've got 431 user tables with 578 indexes. How high should I push
this? Can I just create a bunch of randomly named empty tables with
primary keys to provoke this effect?
Yeah, just add a bunch of empty tables. Ten thousand or so, perhaps.
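For instance, something along these lines could generate the DDL (the table and database names are placeholders; sequential names are used here, though randomized names could be substituted if name order turns out to matter):

```shell
# Sketch: emit CREATE TABLE statements for N empty tables with primary
# keys; in real use, pipe the output into psql against a scratch
# database to inflate the TOC item count.
gen_tables () {
    i=1
    while [ "$i" -le "$1" ]; do
        echo "CREATE TABLE bloat_t${i} (id integer PRIMARY KEY);"
        i=$((i + 1))
    done
}

gen_tables 3
# real use:  gen_tables 10000 | psql -d scratchdb
```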
regards, tom lane