WIP parallel restore patch

Started by Andrew Dunstan · about 17 years ago · 8 messages
#1 Andrew Dunstan
andrew@dunslane.net
1 attachment

Attached is my latest parallel restore patch. I think it's functionally
complete for Unix.

Many bugs have been fixed since the last patch, and the hardcoded
limitation to two table dependencies is removed. It seems fairly robust
in my recent testing.

Remaining to be done:

. code cleanup
. better error checking in a few places
. final decision re command line option names/defaults
. documentation
. Windows support.

cheers

andrew

Attachments:

parallel_restore_10.patch.gz (application/x-gzip)
#2 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Andrew Dunstan (#1)
Re: WIP parallel restore patch

Andrew Dunstan wrote:

Attached is my latest parallel restore patch. I think it's functionally
complete for Unix.

Many bugs have been fixed since the last patch, and the hardcoded
limitation to two table dependencies is removed. It seems fairly robust
in my recent testing.

This version seems to be working much better on my test setup.
The patch results in a fairly nice improvement, from ~170min (-m 1
--truncate-before-load) to ~39min (-m 16 --truncate-before-load), on my
8-core test box using a 70GB (uncompressed) dump consisting of 709 tables.
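That works out to roughly a 4.4x speedup (170/39 ≈ 4.4) with 16 workers
on 8 cores.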

Stefan

#3 Andrew Dunstan
andrew@dunslane.net
In reply to: Stefan Kaltenbrunner (#2)
Re: WIP parallel restore patch

Stefan Kaltenbrunner wrote:

Andrew Dunstan wrote:

Attached is my latest parallel restore patch. I think it's
functionally complete for Unix.

Many bugs have been fixed since the last patch, and the hardcoded
limitation to two table dependencies is removed. It seems fairly
robust in my recent testing.

This version seems to be working much better on my test setup.
The patch results in a fairly nice improvement, from ~170min (-m 1
--truncate-before-load) to ~39min (-m 16 --truncate-before-load), on my
8-core test box using a 70GB (uncompressed) dump consisting of 709
tables.

Great. Thanks for the report.

cheers

andrew

#4 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#1)
Re: WIP parallel restore patch

Andrew Dunstan wrote:

Attached is my latest parallel restore patch. I think it's functionally
complete for Unix.

Many bugs have been fixed since the last patch, and the hardcoded
limitation to two table dependencies is removed. It seems fairly robust
in my recent testing.

Remaining to be done:

. code cleanup
. better error checking in a few places
. final decision re command line option names/defaults
. documentation
. Windows support.

I've looked into this a bit, and it's fairly clear where the issue
comes in on Windows: we get heap corruption, most likely because we
have multiple threads working on the same data somewhere.

I notice for example that we're doing a shallow copy of the
ArchiveHandle with a simple memcpy() for each thread. But that struct
contains a number of things like file descriptors and pointers. Have you
verified for each and every one of those that it actually doesn't get
modified anywhere? If not, a deep copy may be needed to make sure of that.
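
To make the hazard concrete, here's a minimal sketch - the struct and
its members are invented stand-ins, not the real ArchiveHandle fields:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented stand-in for ArchiveHandle; the real struct is much larger. */
typedef struct
{
    FILE *FH;          /* open archive file handle     */
    char *currSchema;  /* heap-allocated, owned string */
    int   compression; /* plain value, safe to copy    */
} DemoHandle;

/*
 * memcpy() alone leaves FH and currSchema shared between parent and
 * clone, so a free() or fseek() in one thread corrupts the others.
 * Deep-copying the pointer members gives each worker its own file
 * position and its own string.
 */
static DemoHandle *
clone_handle(const DemoHandle *parent, const char *filename)
{
    DemoHandle *clone = malloc(sizeof(DemoHandle));

    memcpy(clone, parent, sizeof(DemoHandle));      /* value members   */
    clone->FH = fopen(filename, "rb");              /* own descriptor  */
    clone->currSchema = strdup(parent->currSchema); /* own string copy */
    return clone;
}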

Other than that, are there any global variables that may be addressed
from more than one worker? If so, they need to be marked as TLS.
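
For instance (an illustrative sketch - the variable here is invented,
not one from pg_restore), a global can be made per-thread like this:

#include <stdbool.h>

/* MSVC and gcc spell thread-local storage differently. */
#ifdef WIN32
#define THREADLOCAL __declspec(thread)
#else
#define THREADLOCAL __thread
#endif

/* Each worker thread now sees its own copy rather than a shared one. */
static THREADLOCAL bool die_on_error = false;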

And yes, I got hit by the lack of error checking a couple of times
during my testing - it would probably be a good idea to add that as
soon as possible; it helps a lot with the debugging.

If I run it with just a single thread, it also crashes in PQfinish()
called from die_horribly(), when trying to free conn->pgpass, which is
non-NULL but not a valid pointer. This crash happens in the worker
thread, after it has logged that "fseek is required" - an indication
that something being passed down to the thread is either wrong or being
scribbled on after the fact.

I didn't dig into these questions specifically - since you have already
been reading up on this code for the patch, you can probably reach the
answers much quicker than I could :-) So I'll stick to asking the
questions.

//Magnus

#5 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#4)
Re: WIP parallel restore patch

Magnus Hagander wrote:

Andrew Dunstan wrote:

Attached is my latest parallel restore patch. I think it's functionally
complete for Unix.

Many bugs have been fixed since the last patch, and the hardcoded
limitation to two table dependencies is removed. It seems fairly robust
in my recent testing.

Remaining to be done:

. code cleanup
. better error checking in a few places
. final decision re command line option names/defaults
. documentation
. Windows support.

I've looked into this a bit, and it's fairly clear where the issue
comes in on Windows: we get heap corruption, most likely because we
have multiple threads working on the same data somewhere.

I notice for example that we're doing a shallow copy of the
ArchiveHandle with a simple memcpy() for each thread. But that struct
contains a number of things like file descriptors and pointers. Have you
verified for each and every one of those that it actually doesn't get
modified anywhere? If not, a deep copy may be needed to make sure of that.

Other than that, are there any global variables that may be addressed
from more than one worker? If so, they need to be marked as TLS.

And yes, I got hit by the lack of error checking a couple of times
during my testing - it would probably be a good idea to add that as
soon as possible; it helps a lot with the debugging.

If I run it with just a single thread, it also crashes in PQfinish()
called from die_horribly(), when trying to free conn->pgpass, which is
non-NULL but not a valid pointer. This crash happens in the worker
thread, after it has logged that "fseek is required" - an indication
that something being passed down to the thread is either wrong or being
scribbled on after the fact.

I didn't dig into these questions specifically - since you have already
been reading up on this code for the patch, you can probably reach the
answers much quicker than I could :-) So I'll stick to asking the
questions.

OK, thanks, this will help. I thought I had caught all the
ArchiveHandle issues, but there might be one or two I missed.

I'll try to have a new version in a few days.

cheers

andrew

#6 Kenneth Marshall
ktm@rice.edu
In reply to: Andrew Dunstan (#5)
Re: WIP parallel restore patch

Okay, I have had a chance to run some timing benchmarks.
Here are my results for the parallel pg_restore patch:

Ken
--------------------------------------------------
Server settings:

max_connections = 100 # (change requires restart)
shared_buffers = 256MB # min 128kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 256MB # min 1MB

fsync = on # turns forced synchronization on or off

synchronous_commit = off # immediate fsync at commit

full_page_writes = on # recover from partial page writes
checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
autovacuum = on # Enable autovacuum subprocess? 'on'

The total final database size is 6.5GB. Here are the timings for
degrees of parallelism from 1 to 8 on a 4-core AMD box:

-bash-3.00$ time pg_restore -U postgres -p 5435 -d rttest /tmp/rtout.pz
...

real 19m3.175s
user 1m2.968s
sys 0m8.202s

improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 -d rttest /tmp/rtout.pz
...
real 12m55.680s
user 1m12.440s
sys 0m8.343s

improvement - 32%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 -d rttest /tmp/rtout.pz
...
real 9m45.056s
user 1m1.892s
sys 0m8.980s

improvement - 49%

The system only has 4 cores, but here are the results with "-m 8":

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 -d rttest /tmp/rtout.pz
...
real 8m15.320s
user 0m55.206s
sys 0m8.678s

improvement - 53%

#7 Andrew Dunstan
andrew@dunslane.net
In reply to: Kenneth Marshall (#6)
Re: WIP parallel restore patch

Kenneth Marshall wrote:

Okay, I have had a chance to run some timing benchmarks.
Here are my results for the parallel pg_restore patch:

Ken
--------------------------------------------------
Server settings:

max_connections = 100 # (change requires restart)
shared_buffers = 256MB # min 128kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 256MB # min 1MB

fsync = on # turns forced synchronization on or off

synchronous_commit = off # immediate fsync at commit

full_page_writes = on # recover from partial page writes
checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
autovacuum = on # Enable autovacuum subprocess? 'on'

The total final database size is 6.5GB. Here are the timings for
degrees of parallelism from 1 to 8 on a 4-core AMD box:

-bash-3.00$ time pg_restore -U postgres -p 5435 -d rttest /tmp/rtout.pz
...

real 19m3.175s
user 1m2.968s
sys 0m8.202s

improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 -d rttest /tmp/rtout.pz
...
real 12m55.680s
user 1m12.440s
sys 0m8.343s

improvement - 32%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 -d rttest /tmp/rtout.pz
...
real 9m45.056s
user 1m1.892s
sys 0m8.980s

improvement - 49%

The system only has 4 cores, but here are the results with "-m 8":

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 -d rttest /tmp/rtout.pz
...
real 8m15.320s
user 0m55.206s
sys 0m8.678s

improvement - 53%

Interesting.

Can you try with two changes? Turn fsync off, and use the
--truncate-before-load switch.

In general, though, this is pretty much in line with other experience,
i.e. we can get up to about an n/2 speedup with n cores.
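
As a rough check against the numbers above: 19m3s is about 1143s, and
the -m 4 run's 9m45s is about 585s, so 1143/585 ≈ 1.95x - almost
exactly 4/2 on a 4-core box.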

thanks

andrew

#8 Kenneth Marshall
ktm@rice.edu
In reply to: Andrew Dunstan (#7)
Re: WIP parallel restore patch

On Thu, Nov 20, 2008 at 02:26:14PM -0500, Andrew Dunstan wrote:

Kenneth Marshall wrote:

Okay, I have had a chance to run some timing benchmarks.
Here are my results for the parallel pg_restore patch:

Ken
--------------------------------------------------
Server settings:

max_connections = 100 # (change requires restart)
shared_buffers = 256MB # min 128kB
work_mem = 128MB # min 64kB
maintenance_work_mem = 256MB # min 1MB

fsync = on # turns forced synchronization on or off

synchronous_commit = off # immediate fsync at commit

full_page_writes = on # recover from partial page writes
checkpoint_segments = 10 # in logfile segments, min 1, 16MB each
autovacuum = on # Enable autovacuum subprocess? 'on'

The total final database size is 6.5GB. Here are the timings for
degrees of parallelism from 1 to 8 on a 4-core AMD box:

-bash-3.00$ time pg_restore -U postgres -p 5435 -d rttest /tmp/rtout.pz
...

real 19m3.175s
user 1m2.968s
sys 0m8.202s

improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 -d rttest /tmp/rtout.pz
...
real 12m55.680s
user 1m12.440s
sys 0m8.343s

improvement - 32%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 -d rttest /tmp/rtout.pz
...
real 9m45.056s
user 1m1.892s
sys 0m8.980s

improvement - 49%

The system only has 4 cores, but here are the results with "-m 8":

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 -d rttest /tmp/rtout.pz
...
real 8m15.320s
user 0m55.206s
sys 0m8.678s

improvement - 53%

Interesting.

Can you try with two changes? Turn fsync off, and use the
--truncate-before-load switch.

In general, though, this is pretty much in line with other experience, i.e.
we can get up to about an n/2 speedup with n cores.

thanks

andrew

Okay, here is the same test run with:

fsync = off
--truncate-before-load

Cheers,
Ken

--------------------------------------------------------------------

-bash-3.00$ time pg_restore -U postgres -p 5435 --truncate-before-load -d rttest /tmp/rtout.pz
...
real 16m25.031s
user 1m3.707s
sys 0m8.776s
improvement - 0%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 2 --truncate-before-load -d rttest /tmp/rtout.pz
...
real 10m26.730s
user 0m48.782s
sys 0m7.214s
improvement - 36%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 4 --truncate-before-load -d rttest /tmp/rtout.pz
...
real 8m5.061s
user 0m48.657s
sys 0m7.602s
improvement - 51%

-bash-3.00$ time pg_restore -U postgres -p 5435 -m 8 --truncate-before-load -d rttest /tmp/rtout.pz
...
real 6m18.787s
user 0m45.361s
sys 0m7.811s
improvement - 62%