Fastest way to restore a database

Started by William Garrisonover 17 years ago9 messagesgeneral
Jump to latest
#1William Garrison
postgres@mobydisk.com

I know that PostgreSQL is slow at restoring databases. But there are
some tricks to use when speeding it up. Here is a brief list I compiled
from reading the docs and reading some forums. Is there a definitive
list of things to do?

* Turn off fsync
So it won�t flush after every commit
* Turn off full_page_writes
Don�t write the WAL archives in a safe way. But we don�t need WAL
archives during a restore. Future versions of postgres will let you turn
off WAL archives entirely
* Turn off archive_mode
This disables the archiving of WAL logs
* Increase the checkpoint_segments parameter (the default is 3 � so...
maybe 10?)
Increases the number of transactions that can happen before a WAL checkpoint
* The --single-transaction paremeter to pg_restore *might* speed it up
One transaction is more efficient, but an entire DB restore might be too
big of a transaction. So I�m not so sure about this option
* Increase the maintenance_work_mem setting to 512MB
Gives more memory to CREATE_INDEX commands, which is part of the restore
process
* (PostgreSql 8.3 only) Turn off synchronous_commit
This makes it so that the database won�t wait for the WAL checkpoint to
be completed before moving on to the next operation. Again, we don�t
want WAL archiving during a restore anyway.

Are any of the above items not going to help? Anything I'm missing? Is
there a way to disable foreign key constraints during the restore since
I know it is already a good database?

I am using postgreSQL 8.2.9 on Win32

#2Greg Smith
gsmith@gregsmith.com
In reply to: William Garrison (#1)
Re: Fastest way to restore a database

On Fri, 12 Sep 2008, William Garrison wrote:

Is there a definitive list of things to do?

That section of the documention is pretty good:
http://www.postgresql.org/docs/current/static/populate.html

The main thing it's missing is a discussion of how to cut down on disk
commit overhead by either usinc async commit or turning fsync off. If
you've got a good caching controller that may not be needed though.

The other large chunk of information it doesn't really go into is what
server tuning you could do to improve general performance, which obviously
would then help with loading as well.
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over
much of that.

* Turn off full_page_writes
Don’t write the WAL archives in a safe way. But we don’t need WAL archives
during a restore. Future versions of postgres will let you turn off WAL
archives entirely

Ideally you'd be using COPY such that the table was just created or
truncated before loading, which (if archive_mode is off) keeps them from
being WAL logged, as described in 14.4.7. If you do that and vastly
increase checkpoint_segments, full_page_writes has minimal impact.

* Increase the checkpoint_segments parameter (the default is 3 – so... maybe
10?)

64-256 is the usual range you'll see people using for bulk loading.

* Increase the maintenance_work_mem setting to 512MB

I haven't really seen any real improvement setting that over 256MB. If
you've got RAM to waste it doesn't really matter if you set it too high
though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#3William Garrison
postgres@mobydisk.com
In reply to: Greg Smith (#2)
Re: Fastest way to restore a database

Thanks so much!

So... if I am using pg_dump and pg_restore with a compressed backup,
then it is using COPY, correct? And I think that would follow a CREATE
TABLE statement as mentioned in the first link... so no WAL files written?

Greg Smith wrote:

Show quoted text

On Fri, 12 Sep 2008, William Garrison wrote:

Is there a definitive list of things to do?

That section of the documention is pretty good:
http://www.postgresql.org/docs/current/static/populate.html

The main thing it's missing is a discussion of how to cut down on disk
commit overhead by either usinc async commit or turning fsync off. If
you've got a good caching controller that may not be needed though.

The other large chunk of information it doesn't really go into is what
server tuning you could do to improve general performance, which
obviously would then help with loading as well.
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes
over much of that.

* Turn off full_page_writes
Don�t write the WAL archives in a safe way. But we don�t need WAL
archives during a restore. Future versions of postgres will let you
turn off WAL archives entirely

Ideally you'd be using COPY such that the table was just created or
truncated before loading, which (if archive_mode is off) keeps them
from being WAL logged, as described in 14.4.7. If you do that and
vastly increase checkpoint_segments, full_page_writes has minimal impact.

* Increase the checkpoint_segments parameter (the default is 3 �
so... maybe 10?)

64-256 is the usual range you'll see people using for bulk loading.

* Increase the maintenance_work_mem setting to 512MB

I haven't really seen any real improvement setting that over 256MB.
If you've got RAM to waste it doesn't really matter if you set it too
high though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: William Garrison (#3)
Re: Fastest way to restore a database

William Garrison <postgres@mobydisk.com> writes:

So... if I am using pg_dump and pg_restore with a compressed backup,
then it is using COPY, correct? And I think that would follow a CREATE
TABLE statement as mentioned in the first link... so no WAL files written?

Only if you use --single-transaction.

The worry expressed upthread about the transaction being "too large" is
unfounded, btw. Unlike some other DBs, PG doesn't have a finite-size
undo log.

regards, tom lane

#5Robert Treat
xzilla@users.sourceforge.net
In reply to: Greg Smith (#2)
Re: Fastest way to restore a database

On Friday 12 September 2008 14:32:07 Greg Smith wrote:

On Fri, 12 Sep 2008, William Garrison wrote:

Is there a definitive list of things to do?

That section of the documention is pretty good:
http://www.postgresql.org/docs/current/static/populate.html

The main thing it's missing is a discussion of how to cut down on disk
commit overhead by either usinc async commit or turning fsync off. If
you've got a good caching controller that may not be needed though.

The other large chunk of information it doesn't really go into is what
server tuning you could do to improve general performance, which obviously
would then help with loading as well.
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server goes over
much of that.

* Turn off full_page_writes
Don’t write the WAL archives in a safe way. But we don’t need WAL
archives during a restore. Future versions of postgres will let you turn
off WAL archives entirely

Ideally you'd be using COPY such that the table was just created or
truncated before loading, which (if archive_mode is off) keeps them from
being WAL logged, as described in 14.4.7. If you do that and vastly
increase checkpoint_segments, full_page_writes has minimal impact.

* Increase the checkpoint_segments parameter (the default is 3 – so...
maybe 10?)

64-256 is the usual range you'll see people using for bulk loading.

Don't forget to bump up checkpoint_timeout along with that... actually, I
blogged a couple of times on this topic:

http://people.planetpostgresql.org/xzilla/index.php?/archives/133-Getting-faster-database-restores-on-postgresql-8.1.html
http://people.planetpostgresql.org/xzilla/index.php?/archives/223-Measuring-database-restore-times.html

A little old, but might be helpful.

--
Robert Treat
http://www.omniti.com
Database: Scalability: Consulting:

#6Scott Ribe
scott_ribe@killerbytes.com
In reply to: Tom Lane (#4)
Re: Fastest way to restore a database

The worry expressed upthread about the transaction being "too large" is
unfounded, btw. Unlike some other DBs, PG doesn't have a finite-size
undo log.

Sure, it won't fail. But would there be some point at which it would become
slower than multiple transactions? Or is it always faster (or at least as
fast)?

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Ribe (#6)
Re: Fastest way to restore a database

Scott Ribe <scott_ribe@killerbytes.com> writes:

The worry expressed upthread about the transaction being "too large" is
unfounded, btw. Unlike some other DBs, PG doesn't have a finite-size
undo log.

Sure, it won't fail. But would there be some point at which it would become
slower than multiple transactions? Or is it always faster (or at least as
fast)?

I can't think of any reason it would be slower.

There are certainly issues you could run into with very long
transactions, like vacuum not being able to remove bloat elsewhere.

regards, tom lane

#8Greg Smith
gsmith@gregsmith.com
In reply to: Robert Treat (#5)
Re: Fastest way to restore a database

On Fri, 12 Sep 2008, Robert Treat wrote:

Don't forget to bump up checkpoint_timeout along with that... actually, I
blogged a couple of times on this topic

And with that there's enough information to start a dedicated page on this
topic: http://wiki.postgresql.org/wiki/Bulk_Loading_and_Restores

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

#9Robert Treat
xzilla@users.sourceforge.net
In reply to: Tom Lane (#7)
Re: Fastest way to restore a database

On Friday 12 September 2008 15:55:46 Tom Lane wrote:

Scott Ribe <scott_ribe@killerbytes.com> writes:

The worry expressed upthread about the transaction being "too large" is
unfounded, btw. Unlike some other DBs, PG doesn't have a finite-size
undo log.

Sure, it won't fail. But would there be some point at which it would
become slower than multiple transactions? Or is it always faster (or at
least as fast)?

I can't think of any reason it would be slower.

There are certainly issues you could run into with very long
transactions, like vacuum not being able to remove bloat elsewhere.

Which reminds me (and not seeing it elsewhere), on full restores you will
probably want to disable autovacuum entirely, as it will compete for
reasources and can lead to locking issues as well. Note, this can sometimes
apply to more narrow restore scenarios, but it isnt as cut and dried. (Ie,
with multiple database in a cluster, you dont want to disable it for all
databases, though it'd be nice to disable it for the one you're restoring)

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL