Fastest way to duplicate a quite large database

Started by Edson Richter · almost 10 years ago · 9 messages · general
#1Edson Richter
edsonrichter@hotmail.com

Hi!

I have a database "Customer" with about 60 GB of data.
I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest"
as fast as possible (even faster than backup/restore)? Better if it's one
operation (something like "copy database A to B").
I would like to run this every day, overnight, with minimal impact, to
prepare a test environment based on production data.

Thanks,

--
Atenciosamente,

Edson Carlos Ericksson Richter

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2John R Pierce
pierce@hogranch.com
In reply to: Edson Richter (#1)
Re: Fastest way to duplicate a quite large database

On 4/12/2016 7:25 AM, Edson Richter wrote:

I have a database "Customer" with about 60 GB of data.
I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest"
as fast as possible (even faster than backup/restore)? Better if it's one
operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact, to
prepare a test environment based on production data.

copy to the same machine, or copy to a different test server? different
answers.

--
john r pierce, recycling bits in santa cruz


#3Edson Richter
edsonrichter@hotmail.com
In reply to: John R Pierce (#2)
Re: Fastest way to duplicate a quite large database

Same machine, same cluster - just different database name.

Atenciosamente,

Edson Carlos Ericksson Richter

Em 12/04/2016 11:46, John R Pierce escreveu:

On 4/12/2016 7:25 AM, Edson Richter wrote:

I have a database "Customer" with about 60 GB of data.
I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as
"CustomerTest" as fast as possible (even faster than backup/restore)?
Better if it's one operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact, to
prepare a test environment based on production data.

copy to the same machine, or copy to a different test server?
different answers.


#4John McKown
john.archie.mckown@gmail.com
In reply to: Edson Richter (#1)
Re: Fastest way to duplicate a quite large database

On Tue, Apr 12, 2016 at 9:25 AM, Edson Richter <edsonrichter@hotmail.com>
wrote:

Hi!

I have a database "Customer" with about 60Gb of data.
I know I can backup and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest" as
fast as possible (even fastar than backup/restore) - better if in one
operation (something like "copy database A to B")?
I would like to run this everyday, overnight, with minimal impact to
prepare a test environment based on production data.

​Hum, I don't know exactly how to do it, but on Linux you could put the
"Customer" database in a tablespace which resides on a BTRFS filesystem.
BTRFS can do a quick "snapshot" of the filesystem, and you can then set
things up for "incremental backup", as talked about here:
https://btrfs.wiki.kernel.org/index.php/Incremental_Backup . From some
reading, though, btrfs is a performance dog compared to the others.

An interesting take on using various filesystems for PostgreSQL:
http://www.slideshare.net/fuzzycz/postgresql-on-ext4-xfs-btrfs-and-zfs

Another on btrfs + PostgreSQL:
http://www.cybertec.at/2015/01/forking-databases-the-art-of-copying-without-copying/

<quote from above>
...
So we managed to fork a 15 GB database in 6 seconds with only a small
hiccup in performance. We are ready to start up the forked database.
...
</quote>

I got a number of hits searching on "postgresql btrfs" using Google search.
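For concreteness, the snapshot-based fork described in that article can be sketched roughly like this. The paths, port, and role of each command are illustrative assumptions, and the pg_start_backup()/pg_stop_backup() calls reflect the API of PostgreSQL versions current at the time; the script only echoes each step, so the sequence is visible without touching a real cluster:

```shell
#!/bin/sh
# Dry-run sketch of a btrfs copy-on-write "fork" of a cluster's data
# directory. Each step is echoed rather than executed; swap "echo" for
# real execution (and real paths) to use it.
PROD_PGDATA=/srv/btrfs/pgdata-prod   # assumed btrfs subvolume with production data
TEST_PGDATA=/srv/btrfs/pgdata-test   # snapshot target for the test instance
TEST_PORT=5433

fork_db() {
    # 1. Enter backup mode so the snapshot is consistent.
    echo psql -p 5432 -c "SELECT pg_start_backup('fork', true);"
    # 2. Writable copy-on-write snapshot: near-instant regardless of size.
    echo btrfs subvolume snapshot "$PROD_PGDATA" "$TEST_PGDATA"
    # 3. Leave backup mode on the production side.
    echo psql -p 5432 -c "SELECT pg_stop_backup();"
    # 4. Start a second postmaster on the snapshot, on its own port.
    echo pg_ctl -D "$TEST_PGDATA" -o "-p $TEST_PORT" start
}
fork_db
```

Because the snapshot is copy-on-write, step 2 takes seconds regardless of database size, which is where the article's "15 GB in 6 seconds" figure comes from.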


--
How many surrealists does it take to screw in a lightbulb? One to hold the
griffon and one to fill the bathtub with brightly colored LEDs.

Maranatha! <><
John McKown

#5Louis Battuello
louis.battuello@etasseo.com
In reply to: Edson Richter (#3)
Re: Fastest way to duplicate a quite large database

On Apr 12, 2016, at 10:51 AM, Edson Richter <edsonrichter@hotmail.com> wrote:

Same machine, same cluster - just different database name.

Atenciosamente,

Edson Carlos Ericksson Richter

Em 12/04/2016 11:46, John R Pierce escreveu:

On 4/12/2016 7:25 AM, Edson Richter wrote:

I have a database "Customer" with about 60 GB of data.
I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as "CustomerTest" as fast as possible (even faster than backup/restore)? Better if it's one operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact, to prepare a test environment based on production data.

copy to the same machine, or copy to a different test server? different answers.


Not sure how fast "fast" is for your system. You could try:

create database customer_test with template customer;

I’m able to duplicate a 20 GB database in a couple of minutes with the above command.

A couple of caveats:

1. No active connections to customer are allowed during the create.
2. You’ll likely have to recreate the search_path and reissue connect grants on the newly created database.
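As a rough sketch, a nightly job built around that command might look like the following. This is a dry run that just prints each step; the database names come from the thread, and the pg_terminate_backend step is there because CREATE DATABASE refuses to run while anyone is connected to the template database:

```shell
#!/bin/sh
# Dry-run sketch of a nightly clone-by-template job. Each step is echoed
# rather than executed; replace "echo" with real execution to use it.
SRC=customer
DST=customer_test

clone_db() {
    # CREATE DATABASE ... TEMPLATE fails while sessions are connected to
    # the source database, so terminate them first.
    echo psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '$SRC';"
    # Drop yesterday's copy, then clone at the file level (no dump/restore).
    echo dropdb --if-exists "$DST"
    echo createdb --template="$SRC" "$DST"
    # Table statistics are not copied, so analyze the fresh database.
    echo vacuumdb --analyze-only --dbname="$DST"
}
clone_db
```

createdb --template is the command-line equivalent of CREATE DATABASE ... WITH TEMPLATE; the file-level copy is what makes it much faster than dump/restore, though it still physically copies the full 60 GB.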


#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Edson Richter (#3)
Re: Fastest way to duplicate a quite large database

On 04/12/2016 07:51 AM, Edson Richter wrote:

Same machine, same cluster - just different database name.

Hmm, running tests against the same cluster that hosts the production
database would seem to be a performance hit on the production database,
and potentially dangerous should the tests trip a bug that crashes the
server.

Atenciosamente,

Edson Carlos Ericksson Richter

Em 12/04/2016 11:46, John R Pierce escreveu:

On 4/12/2016 7:25 AM, Edson Richter wrote:

I have a database "Customer" with about 60 GB of data.
I know I can back up and restore, but this seems too slow.

Is there any other option to duplicate this database as
"CustomerTest" as fast as possible (even faster than backup/restore)?
Better if it's one operation (something like "copy database A to B")?
I would like to run this every day, overnight, with minimal impact, to
prepare a test environment based on production data.

copy to the same machine, or copy to a different test server?
different answers.

--
Adrian Klaver
adrian.klaver@aklaver.com


#7John R Pierce
pierce@hogranch.com
In reply to: John McKown (#4)
Re: Fastest way to duplicate a quite large database

On 4/12/2016 7:55 AM, John McKown wrote:

Hum, I don't know exactly how to do it, but on Linux, you could put
the "Customer" database in a tablespace which resides on a BTRFS
filesystem. BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for
importing the contents of a tablespace. All the metadata remains
in the default tablespace, which leaves all sorts of room for problems
if you do this.

the /best/ way to achieve what the OP is asking for would likely be to
run the tests on a separate server (or at least a separate postgres
instance, aka cluster), and use pg_basebackup to rebuild this test instance.
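A rough sketch of that rebuild follows. The hostname, paths, port, and replication role name are all illustrative assumptions (pg_basebackup needs a role with REPLICATION privilege and a matching pg_hba.conf entry), and the script echoes each step rather than executing it:

```shell
#!/bin/sh
# Dry-run sketch: rebuild a standalone test instance from production with
# pg_basebackup. Steps are echoed, not executed.
PROD_HOST=prod.example.com            # hypothetical production host
TEST_PGDATA=/var/lib/pgsql/test-data  # hypothetical test data directory
TEST_PORT=5433

rebuild_test() {
    # Stop the old test instance and clear its data directory.
    echo pg_ctl -D "$TEST_PGDATA" stop -m fast
    echo rm -rf "$TEST_PGDATA"
    # Stream a consistent copy of the whole cluster; -X stream also pulls
    # the WAL needed for the copy to start up cleanly.
    echo pg_basebackup -h "$PROD_HOST" -U replicator -D "$TEST_PGDATA" -X stream -P
    # Bring the test instance up on its own port.
    echo pg_ctl -D "$TEST_PGDATA" -o "-p $TEST_PORT" start
}
rebuild_test
```

Note that pg_basebackup copies the entire cluster (every database in it), not just one database, which is part of why it only makes sense for a dedicated test instance.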

--
john r pierce, recycling bits in santa cruz


#8John McKown
john.archie.mckown@gmail.com
In reply to: John R Pierce (#7)
Re: Fastest way to duplicate a quite large database

On Tue, Apr 12, 2016 at 10:14 AM, John R Pierce <pierce@hogranch.com> wrote:

On 4/12/2016 7:55 AM, John McKown wrote:

Hum, I don't know exactly how to do it, but on Linux, you could put the
"Customer" database in a tablespace which resides on a BTRFS filesystem.
BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for
importing the contents of a tablespace. All the metadata remains in
the default tablespace, which leaves all sorts of room for problems if you
do this.

the /best/ way to achieve what the OP is asking for would likely be to run
the tests on a separate server (or at least a separate postgres instance, aka
cluster), and use pg_basebackup to rebuild this test instance.

​Good point. The reference,
http://www.cybertec.at/2015/01/forking-databases-the-art-of-copying-without-copying/
,
actually uses btrfs facilities to make the copy, which is then used by a
_different_ PostgreSQL server instance on the same machine.
Actually, the database _name_ would stay "Customer", but it would reside in a
separate PostgreSQL server's PGDATA area instead of the production server's
PGDATA. Basically, any production programs or scripts could be used, as is,
with the test data simply by using the test server's port or unix socket
instead of the production server's. Not quite as good as being on a
separate Linux image, but not too shabby either. IMO.


--
How many surrealists does it take to screw in a lightbulb? One to hold the
griffon and one to fill the bathtub with brightly colored LEDs.

Maranatha! <><
John McKown

#9Louis Battuello
louis.battuello@etasseo.com
In reply to: John R Pierce (#7)
Re: Fastest way to duplicate a quite large database

On Apr 12, 2016, at 11:14 AM, John R Pierce <pierce@hogranch.com> wrote:

On 4/12/2016 7:55 AM, John McKown wrote:

Hum, I don't know exactly how to do it, but on Linux, you could put the "Customer" database in a tablespace which resides on a BTRFS filesystem. BTRFS can do a quick "snapshot" of the filesystem....

except, tablespaces aren't standalone, and there's no provision for importing the contents of a tablespace. All the metadata remains in the default tablespace, which leaves all sorts of room for problems if you do this.

the /best/ way to achieve what the OP is asking for would likely be to run the tests on a separate server (or at least a separate postgres instance, aka cluster), and use pg_basebackup to rebuild this test instance.

--
john r pierce, recycling bits in santa cruz


I agree with John’s post. I should have mentioned that my template database is never production. It’s an obfuscated copy of the production data on separate hardware. I use the “create with template” approach to spin up copies for developers/testers, to provide a representative data set (not identical to production). And, since the create doesn’t copy table statistics, I have to kick off a post-copy background process to gather them:

nohup vacuumdb --analyze-only --quiet --dbname=${DATABASE} &>/dev/null &

Still, with all that, users are able to drop and recreate a test database within a coffee break.
