automated refresh of dev from prod

Started by Julie Nishimura · about 7 years ago · 5 messages · general
#1 Julie Nishimura
juliezain@hotmail.com

Hello everybody, I am new to postgresql environment, but trying to get up to speed.
Can you please share your experience with automating a refresh of a dev environment on a regular basis (ideally weekly), taking into consideration that some prod DBs can be very large (20+ TB)?

Any suggestions?

Thank you!

#2 Scot Kreienkamp
Scot.Kreienkamp@la-z-boy.com
In reply to: Julie Nishimura (#1)
RE: automated refresh of dev from prod

My method is complex and not so good for newbies, but it is incredibly fast and should scale to almost any size database. Mine are not nearly as large though.

I use two methods: the normal backup/restore for longer-lived development environments, and, for shorter-lived environments, postgres native mirroring from a secondary prod server to all my dev environments combined with LVM snapshots of the postgres mount. Mount the snapshot, start a second postgres instance in it, and you have a mirror of production ready for use. The downside is that the snapshot only lasts a finite amount of time (until you fill the space dedicated to it) before it becomes unusable, so it can't be long lived (hours, 1-2 days maybe). The upside is that in my environment the refresh from production for a 400G database takes 3 seconds. It is a trade-off and not fit for every use, which is why we also use the traditional backup/restore in some cases.
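A minimal sketch of the snapshot-based refresh described above, assuming LVM2; the volume group and LV names, mount points, snapshot size, and port are all hypothetical and would need to match your own layout:

```shell
# All device names, paths, sizes, and the port below are assumptions.

# 1. Snapshot the volume holding the standby's data directory. The size is
#    the copy-on-write space; the snapshot becomes unusable once it fills,
#    which is why this copy can only live for hours or a day or two.
lvcreate --snapshot --size 50G --name pg_snap /dev/vg_data/pg_data

# 2. Mount the snapshot where the dev instance can reach it.
mkdir -p /mnt/pg_snap
mount /dev/vg_data/pg_snap /mnt/pg_snap

# 3. Stop the copy from acting as a standby (recovery.conf, pre-v12
#    convention), then start a second instance on its own port; it
#    performs crash recovery and comes up as a writable mirror of prod.
rm -f /mnt/pg_snap/pgdata/recovery.conf
pg_ctl -D /mnt/pg_snap/pgdata -o "-p 5433" start
```

Since only changed blocks are copied on write, the "refresh" is just dropping the old snapshot and taking a new one, which is why it completes in seconds regardless of database size.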

Scot Kreienkamp | Senior Systems Engineer | La-Z-Boy Corporate
One La-Z-Boy Drive | Monroe, Michigan 48162 | Office: 734-384-6403 | Mobile: 734-915-1444 | Email: Scot.Kreienkamp@la-z-boy.com
From: Julie Nishimura [mailto:juliezain@hotmail.com]
Sent: Wednesday, February 27, 2019 4:16 PM
To: pgsql-general@postgresql.org
Subject: automated refresh of dev from prod



#3 Ron
ronljohnsonjr@gmail.com
In reply to: Julie Nishimura (#1)
Re: automated refresh of dev from prod


Weekly refreshes of 20TB from prod to dev seems a bit excessive.

--
Angular momentum makes the world go 'round.

#4 Stephen Frost
sfrost@snowman.net
In reply to: Julie Nishimura (#1)
Re: automated refresh of dev from prod

Greetings,

The approach that I like to recommend is to have your backup/restore
solution be involved in this refreshing process, so that you're also
testing that your backup/restore process works correctly. For dealing
with larger databases, using a backup tool which has parallel backup,
parallel restore, and is able to restore just the files which are
different from the backup can make the restore take much less time (this
is what the 'delta-restore' option in pgbackrest does, and it was
specifically written to support exactly this kind of prod->dev periodic
refresh, though other tools may also support that these days).
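As a rough illustration of that delta-restore flow with pgbackrest, where the stanza name `prod` and the data directory path are assumptions for this sketch:

```shell
# Stanza name and data directory are hypothetical; adjust to your config.

# Stop the dev cluster before restoring over its data directory.
pg_ctl -D /var/lib/pgsql/data stop

# With --delta, pgbackrest checksums existing files and rewrites only
# those that differ from the backup, instead of copying all 20+ TB.
pgbackrest --stanza=prod --delta restore

pg_ctl -D /var/lib/pgsql/data start
```

Because each weekly refresh exercises the same restore path you would use in a disaster, the dev refresh doubles as a recurring test of your backups.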

As mentioned elsewhere on this thread, using snapshots can also be a
good approach though you have to be sure that the snapshot is completely
atomic across all filesystems that PostgreSQL is using, or you have to
deal with running pg_start/stop_backup and putting a backup_label into
place for the restored snapshot and a recovery.conf to provide a way for
PG to get at any WAL which was generated while the snapshot (or
snapshots) was being taken.
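For the non-atomic-snapshot case, the sequence might look like the following sketch, using the exclusive backup API current at the time of this thread (pre-v15), which writes backup_label into the data directory for you; the snapshot commands and WAL archive path are placeholders:

```shell
# Paths and the WAL archive location are assumptions for illustration.

# Tell PG a backup is starting; in exclusive mode this writes backup_label
# into the data directory, so the snapshots will contain it.
psql -c "SELECT pg_start_backup('dev-refresh', true);"

# Snapshot EVERY filesystem PostgreSQL uses (data dir plus each
# tablespace) -- e.g. one lvcreate --snapshot per volume (placeholder).

psql -c "SELECT pg_stop_backup();"

# On the restored copy, a recovery.conf (PG < 12) pointing at the WAL
# archive lets PG replay any WAL generated while the snapshots were taken.
cat > /restored/pgdata/recovery.conf <<'EOF'
restore_command = 'cp /path/to/wal_archive/%f %p'
EOF
```

If the snapshot is truly atomic across all of PG's filesystems, none of this is needed; starting from the snapshot is equivalent to crash recovery.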

Thanks!

Stephen

#5 Ben
bench@silentmedia.com
In reply to: Stephen Frost (#4)
Re: automated refresh of dev from prod

On Feb 28, 2019, at 8:04 AM, Stephen Frost <sfrost@snowman.net> wrote:


Very much yes to everything Stephen says. Regularly refreshing nonprod via your normal backup/restore process is an efficient way to test your backups, and snapshots are a great way to do backups when your data volume is greater than your churn between backups. (And at 20+ TB, I hope that's the case for you.)