pgBackRest backup from standby

Started by Don Seilerabout 8 years ago21 messagesgeneral
Jump to latest
#1Don Seiler
don@seiler.us

Evening all.

Looking to use pgBackRest to take a backup from a hot standby. I'm reading
that pgBackRest still needs to connect to the primary and copy some files.
My questions are:

1. What files does it need to copy? Config files? WAL files?
2. How does it connect? SSH?
3. Does pgBackRest need to be installed and configured on the primary as
well?

Thanks,
Don.

--
Don Seiler
www.seiler.us

#2Michael Paquier
michael@paquier.xyz
In reply to: Don Seiler (#1)
Re: pgBackRest backup from standby

On Sun, Feb 18, 2018 at 06:34:46PM -0600, Don Seiler wrote:

Looking to use pgBackRest to take a backup from a hot standby. I'm reading
that pgBackRest still needs to connect to the primary and copy some files.
My questions are:

1. What files does it need to copy? Config files? WAL files?
2. How does it connect? SSH?
3. Does pgBackRest need to be installed and configured on the primary as
well?

I am adding in CC: Stephen Frost and David Steele who work on the took.
You may want to contact the maintainers directly through github where
the project is maintained:
https://github.com/pgbackrest/pgbackrest
--
Michael

#3Don Seiler
don@seiler.us
In reply to: Michael Paquier (#2)
Re: pgBackRest backup from standby

On Sun, Feb 18, 2018 at 6:43 PM, Michael Paquier <michael@paquier.xyz>
wrote:

I am adding in CC: Stephen Frost and David Steele who work on the took.

I assumed Stephen was already on this list, and I communicate with him
regularly on Slack as well but just throwing this out there on a Sunday
night.

You may want to contact the maintainers directly through github where
the project is maintained:
https://github.com/pgbackrest/pgbackrest

Is that the place to just ask questions? I wasn't sure that "Issues" meant
just questions versus problems or bugs. I didn't see any mention of a forum
or list on their website, and there have been lots of pgBackRest questions
on this list in the past so I settled on this one.

Don.

--
Don Seiler
www.seiler.us

#4Michael Paquier
michael@paquier.xyz
In reply to: Don Seiler (#3)
Re: pgBackRest backup from standby

On Sun, Feb 18, 2018 at 06:48:30PM -0600, Don Seiler wrote:

On Sun, Feb 18, 2018 at 6:43 PM, Michael Paquier <michael@paquier.xyz> wrote:

You may want to contact the maintainers directly through github where
the project is maintained:
https://github.com/pgbackrest/pgbackrest

Is that the place to just ask questions? I wasn't sure that "Issues" meant
just questions versus problems or bugs. I didn't see any mention of a forum
or list on their website, and there have been lots of pgBackRest questions
on this list in the past so I settled on this one.

Stephen and David are around all the time, so it is not really an issue
to discuss things related to pgBackRest on this list I guess :)
Attaching related folks directly in CC: also usually helps.

You may get faster feedback by opening an issue directly on github
though, as there are a lot of emails on -general so it is easy to get
things lost. My 2c.
--
Michael

#5David Steele
david@pgmasters.net
In reply to: Don Seiler (#1)
Re: pgBackRest backup from standby

Hi Don,

On 2/18/18 7:34 PM, Don Seiler wrote:

Looking to use pgBackRest to take a backup from a hot standby. I'm
reading that pgBackRest still needs to connect to the primary and copy
some files. My questions are:

1. What files does it need to copy? Config files? WAL files?

It copies files that are not replicated from the primary so that a
primary-style backup is created. Anything that is replicated (which is
by far the bulk of the data) is copied from the standby.

2. How does it connect? SSH?

Yes.

3. Does pgBackRest need to be installed and configured on the primary
as well?

Yes. Anyway, it's best to archive from the primary so a replication
failure does not affect your archiving.

Configuring pgBackRest, SSH, standby, and backup from standby are all
covered in the user guide.

http://pgbackrest.org/user-guide.html

In particular:

http://pgbackrest.org/user-guide.html#backup-host
http://pgbackrest.org/user-guide.html#replication
http://pgbackrest.org/user-guide.html#standby-backup

Regards,
--
-David
david@pgmasters.net

#6David Steele
david@pgmasters.net
In reply to: Michael Paquier (#4)
Re: pgBackRest backup from standby

On 2/18/18 10:20 PM, Michael Paquier wrote:

On Sun, Feb 18, 2018 at 06:48:30PM -0600, Don Seiler wrote:

On Sun, Feb 18, 2018 at 6:43 PM, Michael Paquier <michael@paquier.xyz> wrote:

You may want to contact the maintainers directly through github where
the project is maintained:
https://github.com/pgbackrest/pgbackrest

Is that the place to just ask questions? I wasn't sure that "Issues" meant
just questions versus problems or bugs. I didn't see any mention of a forum
or list on their website, and there have been lots of pgBackRest questions
on this list in the past so I settled on this one.

Stephen and David are around all the time, so it is not really an issue
to discuss things related to pgBackRest on this list I guess :)
Attaching related folks directly in CC: also usually helps.

You may get faster feedback by opening an issue directly on github
though, as there are a lot of emails on -general so it is easy to get
things lost. My 2c.

Either is fine with me, but as Michael says I might miss postings to
-general. I'm sure somebody else would catch it, though.

--
-David
david@pgmasters.net

#7Don Seiler
don@seiler.us
In reply to: David Steele (#5)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 8:18 AM, David Steele <david@pgmasters.net> wrote:

It copies files that are not replicated from the primary so that a
primary-style backup is created. Anything that is replicated (which is
by far the bulk of the data) is copied from the standby.

OK so all data files would be copied from standby. Can you give me an
example of the types of files that need to be copied from primary?

it's best to archive from the primary so a replication
failure does not affect your archiving.

Understood, just not something I can change in production primary at the
moment. Hence looking to see about a quick one-off backup from standby.

Configuring pgBackRest, SSH, standby, and backup from standby are all
covered in the user guide.

Thanks, I've been through it a few times and played with some test backups
from primary clones. I just ditched my master/replica clone setup but I'll
test there as well. I just had a couple questions about the mechanics.

--
Don Seiler
www.seiler.us

#8Don Seiler
don@seiler.us
In reply to: David Steele (#6)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 8:23 AM, David Steele <david@pgmasters.net> wrote:

Either is fine with me, but as Michael says I might miss postings to
-general. I'm sure somebody else would catch it, though.

OK, I'll make use of the issues tracker going forward.

--
Don Seiler
www.seiler.us

#9David Steele
david@pgmasters.net
In reply to: Don Seiler (#7)
Re: pgBackRest backup from standby

Hi Don,

On 2/19/18 9:25 AM, Don Seiler wrote:

On Mon, Feb 19, 2018 at 8:18 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

It copies files that are not replicated from the primary so that a
primary-style backup is created. Anything that is replicated (which is
by far the bulk of the data) is copied from the standby.

OK so all data files would be copied from standby. Can you give me an
example of the types of files that need to be copied from primary?
 

Anything *not* in global (except pg_control), base, pg_tblspc,
pg_xact/pg_clog, and pg_multixact are copied from the primary.

For example, pg_stat is copied from the primary so these stats are
preserved on a standby backup.

pgBackRest uses all the same exclusions as pg_basebackup, so many
dirs/files are not copied at all: pg_dynshmem, pg_notify, pg_replslot,
pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans, etc.

Full list here
https://www.postgresql.org/docs/10/static/protocol-replication.html.

it's best to archive from the primary so a replication
failure does not affect your archiving.

Understood, just not something I can change in production primary at the
moment. Hence looking to see about a quick one-off backup from standby.

For a quick one-off, pg_basebackup is your friend.

Regards,
--
-David
david@pgmasters.net

#10Don Seiler
don@seiler.us
In reply to: David Steele (#9)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 8:53 AM, David Steele <david@pgmasters.net> wrote:

Anything *not* in global (except pg_control), base, pg_tblspc,
pg_xact/pg_clog, and pg_multixact are copied from the primary.

For example, pg_stat is copied from the primary so these stats are
preserved on a standby backup.

So if I have tablespaces outside of $PGDATA (but symlinked from within
pg_tblspc, of course), those will still be backed up from the standby,
right?

Is it right to say that the files that would be copied from primary are
very small, typically? So it isn't a huge transfer over the WAN (in my
case)?

pgBackRest uses all the same exclusions as pg_basebackup, so many
dirs/files are not copied at all: pg_dynshmem, pg_notify, pg_replslot,
pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans, etc.

Full list here
https://www.postgresql.org/docs/10/static/protocol-replication.html.

it's best to archive from the primary so a replication
failure does not affect your archiving.

Understood, just not something I can change in production primary at the
moment. Hence looking to see about a quick one-off backup from standby.

For a quick one-off, pg_basebackup is your friend.

One of the requirements of this backup is encryption, which I don't see any
notes for with pg_basebackup. Also due to the size, parallel workers.
pgBackRest gives me both of these. I need compression as well but that
pg_basebackup does do.

I did come up with a sort of Rube Goldberg-esque workaround for now
involving using a clone of the prod standby VM from Veeam backup to use as
the backup source (after stopping recovery and opening it as a standalone
DB).

Don.

--
Don Seiler
www.seiler.us

#11David Steele
david@pgmasters.net
In reply to: Don Seiler (#10)
Re: pgBackRest backup from standby

Hi Don,

On 2/19/18 10:01 AM, Don Seiler wrote:

On Mon, Feb 19, 2018 at 8:53 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

Anything *not* in global (except pg_control), base, pg_tblspc,
pg_xact/pg_clog, and pg_multixact are copied from the primary.

For example, pg_stat is copied from the primary so these stats are
preserved on a standby backup.

So if I have tablespaces outside of $PGDATA (but symlinked from within
pg_tblspc, of course), those will still be backed up from the standby,
right?

Correct.

Is it right to say that the files that would be copied from primary are
very small, typically? So it isn't a huge transfer over the WAN (in my
case)?

Yes, they are typically very small. The general exception to this rule
is if logs are stored in pg_log. I recommend storing logs out of the
PGDATA dir as they can be quite large and don't really make sense to
restore to another server.

Files copied from the master will be marked as such in backup.manifest
(master:true) so you can check for yourself.

I did come up with a sort of Rube Goldberg-esque workaround for now
involving using a clone of the prod standby VM from Veeam backup to use
as the backup source (after stopping recovery and opening it as a
standalone DB).

You don't get PITR that way, of course, but at least it's a backup. As
long as your clone is consistent.

--
-David
david@pgmasters.net

#12Don Seiler
don@seiler.us
In reply to: David Steele (#11)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 9:21 AM, David Steele <david@pgmasters.net> wrote:

Yes, they are typically very small. The general exception to this rule
is if logs are stored in pg_log. I recommend storing logs out of the
PGDATA dir as they can be quite large and don't really make sense to
restore to another server.

Files copied from the master will be marked as such in backup.manifest
(master:true) so you can check for yourself.

Good to know. And fortunately for this DB we do have pg_log (and pg_xlog)
symlinked to different volumes outside of $PGDATA.

I did come up with a sort of Rube Goldberg-esque workaround for now

involving using a clone of the prod standby VM from Veeam backup to use
as the backup source (after stopping recovery and opening it as a
standalone DB).

You don't get PITR that way, of course, but at least it's a backup. As
long as your clone is consistent.

Yes it's a crash-consistent snapshot-based backup. I've done quite a few
restores from it and it works great. It can do PITR as well since I would
have all the WAL files from prod needed to keep recovering. But for these
cases I just recover it to the first consistent point and open it for
testing (or backups in this case).

Thanks for all your help!

Don.

--
Don Seiler
www.seiler.us

#13David Steele
david@pgmasters.net
In reply to: Don Seiler (#12)
Re: pgBackRest backup from standby

On 2/19/18 10:32 AM, Don Seiler wrote:

On Mon, Feb 19, 2018 at 9:21 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

Yes, they are typically very small.  The general exception to this rule
is if logs are stored in pg_log.  I recommend storing logs out of the
PGDATA dir as they can be quite large and don't really make sense to
restore to another server.

Files copied from the master will be marked as such in backup.manifest
(master:true) so you can check for yourself.

Good to know. And fortunately for this DB we do have pg_log (and
pg_xlog) symlinked to different volumes outside of $PGDATA.

If pg_log is symlinked to PGDATA it will be copied. pg_xlog is not
copied in any backup.

I did come up with a sort of Rube Goldberg-esque workaround for now
involving using a clone of the prod standby VM from Veeam backup to use
as the backup source (after stopping recovery and opening it as a
standalone DB).

You don't get PITR that way, of course, but at least it's a backup.  As
long as your clone is consistent.

Yes it's a crash-consistent snapshot-based backup. I've done quite a few
restores from it and it works great. It can do PITR as well since I
would have all the WAL files from prod needed to keep recovering. But
for these cases I just recover it to the first consistent point and open
it for testing (or backups in this case). 

I don't think it would be safe to do PITR on a backup taken in this way.
The WAL diverges even if you suppress a timeline switch.

--
-David
david@pgmasters.net

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: David Steele (#13)
Re: pgBackRest backup from standby

On 19 February 2018 at 16:17, David Steele <david@pgmasters.net> wrote:

I did come up with a sort of Rube Goldberg-esque workaround for now
involving using a clone of the prod standby VM from Veeam backup to use
as the backup source (after stopping recovery and opening it as a
standalone DB).

You don't get PITR that way, of course, but at least it's a backup. As
long as your clone is consistent.

Yes it's a crash-consistent snapshot-based backup. I've done quite a few
restores from it and it works great. It can do PITR as well since I
would have all the WAL files from prod needed to keep recovering. But
for these cases I just recover it to the first consistent point and open
it for testing (or backups in this case).

I don't think it would be safe to do PITR on a backup taken in this way.

If you have all the WAL files, then it would be safe.

The WAL diverges even if you suppress a timeline switch.

Which is exactly why we have timelines.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15David Steele
david@pgmasters.net
In reply to: Simon Riggs (#14)
Re: pgBackRest backup from standby

On 2/19/18 11:29 AM, Simon Riggs wrote:

On 19 February 2018 at 16:17, David Steele <david@pgmasters.net> wrote:

I did come up with a sort of Rube Goldberg-esque workaround for now
involving using a clone of the prod standby VM from Veeam backup to use
as the backup source (after stopping recovery and opening it as a
standalone DB).

You don't get PITR that way, of course, but at least it's a backup. As
long as your clone is consistent.

Yes it's a crash-consistent snapshot-based backup. I've done quite a few
restores from it and it works great. It can do PITR as well since I
would have all the WAL files from prod needed to keep recovering. But
for these cases I just recover it to the first consistent point and open
it for testing (or backups in this case).

I don't think it would be safe to do PITR on a backup taken in this way.

If you have all the WAL files, then it would be safe.

I read "open it for testing (or backups in this case)" as letting
recovery complete and promoting the cluster to a master before taking
the backup.

Don, is that the case? If it is, I think there's a problem with or
without a timeline switch. If you confirm the backup is being taken as
above then I'll detail my concerns.

--
-David
david@pgmasters.net

#16Don Seiler
don@seiler.us
In reply to: David Steele (#15)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 12:39 PM, David Steele <david@pgmasters.net> wrote:

I read "open it for testing (or backups in this case)" as letting
recovery complete and promoting the cluster to a master before taking
the backup.

Don, is that the case? If it is, I think there's a problem with or
without a timeline switch. If you confirm the backup is being taken as
above then I'll detail my concerns.

Note that this is just for creating a couple of one-off backups to restore
for our dev and pre-prod environments. Given that, I was going to open a
new clone as its own cluster and take backups from that. The data would be
the same though and suit purposes of the dev and pre-prod refreshes.

If I were taking backups for the purpose of production backups, I would not
do things this way. That is the eventual plan but right now we aren't ready
to make the changes necessary in the production environment.

--
Don Seiler
www.seiler.us

#17David Steele
david@pgmasters.net
In reply to: Don Seiler (#16)
Re: pgBackRest backup from standby

On 2/19/18 2:05 PM, Don Seiler wrote:

On Mon, Feb 19, 2018 at 12:39 PM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

I read "open it for testing (or backups in this case)" as letting
recovery complete and promoting the cluster to a master before taking
the backup.

Don, is that the case?  If it is, I think there's a problem with or
without a timeline switch.  If you confirm the backup is being taken as
above then I'll detail my concerns.

Note that this is just for creating a couple of one-off backups to
restore for our dev and pre-prod environments. Given that, I was going
to open a new clone as its own cluster and take backups from that. The
data would be the same though and suit purposes of the dev and pre-prod
refreshes.

If I were taking backups for the purpose of production backups, I would
not do things this way. That is the eventual plan but right now we
aren't ready to make the changes necessary in the production environment.

OK, that's fine then. You can play these to consistency and they'll be
fine. I just wouldn't try to do any PITR using the production WAL archive.

--
-David
david@pgmasters.net

#18Don Seiler
don@seiler.us
In reply to: David Steele (#13)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 10:17 AM, David Steele <david@pgmasters.net> wrote:

If pg_log is symlinked to PGDATA it will be copied. pg_xlog is not
copied in any backup.

So an external pg_log directory symlinked into $PGDATA will have its log
contents copied?

I'm curious, why even copy the pg_log logs? They aren't needed for database
restore or recovery.

Don.
--
Don Seiler
www.seiler.us

#19David Steele
david@pgmasters.net
In reply to: Don Seiler (#18)
Re: pgBackRest backup from standby

On 2/19/18 3:41 PM, Don Seiler wrote:

On Mon, Feb 19, 2018 at 10:17 AM, David Steele <david@pgmasters.net
<mailto:david@pgmasters.net>> wrote:

If pg_log is symlinked to PGDATA it will be copied.  pg_xlog is not
copied in any backup.

So an external pg_log directory symlinked into $PGDATA will have its log
contents copied?

Yes.

I'm curious, why even copy the pg_log logs? They aren't needed for
database restore or recovery.

The general philosophy is to copy everything except what we know for
sure can be excluded. In practice, this means sticking to what
pg_basebackup excludes because that list is vetted by the community.

Also, relocating the log directory is easy using the log_directory
setting, so that's what I recommend if it's an issue. Some users do
want to backup their logs.

--
-David
david@pgmasters.net

#20Don Seiler
don@seiler.us
In reply to: David Steele (#19)
Re: pgBackRest backup from standby

On Mon, Feb 19, 2018 at 2:53 PM, David Steele <david@pgmasters.net> wrote:

Also, relocating the log directory is easy using the log_directory
setting, so that's what I recommend if it's an issue. Some users do
want to backup their logs.

That's probably a lot better idea than symlinking anyway. I'll look to do
that in my next round of config changes.

--
Don Seiler
www.seiler.us

#21Martin Marques
martin.marques@2ndquadrant.com
In reply to: David Steele (#19)