Basic question on recovery and disk snapshotting

Started by Yang Zhang almost 13 years ago, 9 messages, general
#1Yang Zhang
yanghatespam@gmail.com

We're running on EBS volumes on EC2. We're interested in leveraging
EBS snapshotting for backups. However, does this mean we'd need to
ensure our pg_xlog is on the same EBS volume as our data?

(I believe) the usual reasoning for separating pg_xlog onto a separate
volume is for performance. However, if they are on different volumes,
the snapshots may be out of sync.

Thanks.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Jov
amutu@amutu.com
In reply to: Yang Zhang (#1)
Re: Basic question on recovery and disk snapshotting

Are you sure the EBS snapshot is consistent? If the snapshot is not
consistent, even on the same volume, you will have problems with your backup.

One method you can try is to run pg_start_backup() before taking the snapshot.

--
Jov
blog: http://amutu.com/blog

#3Yang Zhang
yanghatespam@gmail.com
In reply to: Jov (#2)
Re: Basic question on recovery and disk snapshotting

On Sat, Apr 27, 2013 at 4:25 AM, Jov <amutu@amutu.com> wrote:

Are you sure the EBS snapshot is consistent? If the snapshot is not
consistent, even on the same volume, you will have problems with your backup.

I think so. EBS gives you "point-in-time consistent snapshots"
(https://aws.amazon.com/ebs/), but maybe you're using the term
differently.

Even so, it's impossible to take snapshots of two different volumes at
exactly the same time so they won't be consistent with each other,
hence my question.

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume? If not, how should we be
taking the snapshots - should we be using pg_start_backup() and then
taking the snapshot of one before the other? (What order?) What if
we have tablespaces, do we take snapshots of those, followed by the
cluster directory, followed by pg_xlog?

I read through http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
and it doesn't touch on these questions.

--
Yang Zhang
http://yz.mit.edu/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yang Zhang (#3)
Re: Basic question on recovery and disk snapshotting

Yang Zhang <yanghatespam@gmail.com> writes:

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume?

Yeah, I think so. It's possible to imagine schemes that would let
a WAL-snapshot-shortly-after-the-data-snapshot work, but you would
(at least) need to disable WAL file recycling, which there's no
provision for at the moment.

The usual approach is to use a COW snapshot only for making a base
backup of the data area, and rely on WAL streaming/archiving to copy
the WAL.

regards, tom lane

#5Jeff Janes
jeff.janes@gmail.com
In reply to: Yang Zhang (#3)
Re: Basic question on recovery and disk snapshotting

On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam@gmail.com> wrote:

On Sat, Apr 27, 2013 at 4:25 AM, Jov <amutu@amutu.com> wrote:

Are you sure the EBS snapshot is consistent? If the snapshot is not
consistent, even on the same volume, you will have problems with your backup.

I think so. EBS gives you "point-in-time consistent snapshots"
(https://aws.amazon.com/ebs/), but maybe you're using the term
differently.

I would not trust any data that I care about based on the description on
that page. They mention "consistent" once and "atomic" not at all. Which
is not to say it won't work.

This thread seems to indicate the file system on top of the EBS volume
would need to be quiescent in order for the snapshot to be reliable:

https://forums.aws.amazon.com/message.jspa?messageID=108940

Although I don't think that strength of consistency would actually be
needed. As long as the snapshot reflects all writes that were, at the time
the snapshot was initiated, reported back to PG as being successfully
synced, and contained no writes which were done after the snapshot was
reported as complete, that should be consistent enough for PG. Unless the
file system itself got scrambled.

Even so, it's impossible to take snapshots of two different volumes at
exactly the same time so they won't be consistent with each other,
hence my question.

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume?

That would certainly make it easier. But it shouldn't be necessary, as
long as the xlog snapshot is taken after the cluster snapshot, and also as
long as no xlog files which were written to after the last completed
checkpoint prior to the cluster snapshot got recycled before the xlog
snapshot. As long as the snapshots run quickly and promptly one after the
other, this should not be a problem, but you should certainly validate that
a snapshot collection has all the xlogs it needs before accepting it as
being good. If you find some necessary xlog files are missing, you can
turn up wal_keep_segments and try again.

If not, how should we be
taking the snapshots - should we be using pg_start_backup() and then
taking the snapshot of one before the other? (What order?) What if
we have tablespaces, do we take snapshots of those, followed by the
cluster directory, followed by pg_xlog?

First the cluster directory (where "pg_control" is), then tablespaces, then
pg_xlog. pg_start_backup() shouldn't be necessary, unless you are running
with full_page_writes off. But it won't hurt, and if you don't use
pg_start_backup you should probably run a checkpoint of your own
immediately before starting.
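
A minimal sketch of the ordering described above, in case it helps to see it spelled out. The function name and the volume identifiers are hypothetical and not part of any PostgreSQL or AWS tool; the sketch only decides the order and leaves the actual snapshot call to the caller.

```python
from typing import List, Sequence


def snapshot_order(cluster_volume: str,
                   tablespace_volumes: Sequence[str],
                   xlog_volume: str) -> List[str]:
    """Return volumes in the suggested order: the cluster directory
    (where pg_control lives) first, then tablespaces, then pg_xlog.

    Each volume appears once. A tablespace that shares a volume with
    pg_xlog is deferred to the end, so the xlog snapshot is always
    the last one taken.
    """
    ordered = [cluster_volume]
    for volume in tablespace_volumes:
        if volume not in ordered and volume != xlog_volume:
            ordered.append(volume)
    # If pg_xlog lives on the cluster volume there is nothing to add:
    # a single snapshot covers both.
    if xlog_volume not in ordered:
        ordered.append(xlog_volume)
    return ordered
```

A caller would issue a CHECKPOINT (or pg_start_backup()) first and then snapshot the volumes in the returned order, as quickly after one another as possible.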

I read through
http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
and it doesn't touch on these questions.

Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
that part of the docs to touch on your issues. But see the section
"Standalone Hot Backups" which would allow you to use snapshots for the
cluster "copy" part, and normal archiving for just the xlogs. The volume
of pg_xlog should be fairly small, so this seems to me like an attractive
option.

If you really don't want to use archiving, even just during the duration of
the cluster snapshotting, then this is the part that addresses your
questions:

http://www.postgresql.org/docs/9.1/static/backup-file.html

Cheers,

Jeff

#6Yang Zhang
yanghatespam@gmail.com
In reply to: Jeff Janes (#5)
Re: Basic question on recovery and disk snapshotting

On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam@gmail.com> wrote:

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume?

That would certainly make it easier. But it shouldn't be necessary, as long
as the xlog snapshot is taken after the cluster snapshot, and also as long
as no xlog files which were written to after the last completed checkpoint
prior to the cluster snapshot got recycled before the xlog snapshot. As
long as the snapshots run quickly and promptly one after the other, this
should not be a problem, but you should certainly validate that a snapshot
collection has all the xlogs it needs before accepting it as being good. If
you find some necessary xlog files are missing, you can turn up
wal_keep_segments and try again.

This information is gold, thank you.

How do I validate that a snapshot collection has all the xlogs it needs?

If not, how should we be
taking the snapshots - should we be using pg_start_backup() and then
taking the snapshot of one before the other? (What order?) What if
we have tablespaces, do we take snapshots of those, followed by the
cluster directory, followed by pg_xlog?

First the cluster directory (where "pg_control" is), then tablespaces, then
pg_xlog. pg_start_backup() shouldn't be necessary, unless you are running
with full_page_writes off. But it won't hurt, and if you don't use
pg_start_backup you should probably run a checkpoint of your own immediately
before starting.

I read through
http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
and it doesn't touch on these questions.

Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
that part of the docs to touch on your issues. But see the section
"Standalone Hot Backups" which would allow you to use snapshots for the
cluster "copy" part, and normal archiving for just the xlogs. The volume of
pg_xlog should be fairly small, so this seems to me like an attractive
option.

Just to validate my understanding, are the two options as follows?

a. Checkpoint (optional but helps with time window?), snapshot
tablespaces/cluster/xlog, validate all necessary xlogs present.

b. Set wal_level/archive_mode/archive_command, pg_start_backup,
snapshot tablespaces/cluster, pg_stop_backup to archive xlog.

(a) sounds more appealing since it's treating recovery as crash
recovery rather than backup restore, and as such seems simpler and
lower-overhead (e.g. WAL verbosity, though I don't know how much that
overhead is). However, I'm not sure how complex that validation step
is.

If you really don't want to use archiving, even just during the duration of
the cluster snapshotting, then this is the part that addresses your
questions:

http://www.postgresql.org/docs/9.1/static/backup-file.html

I'm still interested in online backups, though - stopping the DB is a
no-go unfortunately.

--
Yang Zhang
http://yz.mit.edu/

#7Ben
bench@silentmedia.com
In reply to: Yang Zhang (#3)
Re: Basic question on recovery and disk snapshotting

On Apr 27, 2013, at 10:40 AM, Yang Zhang wrote:

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume? If not, how should we be
taking the snapshots - should we be using pg_start_backup() and then
taking the snapshot of one before the other? (What order?) What if
we have tablespaces, do we take snapshots of those, followed by the
cluster directory, followed by pg_xlog?

We do this, using XFS so that we can freeze the filesystem (we need that because we're also using software RAID). The process looks like:

1. pg_start_backup()
2. xfs_freeze both the data and xlog filesystems.
3. Snapshot all volumes.
4. Unfreeze both filesystems.
5. pg_stop_backup()
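
The steps above could be scripted roughly as follows. This is only a sketch under stated assumptions: psql is assumed to connect without further arguments, the mount points are placeholders, and the snapshot step is a caller-supplied function because the thread does not say which snapshot tool is used. The names frozen_snapshot, run_command and take_snapshots are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def run_command(argv: List[str]) -> None:
    """Run one external command and raise if it exits non-zero."""
    subprocess.run(argv, check=True)


def frozen_snapshot(mount_points: Sequence[str],
                    take_snapshots: Callable[[], None],
                    run: Callable[[List[str]], None] = run_command,
                    label: str = "ebs-snapshot") -> None:
    """Start backup, freeze, snapshot, unfreeze, stop backup.

    The try/finally blocks make sure the filesystems are thawed and
    backup mode is ended even if the snapshot step fails.
    """
    run(["psql", "-c", f"SELECT pg_start_backup('{label}')"])
    try:
        frozen: List[str] = []
        try:
            for mount_point in mount_points:
                run(["xfs_freeze", "-f", mount_point])  # -f freezes
                frozen.append(mount_point)
            take_snapshots()  # hypothetical hook: snapshot all volumes
        finally:
            # Thaw in reverse order. A failure here stops the loop, so a
            # production script would need to handle that case as well.
            for mount_point in reversed(frozen):
                run(["xfs_freeze", "-u", mount_point])  # -u unfreezes
    finally:
        run(["psql", "-c", "SELECT pg_stop_backup()"])
```

Passing the command runner in as a parameter keeps the sketch easy to dry-run: a list's append method can stand in for run to record the commands without executing anything.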

#8Jeff Janes
jeff.janes@gmail.com
In reply to: Yang Zhang (#6)
Re: Basic question on recovery and disk snapshotting

On Saturday, April 27, 2013, Yang Zhang wrote:

On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang <yanghatespam@gmail.com> wrote:

My question really boils down to: if we're interested in using COW
snapshotting (a common feature of modern filesystems and hosting
environments), would we necessarily need to ensure the data and
pg_xlog are on the same snapshotted volume?

That would certainly make it easier. But it shouldn't be necessary, as long
as the xlog snapshot is taken after the cluster snapshot, and also as long
as no xlog files which were written to after the last completed checkpoint
prior to the cluster snapshot got recycled before the xlog snapshot. As
long as the snapshots run quickly and promptly one after the other, this
should not be a problem, but you should certainly validate that a snapshot
collection has all the xlogs it needs before accepting it as being good. If
you find some necessary xlog files are missing, you can turn up
wal_keep_segments and try again.

This information is gold, thank you.

How do I validate that a snapshot collection has all the xlogs it needs?

I've always validated my backups by practicing restoring them. It seems
like the most rigorous way, and I figure I need the practice. If that
isn't feasible, the backup_label file created by pg_start_backup() will
tell you by name which xlog is the first one you need. If you didn't use
pg_start_backup(), you can use pg_controldata to figure that out based on
"Latest checkpoint's REDO location", keeping in mind that the part after
the / is not zero-padded on the left, so if it is short you have to take
that into account. I have no actual experience in doing this in practice.
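
To illustrate the zero-padding point, here is a small sketch that turns a REDO location into the name of the xlog file that contains it. It assumes the default 16 MB segment size and takes the timeline as a parameter (pg_controldata reports it as "Latest checkpoint's TimeLineID"). The function names are hypothetical, and the check covers only the first required file, so it is no substitute for a test restore.

```python
from typing import Iterable

# Assumption: the default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_for_location(location: str, timeline: int = 1) -> str:
    """Map a location such as '0/9000020' to its xlog file name.

    Both halves of the location are hexadecimal. Parsing them as
    integers sidesteps the missing left padding of the second half.
    """
    log_id_hex, offset_hex = location.strip().split("/")
    log_id = int(log_id_hex, 16)
    offset = int(offset_hex, 16)
    segment = offset // WAL_SEGMENT_SIZE
    # File names are three 8-digit hex fields: timeline, log id, segment.
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


def has_first_segment(location: str,
                      xlog_file_names: Iterable[str],
                      timeline: int = 1) -> bool:
    """True if the xlog snapshot holds the file containing `location`."""
    return wal_file_for_location(location, timeline) in set(xlog_file_names)
```

For example, the location 0/9000020 on timeline 1 falls in segment 9, so the file to look for in the pg_xlog snapshot is 000000010000000000000009. Every later segment up to the moment of the xlog snapshot is needed too.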

Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
that part of the docs to touch on your issues. But see the section
"Standalone Hot Backups" which would allow you to use snapshots for the
cluster "copy" part, and normal archiving for just the xlogs. The volume of
pg_xlog should be fairly small, so this seems to me like an attractive
option.

Just to validate my understanding, are the two options as follows?

a. Checkpoint (optional but helps with time window?), snapshot
tablespaces/cluster/xlog, validate all necessary xlogs present.

Yes. And doing the checkpoint immediately before has two good effects: it
makes the recovery faster, and it maximizes the time until the WAL files you
need start to be recycled.

b. Set wal_level/archive_mode/archive_command, pg_start_backup,
snapshot tablespaces/cluster, pg_stop_backup to archive xlog.

(a) sounds more appealing since it's treating recovery as crash
recovery rather than backup restore, and as such seems simpler and
lower-overhead (e.g. WAL verbosity, though I don't know how much that
overhead is).

That brings up another point to consider. If wal_level is minimal, then
tables which you bulk load in the same transaction as you created them or
truncated them will not get any WAL records written. (That is the main
reason the WAL verbosity is reduced.) But that also means that if any of
those operations is happening while you are taking your snapshot, those
operations will be corrupted. If the data and xlogs were part of the same
atomic snapshot, this would not be a problem, as either the operation
completed, or it never happened. But with different snapshots, the data
can get partially but not completely into the data snapshot, while the
xlog record which says the data was completely written does get into the
xlog snapshot.

If the system is very busy with different people doing different operations
at the same time, it could be hard to coordinate a time to take the backup.

However, I'm not sure how complex that validation step
is.

The more I think about it, the more I doubt it is worth it. It would be
very easy to come up with a method that sort of works, some of the time,
and then blows up spectacularly for no apparent reason. Using the
archiving, at least over the period during which the backup is being taken,
is the truly supported way of doing it. You can still use a snapshot of
the data directory, but you are really just treating it as a fast copy
operation. That it is atomic is not important to the process.

If you really don't want to use archiving, even just during the duration of
the cluster snapshotting, then this is the part that addresses your
questions:

http://www.postgresql.org/docs/9.1/static/backup-file.html

I'm still interested in online backups, though - stopping the DB is a
no-go unfortunately.

I wonder if initiating an EBS snapshot of a large volume isn't going to freeze
the database up for a while anyway, just by starving it of IO. It would be
interesting to hear back on your experiences.

Cheers,

Jeff

#9Yang Zhang
yanghatespam@gmail.com
In reply to: Jeff Janes (#8)
Re: Basic question on recovery and disk snapshotting

On Wed, May 1, 2013 at 4:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

That brings up another point to consider. If wal_level is minimal, then
tables which you bulk load in the same transaction as you created them or
truncated them will not get any WAL records written. (That is the main
reason the WAL verbosity is reduced.) But that also means that if any of
those operations is happening while you are taking your snapshot, those
operations will be corrupted. If the data and xlogs were part of the same
atomic snapshot, this would not be a problem, as either the operation
completed, or it never happened. But with different snapshots, the data can
get partially but not completely into the data snapshot, while the xlog
record which says the data was completely written does get into the xlog
snapshot.

Come to think of it, I'm no longer sure that EBS snapshots, which are
on the block device level, are OK, even if all your data is on a
single volume, since base backups (as documented) are supposed to be
taken via the FS (e.g. normal read operations, or even FS snapshots).
Block device level copies are not mentioned.

Can anyone confirm or refute?
