Doc update for pg_start_backup

Started by Heikki Linnakangas over 18 years ago, 7 messages
#1 Heikki Linnakangas
heikki@enterprisedb.com

Added a note to the docs that pg_start_backup can take a long time to
finish now that we spread out checkpoints:

*** doc/src/sgml/backup.sgml    1 Feb 2007 00:28:16 -0000       2.97
--- doc/src/sgml/backup.sgml    28 Jun 2007 11:44:20 -0000
***************
*** 672,678 ****
       <para>
        It does not matter which database within the cluster you connect to to
        issue this command.  You can ignore the result returned by the function;
!      but if it reports an error, deal with that before proceeding.
       </para>
      </listitem>
      <listitem>
--- 672,682 ----
       <para>
        It does not matter which database within the cluster you connect to to
        issue this command.  You can ignore the result returned by the function;
!      but if it reports an error, deal with that before proceeding. Note that
!      pg_start_backup can take a long time to finish. It performs a checkpoint,
!      and if one is already running it has to wait for it to finish first. You
!      can adjust <varname>checkpoint_completion_target</varname> to perform the
!      checkpoints more aggressively.
       </para>
      </listitem>
      <listitem>

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
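
For a rough sense of how long "a long time" can be, here is a minimal sketch. It assumes a time-driven checkpoint whose writes are spread over roughly checkpoint_timeout multiplied by checkpoint_completion_target, and that the checkpoint requested by pg_start_backup is spread in the same way. The function name and the example values are illustrative only, not taken from the patch.

```python
def spread_checkpoint_seconds(checkpoint_timeout_s, completion_target):
    """Estimate how long a spread, time-driven checkpoint aims to take.

    Assumption: the write phase is paced to finish after about
    checkpoint_timeout * checkpoint_completion_target seconds.
    """
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("completion_target must be between 0.0 and 1.0")
    return checkpoint_timeout_s * completion_target


# Example values chosen for illustration: a 5 minute timeout, target 0.5.
print(spread_checkpoint_seconds(300, 0.5))
```

If pg_start_backup first has to wait for a checkpoint that is already running, the total wait could be longer than this single estimate.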

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: Doc update for pg_start_backup

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long time to
finish now that we spread out checkpoints:

Rather than suggesting twiddling checkpoint_completion_target, should
we suggest a manual CHECKPOINT command before pg_start_backup?

regards, tom lane
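
The suggestion above could be sketched as follows. This is only an illustration: it assumes a Python DB-API cursor whose driver uses the "%s" parameter style (psycopg2 does, for example), a superuser connection, and the one-argument pg_start_backup(label) form discussed in this thread. The function name checkpoint_then_start_backup is hypothetical.

```python
def checkpoint_then_start_backup(cursor, label):
    """Run a manual CHECKPOINT, then enter backup mode.

    Assumption: `cursor` is a DB-API cursor using the "%s" paramstyle.
    Presumably the checkpoint inside pg_start_backup then has little
    left to write, because the manual one has just flushed dirty pages.
    """
    cursor.execute("CHECKPOINT")
    cursor.execute("SELECT pg_start_backup(%s)", (label,))
    # pg_start_backup returns the backup's starting WAL location.
    return cursor.fetchone()[0]
```

Whether this actually shortens the wait depends on how much new dirty data accumulates between the two statements.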

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#1)
Re: [PATCHES] Doc update for pg_start_backup

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long time to
finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way. The reason
you're doing it is to take a base backup, right? What are you going
to take the base backup with? I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth. That being
the case, you're simply not going to schedule the operation during
full-load periods. And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.

Thoughts?

regards, tom lane

#4 Simon Riggs
simon@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: [PATCHES] Doc update for pg_start_backup

On Thu, 2007-06-28 at 23:35 -0400, Tom Lane wrote:

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long time to
finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way. The reason
you're doing it is to take a base backup, right? What are you going
to take the base backup with? I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth. That being
the case, you're simply not going to schedule the operation during
full-load periods.

Well, that assumes you can predict a time of reduced load and that
time-critical activities won't happen at that point. Many times you can,
but I see no reason to force an immediate checkpoint.

If you use snapshots you can copy the data away in your own time, so not
all backup mechanisms draw extensive/high priority I/O power.

And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.

I would prefer the default to be to do this slowly. If there is a reason
to do it fast, maybe, but we should err towards low impact.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

#5 Heikki Linnakangas
heikki@enterprisedb.com
In reply to: Tom Lane (#3)
Re: [PATCHES] Doc update for pg_start_backup

Tom Lane wrote:

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long time to
finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way. The reason
you're doing it is to take a base backup, right? What are you going
to take the base backup with? I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth.

scp over a network? It's still going to consume a fair amount of I/O,
but the network could very well be the bottleneck.

That being
the case, you're simply not going to schedule the operation during
full-load periods. And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.

That's probably true in most cases. But on a system that doesn't have
quiet periods, you're still going to have to take the backup.

To be honest, I've never worked as a DBA and never had to deal with
taking backups of a production system, so my gut feelings on this could
be totally wrong.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#6 Theo Schlossnagle
jesus@omniti.com
In reply to: Heikki Linnakangas (#5)
Re: [PATCHES] Doc update for pg_start_backup

On Jun 29, 2007, at 4:25 AM, Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long
time to finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way. The reason
you're doing it is to take a base backup, right? What are you going
to take the base backup with? I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth.

scp over a network? It's still going to consume a fair amount of
I/O, but the network could very well be the bottleneck.

That being
the case, you're simply not going to schedule the operation during
full-load periods. And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.

That's probably true in most cases. But on a system that doesn't
have quiet periods, you're still going to have to take the backup.
To be honest, I've never worked as a DBA and never had to deal with
taking backups of a production system, so my gut feelings on this
could be totally wrong.

I'll share my two cents, having had to back up many terabytes of
Oracle, Postgres and MySQL every day...

The comments that taking a backup causes a lot of absolutely
unavoidable I/O are right on the mark.

If you have a large enough database where this matters the technique
usually looks as follows.

(1) sanity
(2) pg_start_backup
(3) snap
(4) pg_stop_backup
(5) backup

Now, the backup will always have to read the data. If it is full, it
reads every block. If it is incremental, it reads the blocks that
changed. You will frequently be in the position of performing a full
backup. The I/O for that read will inevitably be spent in one or more
of the above steps. I strongly prefer that load to happen in (5) and
for steps (2,3,4) to happen as quickly as possible. Right now on our
largest (slowest) production box, which runs Postgres and holds over a
terabyte, steps 2-4 take about 30-60 seconds.
Step 5 takes *cough* about 18 hours *cough*.

The snap in many of our cases is a logical, software-enabled snapshot
(either Veritas, LVM or ZFS). However, you can use many enterprise
storage systems to take a hard snapshot and expose that as a LUN to
mount elsewhere on a host attached to the same SAN. Many confuse this
for being "free". Regardless of how the snap is taken, you have to pay
for it: either at snap time, at read time or at release time. Nothing's free.

// Theo Schlossnagle
// Principal@OmniTI: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/
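
The five steps above can be sketched as a small driver. This is a hedged illustration, not the procedure actually used at any site: each step is passed in as a callable, and all names (snapshot_backup and its parameters) are hypothetical. The one design choice it shows is that pg_stop_backup runs even if the snapshot fails, so the server is not left in backup mode.

```python
def snapshot_backup(sanity_check, start_backup, take_snapshot,
                    stop_backup, copy_snapshot):
    """Run steps (1)-(5), keeping the window between (2) and (4) short."""
    sanity_check()                      # (1) sanity
    start_backup()                      # (2) pg_start_backup
    try:
        snapshot = take_snapshot()      # (3) snap (Veritas, LVM, ZFS, ...)
    finally:
        stop_backup()                   # (4) pg_stop_backup, always run
    # (5) the long, I/O-heavy copy happens outside backup mode
    return copy_snapshot(snapshot)
```

With this shape, only the snapshot itself sits between the start and stop calls, and the expensive read of every block is deferred to the final step.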

#7 Jim Nasby
decibel@decibel.org
In reply to: Heikki Linnakangas (#5)
Re: [PATCHES] Doc update for pg_start_backup

On Jun 29, 2007, at 3:25 AM, Heikki Linnakangas wrote:

Tom Lane wrote:

Heikki Linnakangas <heikki@enterprisedb.com> writes:

Added a note to the docs that pg_start_backup can take a long
time to finish now that we spread out checkpoints:

I was starting to wordsmith this, and then wondered whether it's not
just a stupid idea for pg_start_backup to act that way. The reason
you're doing it is to take a base backup, right? What are you going
to take the base backup with? I do not offhand know of any backup
tools that don't suck major amounts of I/O bandwidth.

scp over a network? It's still going to consume a fair amount of
I/O, but the network could very well be the bottleneck.

You can also use rsync and have it do bandwidth limiting (AFAIK that
would work locally too).

That being
the case, you're simply not going to schedule the operation during
full-load periods. And that leads to the conclusion that
pg_start_backup should just use CHECKPOINT_IMMEDIATE and not slow
you down.

That's probably true in most cases. But on a system that doesn't
have quiet periods, you're still going to have to take the backup.

Correct. If the load presented by the base backup is too high, you'll
be looking at ways to slow it down; but I've yet to run across such a
case in the field.

I think having pg_start_backup do an immediate checkpoint by default
would be best, since it's least surprising, but I do like having the
slow checkpoint as an option for cases where it's needed (though I
think those cases are probably pretty rare).
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
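
The rsync suggestion above could look roughly like this. It is a sketch that only builds the command line: it relies on rsync's --archive and --bwlimit options (the limit is given in kilobytes per second), while the function name and the example paths are made up for illustration.

```python
def rsync_command(source, destination, kbytes_per_second):
    """Build an rsync argument list that caps the transfer rate."""
    if kbytes_per_second <= 0:
        raise ValueError("kbytes_per_second must be positive")
    return [
        "rsync",
        "--archive",                          # preserve permissions, times, etc.
        "--bwlimit=%d" % kbytes_per_second,   # throttle I/O bandwidth
        source,
        destination,
    ]


# Hypothetical paths; pass the list to subprocess.run() to execute it.
print(rsync_command("/var/lib/pgsql/data/", "backuphost:/backups/base/", 5000))
```

Building the argument list separately from running it makes the throttle value easy to review before a long copy starts.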