Problems with postgres online backup - restore

Started by Chris Barnesalmost 17 years ago6 messagesgeneral
Jump to latest
#1Chris Barnes
cbarnes@recognia.com

I started an online backup of postgres, tar'd my data folder, copy to usb
drive in production
and restored it into my RC environment. Have I missed something important?

When starting the database I receive many errors that look like the backup
was corrupted.

2009-06-23 08:29:15 EDT:@:7614ERROR: xlog flush request 10D/590D6578 is not
satisfied --- flushed only to 10D/510C4FB8

2009-06-23 08:29:15 EDT:@:7614CONTEXT: writing block 2 of relation
16403/16884/2696

2009-06-23 08:29:16 EDT:@:7614ERROR: xlog flush request 10D/590D6578 is not
satisfied --- flushed only to 10D/510C4FB8

2009-06-23 08:29:16 EDT:@:7614CONTEXT: writing block 2 of relation
16403/16884/2696

2009-06-23 08:29:16 EDT:@:7614WARNING: could not write block 2 of
16403/16884/2696

pgdb001=> select criteria_rank from client001.t1020screen where
criteria_rank like '%TR009%';

ERROR: missing chunk number 0 for toast value 738680

Version

[postgres@pgprd01 data]$ cat PG_VERSION

8.3

Online Backup

psql

checkpoint;

SELECT pg_switch_xlog();

SELECT pg_start_backup('postres_full_backup_June222009');

tar -czvf pgprd01_June22_2009_production.dmp.tar.gz data/

SELECT pg_stop_backup();

Restore

tar -xzvf pgprd01_June22_2009_production.dmp.tar.gz

#2Gerhard Wiesinger
lists@wiesinger.com
In reply to: Chris Barnes (#1)
Re: Problems with postgres online backup - restore

Hello,

I'd like to understand the PostgreSQL internals in "backup mode".

When I understood it correctly pg_start_backup() make a checkpoint and
stops writing to the data/ directory except the WAL.

All new transaction go into WAL which is also logical. But how is data
consistency done when the written/changed blocks don't fit into the buffer
cache?

E.g.
A lot of updates which exceed buffer cache (Where is data written except
WAL since data should be kept constant?)
SELECT FROM updated data: WHERE is the new data fetched from?

Thnx for any explainations.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Thu, 25 Jun 2009, Chris Barnes wrote:

Show quoted text

SELECT pg_switch_xlog();

SELECT pg_start_backup('postres_full_backup_June222009');

tar -czvf pgprd01_June22_2009_production.dmp.tar.gz data/

SELECT pg_stop_backup();

#3Richard Huxton
dev@archonet.com
In reply to: Gerhard Wiesinger (#2)
Re: Problems with postgres online backup - restore

Gerhard Wiesinger wrote:

Hello,

I'd like to understand the PostgreSQL internals in "backup mode".

When I understood it correctly pg_start_backup() make a checkpoint and
stops writing to the data/ directory except the WAL.

All new transaction go into WAL which is also logical. But how is data
consistency done when the written/changed blocks don't fit into the
buffer cache?

The data isn't kept consistent. Which is why you need the WAL. Restoring
from a PITR backup is basically the same idea as recovering from a
crash. Any blocks that might have been updated since you called
pg_start_backup() will be rewritten from the WAL.

--
Richard Huxton
Archonet Ltd

#4Gerhard Wiesinger
lists@wiesinger.com
In reply to: Richard Huxton (#3)
Re: Problems with postgres online backup - restore

Hello,

OK, what's then the difference doing a pg_start_backup() or just
doing the backup?

Isn't that a problem that the datablock are very inconsistent, even so
inconsistent that they are corrupt:

E.g. A part of a datablock is written when e.g. the file is tarred. =>
Datablock on backup is corrupt => An then even the WAL can't be applied.

Why does it work correctly? Or is there some design problem?

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Thu, 25 Jun 2009, Richard Huxton wrote:

Show quoted text

Gerhard Wiesinger wrote:

Hello,

I'd like to understand the PostgreSQL internals in "backup mode".

When I understood it correctly pg_start_backup() make a checkpoint and
stops writing to the data/ directory except the WAL.

All new transaction go into WAL which is also logical. But how is data
consistency done when the written/changed blocks don't fit into the buffer
cache?

The data isn't kept consistent. Which is why you need the WAL. Restoring from
a PITR backup is basically the same idea as recovering from a crash. Any
blocks that might have been updated since you called pg_start_backup() will
be rewritten from the WAL.

--
Richard Huxton
Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Richard Huxton
dev@archonet.com
In reply to: Gerhard Wiesinger (#4)
Re: Problems with postgres online backup - restore

Gerhard Wiesinger wrote:

Hello,

OK, what's then the difference doing a pg_start_backup() or just doing
the backup?

pg_start_backup() forces a checkpoint (and logs a label for your backup
too).

Isn't that a problem that the datablock are very inconsistent, even so
inconsistent that they are corrupt:

E.g. A part of a datablock is written when e.g. the file is tarred. =>
Datablock on backup is corrupt => An then even the WAL can't be applied.

Why does it work correctly? Or is there some design problem?

It works because the WAL doesn't hold a list of row updates ("update row
12345 set field 4 = true") but block updates. Any update to a disk block
is recorded - table or index. The WAL doesn't really know anything about
tables, columns, primary keys etc - just disk blocks.

One small optimisation is that the first time a block is touched after a
checkpoint the value of the whole block is written to WAL and after that
only updates.

So - if you have a checkpointed system (all updates guaranteed written
to disk) and a complete set of WAL files from that point on you can
always recreate the writes to any point in time after that.

http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

--
Richard Huxton
Archonet Ltd

#6Gerhard Wiesinger
lists@wiesinger.com
In reply to: Richard Huxton (#5)
Re: Problems with postgres online backup - restore

Hello Richard,

OK, understood it and looks to me a good system with whole block overwrite
and then the deltas.

Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/

On Fri, 26 Jun 2009, Richard Huxton wrote:

Show quoted text

Gerhard Wiesinger wrote:

Hello,

OK, what's then the difference doing a pg_start_backup() or just doing the
backup?

pg_start_backup() forces a checkpoint (and logs a label for your backup too).

Isn't that a problem that the datablock are very inconsistent, even so
inconsistent that they are corrupt:

E.g. A part of a datablock is written when e.g. the file is tarred. =>
Datablock on backup is corrupt => An then even the WAL can't be applied.

Why does it work correctly? Or is there some design problem?

It works because the WAL doesn't hold a list of row updates ("update row
12345 set field 4 = true") but block updates. Any update to a disk block is
recorded - table or index. The WAL doesn't really know anything about
tables, columns, primary keys etc - just disk blocks.

One small optimisation is that the first time a block is touched after a
checkpoint the value of the whole block is written to WAL and after that only
updates.

So - if you have a checkpointed system (all updates guaranteed written to
disk) and a complete set of WAL files from that point on you can always
recreate the writes to any point in time after that.

http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

--
Richard Huxton
Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general