How safe is pg_basebackup + continuous archiving?
Hello,
We use PostgreSQL at work and we do daily backups with pg_dump. After that
we pg_restore the dump and check the restored database to make sure there
isn't any data corruption. As the database grows, the whole pg_dump /
pg_restore cycle time is quickly approaching 24h, so we need to change
strategies.
We've thought about using pg_basebackup + continuous archiving as an
alternative backup process, but I have doubts regarding the safety of such
a procedure. As far as I know, pg_basebackup is done via rsync (and we also
archive WALs using rsync), so if by any chance disk corruption occurs on
the master server, the corruption would be carried over to our backup
server.
How can we check for backup corruption in this case? Thank you very much.
Kaixi Luo
On Wed, Jun 29, 2016 at 11:51 PM, Kaixi Luo <kaixiluo@gmail.com> wrote:
We use PostgreSQL at work and we do daily backups with pg_dump. After that
we pg_restore the dump and check the restored database to make sure there
isn't any data corruption. As the database grows, the whole pg_dump /
pg_restore cycle time is quickly approaching 24h, so we need to change
strategies.
That's a mature solution, and without doubt many production systems use
it, abuse it, and rely on it.
We've thought about using pg_basebackup + continuous archiving as an
alternative backup process, but I have doubts regarding the safety of such
a procedure. As far as I know, pg_basebackup is done via rsync (and we also
archive WALs using rsync), so if by any chance disk corruption occurs on
the master server, the corruption would be carried over to our backup
server.
pg_basebackup speaks the replication protocol and uses it to receive a
base backup from the server in the shape of a tar stream. It then
decides if it needs to untar the content or write it to disk as-is.
Note though that the contents of a backup are not fsync'd to disk
after pg_basebackup finishes (a fix for that is in the works), so you had
better fsync them yourself to ensure that the data actually reaches disk.
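For illustration, a minimal sketch of such a backup run with an explicit
flush afterwards (the host, replication user, and destination path are
hypothetical):

```shell
#!/bin/sh
# Take a base backup over the replication protocol (not rsync), then
# force it to stable storage ourselves, since pg_basebackup does not
# fsync the files it writes.
BACKUP_DIR=/var/backups/pg/base    # hypothetical destination
if command -v pg_basebackup >/dev/null; then
    pg_basebackup -h primary.example.com -U replicator -D "$BACKUP_DIR" \
        -X stream --checkpoint=fast --progress \
        || echo "no primary reachable from here; sketch only"
    sync    # crude but effective: flush the backup files to disk
else
    echo "pg_basebackup not installed; sketch only"
fi
```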
How can we check for backup corruption in this case? Thank you very much.
Before replaying a backup on a production system, you would need a
pre-production setup where the backup is replayed and checked.
Honestly, you can only be sure that a backup is working correctly
after reusing it. You could always do some validation of the raw
backup contents, but in the end you need the WAL applied on top of it
to be able to check the state of a server that has reached a
consistent point.
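As a concrete sketch of that verification loop (the paths, archive host,
and database name are hypothetical, and the recovery.conf syntax is the
pre-10 form):

```shell
#!/bin/sh
# Replay a base backup plus archived WAL on a scratch machine; only
# trust the backup once recovery reaches consistency and checks pass.
RESTORE_DIR=/srv/pg-verify/data          # hypothetical scratch cluster
if command -v pg_ctl >/dev/null && [ -d /var/backups/pg/base ]; then
    cp -a /var/backups/pg/base/. "$RESTORE_DIR"
    cat > "$RESTORE_DIR/recovery.conf" <<'EOF'
restore_command = 'rsync backup-host:/wal-archive/%f %p'
EOF
    pg_ctl -D "$RESTORE_DIR" -w start        # replays WAL until consistent
    pg_dump -h localhost mydb > /dev/null    # reads every heap tuple back
else
    echo "PostgreSQL or a backup is not present here; sketch only"
fi
```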
--
Michael
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 29.06.2016 17:51, Kaixi Luo wrote:
Hello,
We use PostgreSQL at work and we do daily backups with pg_dump. After
that we pg_restore the dump and check the restored database to make
sure there isn't any data corruption. As the database grows, the whole
pg_dump / pg_restore cycle time is quickly approaching 24h, so we need
to change strategies.
We've thought about using pg_basebackup + continuous archiving as an
alternative backup process, but I have doubts regarding the safety of
such a procedure. As far as I know, pg_basebackup is done via rsync (and
we also archive WALs using rsync), so if by any chance disk
corruption occurs on the master server, the corruption would be
carried over to our backup server.
How can we check for backup corruption in this case? Thank you very much.
Kaixi Luo
Hello!
Only pg_dump plus data checksums turned on can make you reasonably "sure"
of the absence of corruption in a backup. But! If you have lost a data
file from a relation, or it got zeroed by, say, a power loss (there were
some issues with xfs in the past), even with pg_dump you will never
know it.
And there is no other method in PG to check a database for corruption.
PG also has no checksums on the clog files, so if any corruption happens
in one of those files you will likewise never know it.
So right now pg_basebackup + WAL archiving is like walking across a
minefield with your eyes tightly closed. You never know when it will
explode!
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Before replaying a backup on a production system, you would need a
pre-production setup where the backup is replayed and checked.
Honestly, you can only be sure that a backup is working correctly
after reusing it. You could always do some validation of the raw
backup contents, but in the end you need the WAL applied on top of it
to be able to check the state of a server that has reached a
consistent point.
Could you elaborate a bit more on this last part? If a PostgreSQL recovery
from a pg_basebackup reaches a consistent point and is able to start up
fully, that is not a guarantee that there hasn't been any underlying
corruption, as far as I know. Am I correct on this?
Thanks once again.
Kaixi
On Thu, Jun 30, 2016 at 9:00 PM, Kaixi Luo <kaixiluo@gmail.com> wrote:
Before replaying a backup on a production system, you would need a
pre-production setup where the backup is replayed and checked.
Honestly, you can only be sure that a backup is working correctly
after reusing it. You could always do some validation of the raw
backup contents, but you need at the end the WAL applied on top of it
to be able to check the status of a server that has reached a
consistent point.Could you elaborate a bit more on this last part? If a PostgreSQL recovery
from a pg_basebackup reaches a consistent point and is able to start up
fully, that is not a guarantee that there hasn't been any underlying
corruption, as far as I know. Am I correct on this?
You are correct, and that's why you need a copy of the production
system, as a pre-production stage, where the backups are replayed and
checked with an application that replays the access patterns of the
production application. Applying extra checks on top of that is good as
well: pg_dump, data checksums, index consistency checks (this makes me
regret that we don't have pg_amcheck in core yet, actually), etc.
--
Michael
Greetings,
* Kaixi Luo (kaixiluo@gmail.com) wrote:
We use PostgreSQL at work and we do daily backups with pg_dump. After that
we pg_restore the dump and check the restored database to make sure there
isn't any data corruption. As the database grows, the whole pg_dump /
pg_restore cycle time is quickly approaching 24h, so we need to change
strategies.
I've found this to be a good strategy also, but it's far from perfect.
Corruption can still occur, for example, in indexes on the primary
system. Generally speaking, pg_dump doesn't exercise indexes and
therefore you won't notice if an index is corrupt.
We've thought about using pg_basebackup + continuous archiving as an
alternative backup process, but I have doubts regarding the safety of such
a procedure. As far as I know, pg_basebackup is done via rsync (and we also
archive WALs using rsync), so if by any chance disk corruption occurs on
the master server, the corruption would be carried over to our backup
server.
This is correct, but checksums are now available in modern versions of
PG, which will detect disk corruption. Those checksums would be carried
over to the backup server and could be verified there by using pg_dump
(note that this still wouldn't help with indexes, but you don't have
coverage there today anyway).
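For instance, data checksums are enabled at cluster creation time, and
pg_controldata reports whether they are on (the cluster path here is a
throwaway, for illustration only):

```shell
#!/bin/sh
# Create a cluster with data checksums enabled so page-level disk
# corruption is detected whenever a page is read back from disk.
CLUSTER=$(mktemp -d)/data    # throwaway location for the sketch
if command -v initdb >/dev/null; then
    initdb --data-checksums -D "$CLUSTER" >/dev/null \
        && pg_controldata "$CLUSTER" | grep -i checksum \
        || echo "initdb refused to run in this environment"
else
    echo "initdb not installed; sketch only"
fi
```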
How can we check for backup corruption in this case? Thank you very much.
There has been some discussion about a specific tool for checking the
checksums throughout the entire system. I don't know of anyone actively
working on that, unfortunately.
There are a number of tools available to help with online backups and
continuous archiving beyond pg_basebackup and having to hand-roll
scripts. I'm personally biased towards and prefer pgBackRest, as I
helped start that project, but there are other tools, such as barman and
WAL-E, which would still be better than trying to implement everything
correctly on your own.
Thanks!
Stephen
On 6/30/16 8:30 AM, Stephen Frost wrote:
How can we check for backup corruption in this case? Thank you very much.
There has been some discussion about a specific tool for checking the
checksums throughout the entire system. I don't know of anyone actively
working on that, unfortunately.
If someone did want that though, it could probably be done as an
extension. I believe you just have to pull all of each relation into
shared buffers for the checksums to be verified.
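A rough sketch of that idea using plain scans (the database and table
names are hypothetical; pg_prewarm is a contrib extension, not core):

```shell
#!/bin/sh
# Reading every block of a relation from disk into shared buffers makes
# the server verify each page checksum along the way.
DB=mydb    # hypothetical database name
if command -v psql >/dev/null; then
    # a sequential scan touches every heap page of the table
    psql -d "$DB" -c "SELECT count(*) FROM big_table;" \
        || echo "no server reachable from here; sketch only"
    # with the pg_prewarm contrib extension, blocks can be pulled in explicitly
    psql -d "$DB" -c "SELECT pg_prewarm('big_table');" || true
else
    echo "psql not installed; sketch only"
fi
```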
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Thank you all a lot for the detailed answers! :)
Kaixi