Proposal: Incremental Backup
0. Introduction:
=================================
This is a proposal for adding incremental backup support to the streaming
replication protocol and hence to the pg_basebackup command.
1. Proposal
=================================
Our proposal is to introduce the concept of a backup profile. The backup
profile consists of a file with one line per file, detailing tablespace,
path, modification time, size and checksum.
Using that file, the BASE_BACKUP command can decide which files need to
be sent again and which are unchanged. The algorithm would be very
similar to rsync's, but since our files are never bigger than 1 GB each,
that is probably granular enough not to worry about copying parts of
files, just whole files.
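As a rough illustration only (not code from the proposal), the profile bookkeeping and the resend decision could be sketched like this in Python; the function names and the choice of SHA-1 as the checksum are assumptions made for the example:

```python
import hashlib
import os

def file_entry(root, relpath):
    """Build one profile entry: path, modification time, size, checksum."""
    full = os.path.join(root, relpath)
    st = os.stat(full)
    h = hashlib.sha1()
    with open(full, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {"path": relpath, "mtime": int(st.st_mtime),
            "size": st.st_size, "checksum": h.hexdigest()}

def needs_resend(current, previous):
    """A file is sent again unless size, mtime and checksum all match."""
    if previous is None:
        return True            # new file, no previous profile entry
    if current["size"] != previous["size"]:
        return True
    if current["mtime"] != previous["mtime"]:
        return True            # fast path: changed mtime, skip checksum test
    return current["checksum"] != previous["checksum"]
```

A real implementation would also record the tablespace and serialize the entries one per line, but the decision rule above is the core of the file-level approach.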
This way of operating also has some advantages over using rsync to take
a physical backup: it does not require the files from the previous
backup to be checksummed again, and they could even reside on some form
of long-term, not-directly-accessible storage, like a tape cartridge or
somewhere in the cloud (e.g. Amazon S3 or Amazon Glacier).
It could also be used in 'refresh' mode, by allowing the pg_basebackup
command to 'refresh' an old backup directory with a new backup.
The final piece of this architecture is a new program called
pg_restorebackup, which is able to operate on a "chain of incremental
backups", allowing the user to build a usable PGDATA from them or to
execute maintenance operations such as verifying the checksums or
estimating the final size of the recovered PGDATA.
We created a wiki page with all implementation details at
https://wiki.postgresql.org/wiki/Incremental_backup
2. Goals
=================================
The main goal of incremental backup is to reduce the size of the backup.
A secondary goal is to also reduce backup time.
3. Development plan
=================================
Our proposed development plan consists of four phases:
Phase 1: Add ‘PROFILE’ option to ‘BASE_BACKUP’
Phase 2: Add ‘INCREMENTAL’ option to ‘BASE_BACKUP’
Phase 3: Support of PROFILE and INCREMENTAL for pg_basebackup
Phase 4: pg_restorebackup
We would like to reach consensus on our design here before starting to
implement it.
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> 0. Introduction:
> =================================
> This is a proposal for adding incremental backup support to streaming
> protocol and hence to pg_basebackup command.
I am not sure that "incremental" is the right word, as the existing
backup methods using WAL archives are already like that. I recall
others calling this a "differential backup" in some previous threads.
Would that sound better?
> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile.
Sounds good. Thanks for looking at that.
> The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.
There are actually two levels of differential backup: file-level, which
is the approach you are taking, and block-level. A block-level backup
requires a scan of all the blocks of all the relations, taking only the
data from the blocks newer than the LSN given by the BASE_BACKUP
command. With the file-level approach, you could back up a relation
file as soon as you find at least one modified block. Btw, the size of
relation files depends on the segment size defined by --with-segsize
when running configure; 1GB is the default, and the value usually used.
Differential backups can reduce the overall backup size, depending on
the application, at the cost of some CPU to analyze which relation
blocks need to be included in the backup.
> It could also be used in 'refresh' mode, by allowing the pg_basebackup
> command to 'refresh' an old backup directory with a new backup.
I am not sure this is really helpful...
> The final piece of this architecture is a new program called
> pg_restorebackup which is able to operate on a "chain of incremental
> backups", allowing the user to build an usable PGDATA from them or
> executing maintenance operations like verify the checksums or estimate
> the final size of recovered PGDATA.
Yes, right. Taking a differential backup is not difficult; rebuilding a
consistent base backup from a full backup and a set of differential
ones is the tricky part, and you need to be sure that all the pieces of
the puzzle are there.
> We created a wiki page with all implementation details at
> https://wiki.postgresql.org/wiki/Incremental_backup
I had a look at that, and I think that you are missing the mark in the
way differential backups should be taken. What would be necessary is
to pass a WAL position (an LSN, log sequence number, like 0/2000060)
with a new clause called DIFFERENTIAL (INCREMENTAL in your first
proposal) in the BASE_BACKUP command, and then have the server report
back to the client all the files that contain blocks newer than the
given LSN for a file-level backup, or the blocks newer than the given
LSN for a block-level differential backup.
Note that we would need a way to identify the type of the backup taken
in backup_label, by adding a new field carrying the LSN position sent
with the DIFFERENTIAL clause of BASE_BACKUP.
When taking a differential backup, the LSN position needed would simply
be the value of START WAL LOCATION of the last differential or full
backup taken. This also suggests a new pg_basebackup option of the form
--differential='0/2000060' to take a differential backup directly.
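To illustrate the block-level test described above (a client-side sketch only, not server code): each PostgreSQL page starts with its LSN in the page header, so a scan can compare that against the LSN passed to BASE_BACKUP. The 8 KB block size and little-endian byte order here are platform assumptions:

```python
import struct

def page_lsn(page: bytes) -> int:
    """Extract pd_lsn from a PostgreSQL page header.

    The first 8 bytes of PageHeaderData hold the page LSN as two 32-bit
    words (xlogid, xrecoff); little-endian layout is assumed here.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff

def parse_lsn(text: str) -> int:
    """Parse an LSN in the usual 'X/Y' text form, e.g. '0/2000060'."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def file_has_newer_block(path, since_lsn, block_size=8192):
    """File-level check: stop at the first block newer than since_lsn."""
    with open(path, "rb") as f:
        while True:
            page = f.read(block_size)
            if len(page) < block_size:
                return False   # end of segment, no newer block found
            if page_lsn(page) > since_lsn:
                return True
```

A block-level backup would use the same comparison but emit each qualifying page instead of short-circuiting on the first one.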
Then, for the utility pg_restorebackup, what you would need to do is
simply to pass a list of backups to it, then validate if they can
build a consistent backup, and build it.
Btw, the file-based method would be simpler to implement, especially
for rebuilding the backups.
Regards,
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile. The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.
That wouldn't be nearly as useful as the LSN-based approach mentioned before.
I've had my share of rsyncing live databases (when resizing
filesystems, not for backup, but the anecdotal evidence applies
anyhow) and with moderately write-heavy databases, even if you only
modify a tiny portion of the records, you end up modifying a huge
portion of the segments, because the free space choice is random.
There have been patches going around to change the random nature of
that choice, but none are very likely to make a huge difference for
this application. In essence, file-level comparisons get you only a
mild speed-up, and are not worth the effort.
I'd go for the hybrid file+LSN method, or nothing. The hybrid avoids
the I/O of inspecting the LSNs of entire segments (a necessary
optimization for huge multi-TB databases) and backs up only the
modified portions when segments do contain changes, so it's the best
of both worlds. Any partial implementation would either require lots
of I/O (LSN only) or save very little (file only), unless it's an
almost read-only database.
On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> Our proposal is to introduce the concept of a backup profile. [...]
>
> That wouldn't be nearly as useful as the LSN-based approach mentioned
> before. [...]
>
> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
> the I/O of inspecting the LSN of entire segments (necessary
> optimization for huge multi-TB databases) and backups only the
> portions modified when segments do contain changes, so it's the best
> of both worlds. Any partial implementation would either require lots
> of I/O (LSN only) or save very little (file only) unless it's an
> almost read-only database.
I agree with much of that. However, I'd question whether we can
really seriously expect to rely on file modification times for
critical data-integrity operations. I wouldn't like it if somebody
ran ntpdate to fix the time while the base backup was running, and it
set the time backward, and the next differential backup consequently
omitted some blocks that had been modified during the base backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Jul 25, 2014 at 3:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> [...]
>
> I agree with much of that. However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations. I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.
I was thinking the same. But that timestamp could be saved in the file
itself, or in some other catalog, like a "trusted metadata" store
implemented by pg itself, and it could really be an LSN range instead
of a timestamp.
On 07/25/2014 11:49 AM, Claudio Freire wrote:
>> I agree with much of that. However, I'd question whether we can
>> really seriously expect to rely on file modification times for
>> critical data-integrity operations. [...]
>
> I was thinking the same. But that timestamp could be saved on the file
> itself, or some other catalog, like a "trusted metadata" implemented
> by pg itself, and it could be an LSN range instead of a timestamp
> really.
What about requiring checksums to be on instead, and checking the
file-level checksums? Hmmm, wait, do we have file-level checksums? Or
just page-level?
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, Jul 25, 2014 at 7:38 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/25/2014 11:49 AM, Claudio Freire wrote:
>> [...]
>
> What about requiring checksums to be on instead, and checking the
> file-level checksums? Hmmm, wait, do we have file-level checksums? Or
> just page-level?
It would be very computationally expensive to have up-to-date
file-level checksums, so I highly doubt it.
On 25/07/14 16:15, Michael Paquier wrote:
> On Fri, Jul 25, 2014 at 10:14 PM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> 0. Introduction:
>> =================================
>> This is a proposal for adding incremental backup support to streaming
>> protocol and hence to pg_basebackup command.
>
> Not sure that incremental is a right word as the existing backup
> methods using WAL archives are already like that. I recall others
> calling that differential backup from some previous threads. Would
> that sound better?
"differential backup" is widely used to refer to a backup that is always
based on a "full backup". An "incremental backup" can be based either on
a "full backup" or on a previous "incremental backup". We picked that
name to emphasize this property.
>> 1. Proposal
>> =================================
>> Our proposal is to introduce the concept of a backup profile.
>
> Sounds good. Thanks for looking at that.
>> The backup
>> profile consists of a file with one line per file detailing tablespace,
>> path, modification time, size and checksum.
>> Using that file the BASE_BACKUP command can decide which file needs to
>> be sent again and which is not changed. [...]
>
> There are actually two levels of differential backups: file-level,
> which is the approach you are taking, and block level. Block level
> backup makes necessary a scan of all the blocks of all the relations
> and take only the data from the blocks newer than the LSN given by the
> BASE_BACKUP command. In the case of file-level approach, you could
> already backup the relation file after finding at least one block
> already modified.
I like the idea of short-circuiting the checksum when you find a block
with an LSN newer than the previous backup's START WAL LOCATION;
however, I see it as a further optimization. In any case, it is worth
storing the backup start LSN in the header section of the
backup_profile, together with other useful information about the backup
starting position.
As a first step we would have a simple and robust method to produce a
file-level incremental backup.
> Btw, the size of relation files depends on the size
> defined by --with-segsize when running configure. 1GB is the default
> though, and the value usually used. Differential backups can reduce
> the size of overall backups depending on the application, at the cost
> of some CPU to analyze the relation blocks that need to be included in
> the backup.
We tested the idea on several multi-terabyte installations using a
custom deduplication script which follows this approach. The result is
that it can reduce the backup size by more than 50%. Most databases in
the 50GB - 1TB range can also take big advantage of it.
>> It could also be used in 'refresh' mode, by allowing the pg_basebackup
>> command to 'refresh' an old backup directory with a new backup.
>
> I am not sure this is really helpful...

Could you please elaborate on the last sentence?
>> The final piece of this architecture is a new program called
>> pg_restorebackup which is able to operate on a "chain of incremental
>> backups", allowing the user to build an usable PGDATA from them or
>> executing maintenance operations like verify the checksums or estimate
>> the final size of recovered PGDATA.
>
> Yes, right. Taking a differential backup is not difficult, but
> rebuilding a constant base backup with a full based backup and a set
> of differential ones is the tricky part, but you need to be sure that
> all the pieces of the puzzle are here.
If we limit it to being file-based, the recovery procedure is
conceptually simple: read every involved manifest from the start and
take the latest available version of each file (or mark it for
deletion, if the last time it is named is in a backup_exceptions file).
Keeping the algorithm as simple as possible is, in our opinion, the
best way to go.
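The recovery procedure described above can be sketched roughly as follows; the `manifest` and `backup_exceptions` file layouts (one relative path per line) are assumptions made for the illustration, since the proposal does not fix a format:

```python
import os
import shutil

def rebuild_pgdata(backups, target):
    """Rebuild a data directory from a chain of backups.

    `backups` is a list of backup directories ordered from the full
    backup to the newest incremental one. Each directory is assumed to
    carry a `manifest` file listing the relative paths it contains and an
    optional `backup_exceptions` file naming paths deleted since the
    previous backup. The latest version of each file wins; an exception
    entry marks the file for deletion.
    """
    latest = {}  # relative path -> backup dir holding the newest copy
    for backup in backups:
        with open(os.path.join(backup, "manifest")) as f:
            for relpath in (line.strip() for line in f if line.strip()):
                latest[relpath] = backup
        exceptions = os.path.join(backup, "backup_exceptions")
        if os.path.exists(exceptions):
            with open(exceptions) as f:
                for relpath in (line.strip() for line in f if line.strip()):
                    latest.pop(relpath, None)
    for relpath, backup in latest.items():
        dst = os.path.join(target, relpath)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(backup, relpath), dst)
```

A real pg_restorebackup would also validate checksums against each backup_profile before copying, but the "latest version wins" pass is the heart of the algorithm.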
>> We created a wiki page with all implementation details at
>> https://wiki.postgresql.org/wiki/Incremental_backup
>
> I had a look at that, and I think that you are missing the shot in the
> way differential backups should be taken. What would be necessary is
> to pass a WAL position (or LSN, log sequence number, like
> 0/2000060) with a new clause called DIFFERENTIAL (INCREMENTAL in your
> first proposal) in the BASE_BACKUP command, and then have the server
> report back to client all the files that contain blocks newer than the
> given LSN for file-level backup, or the blocks newer than the given
> LSN for the block-level differential backup.
In our proposal a file is skipped if, and only if, it has the same
size, the same mtime and *the same checksum* as the original file. We
intentionally want to keep it simple, easily supporting also files that
are stored in $PGDATA but don't follow any format known by Postgres.
However, even with more complex algorithms, all the required
information would be stored in the header part of the backup_profile
file.
> Note that we would need a way to identify the type of the backup taken
> in backup_label, with the LSN position sent with DIFFERENTIAL clause
> of BASE_BACKUP, by adding a new field in it.
Good point, it definitely has to be reported in the backup_label file.
> When taking a differential backup, the LSN position necessary would be
> simply the value of START WAL LOCATION of the last differential or
> full backup taken. This results as well in a new option for
> pg_basebackup of the type --differential='0/2000060' to take directly
> a differential backup.
It would be possible to use this approach, but I feel that relying on
checksums is more robust. In any case, I'd want to have a file with all
the checksums, to be able to validate the backup later.
> Then, for the utility pg_restorebackup, what you would need to do is
> simply to pass a list of backups to it, then validate if they can
> build a consistent backup, and build it.
>
> Btw, the file-based method would be simpler to implement, especially
> for rebuilding the backups.
Exactly. This is the bare minimum. More options can be added later.
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
On 25/07/14 20:21, Claudio Freire wrote:
> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
> <marco.nenciarini@2ndquadrant.it> wrote:
>> [...]
>
> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
> the I/O of inspecting the LSN of entire segments (necessary
> optimization for huge multi-TB databases) and backups only the
> portions modified when segments do contain changes, so it's the best
> of both worlds. Any partial implementation would either require lots
> of I/O (LSN only) or save very little (file only) unless it's an
> almost read-only database.
From my experience, if a database is big enough and there is any kind
of historical data in it, the "file only" approach works well.
Moreover, it has the advantage of being simple and easily verifiable.
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
On Tue, Jul 29, 2014 at 1:24 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
>> I'd go for the hybrid file+lsn method, or nothing. [...] Any partial
>> implementation would either require lots of I/O (LSN only) or save
>> very little (file only) unless it's an almost read-only database.
>
> From my experience, if a database is big enough and there is any kind of
> historical data in the database, the "file only" approach works well.
> Moreover it has the advantage of being simple and easily verifiable.
I don't see how that would be true if it's not full of read-only or
append-only tables.
Furthermore, even in that case, you need to have the database locked
while performing the file-level backup, and computing all the
checksums means processing the whole thing. That's a huge amount of
time to be locked for multi-TB databases, so how is that good enough?
On 25/07/14 20:44, Robert Haas wrote:
> On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> [...]
>
> I agree with much of that. However, I'd question whether we can
> really seriously expect to rely on file modification times for
> critical data-integrity operations. I wouldn't like it if somebody
> ran ntpdate to fix the time while the base backup was running, and it
> set the time backward, and the next differential backup consequently
> omitted some blocks that had been modified during the base backup.
Our proposal doesn't rely on file modification times for data
integrity. We use the file mtime only as a fast indication that the
file has changed, in which case we transfer it again without computing
the checksum. If timestamp and size match, we rely on *checksums* to
decide whether it has to be sent.
In "SMART MODE" we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it
would have all the necessary warnings attached. However, "SMART MODE"
isn't a core part of our proposal, and can be delayed until we agree on
the safest way to bring it to the end user.
Regards,
Marco
--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
On Wed, Jul 30, 2014 at 1:11 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
"differential backup" is widely used to refer to a backup that is always
based on a "full backup". An "incremental backup" can be based either on
a "full backup" or on a previous "incremental backup". We picked that
name to emphasize this property.
You can refer to this email:
/messages/by-id/CABUevExZ-2NH6jxB5sjs_dsS7qbmoF0NOYpEEyayBKbUfKPbqw@mail.gmail.com
> As a first step we would have a simple and robust method to produce a
> file-level incremental backup.
An approach using Postgres internals, which we are sure we can rely on,
is more robust. An LSN is similar to a timestamp in pg internals, as it
refers to the point in time when a block was last modified.
>>> It could also be used in 'refresh' mode, by allowing the pg_basebackup
>>> command to 'refresh' an old backup directory with a new backup.
>>
>> I am not sure this is really helpful...
>
> Could you please elaborate on the last sentence?
This overlaps with the features you are proposing with
pg_restorebackup, where a backup is rebuilt. Why implement two
interfaces for the same thing?
--
Michael
2014-07-29 18:35 GMT+02:00 Marco Nenciarini <marco.nenciarini@2ndquadrant.it>:
Il 25/07/14 20:44, Robert Haas ha scritto:
On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfreire@gmail.com>
wrote:
On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:1. Proposal
=================================
Our proposal is to introduce the concept of a backup profile. Thebackup
profile consists of a file with one line per file detailing tablespace,
path, modification time, size and checksum.
Using that file the BASE_BACKUP command can decide which file needs to
be sent again and which is not changed. The algorithm should be very
similar to rsync, but since our files are never bigger than 1 GB per
file that is probably granular enough not to worry about copying parts
of files, just whole files.That wouldn't nearly as useful as the LSN-based approach mentioned
before.
I've had my share of rsyncing live databases (when resizing
filesystems, not for backup, but the anecdotal evidence applies
anyhow) and with moderately write-heavy databases, even if you only
modify a tiny portion of the records, you end up modifying a huge
portion of the segments, because the free space choice is random.

There have been patches going around to change the random nature of
that choice, but none are very likely to make a huge difference for
this application. In essence, file-level comparisons get you only a
mild speed-up, and are not worth the effort.

I'd go for the hybrid file+LSN method, or nothing. The hybrid avoids
the I/O of inspecting the LSN of entire segments (necessary
optimization for huge multi-TB databases) and backups only the
portions modified when segments do contain changes, so it's the best
of both worlds. Any partial implementation would either require lots
of I/O (LSN only) or save very little (file only) unless it's an
almost read-only database.

I agree with much of that. However, I'd question whether we can
really seriously expect to rely on file modification times for
critical data-integrity operations. I wouldn't like it if somebody
ran ntpdate to fix the time while the base backup was running, and it
set the time backward, and the next differential backup consequently
omitted some blocks that had been modified during the base backup.

Our proposal doesn't rely on file modification times for data integrity.
We use the file mtime only as a fast indication that the file has
changed, in which case we transfer it again without computing the
checksum. If the timestamp and size match, we rely on *checksums* to
decide whether it has to be sent.

In "SMART MODE" we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it
would have all the necessary warnings attached. However, the "SMART
MODE" isn't a core part of our proposal, and can be delayed until we
agree on the safest way to bring it to the end user.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Hello,
I think an incremental/differential backup method is very useful;
however, the method has two drawbacks:
1) In a database, even if the percentage of modified rows is small
compared to the total, the probability that only some files/tables
change is small, because rows are normally not ordered inside a table
and updates are "random". If some tables are static, they are probably
lookup tables or something like a registry, and such tables are
normally small.
2) Every changed file requires reading the whole file each time. So if
point 1 is true, you are probably reading a large part of the database
and then sending that part, instead of sending a small part.
In my opinion, to solve these problems we need a different
implementation of incremental backup.
I will try to show my idea about it.
I think we need an in-memory bitmap to track the changed "chunks" of
the tracked files (by "chunk" I mean a group of N pages, dividing every
tracked file into chunks), so we could send only the blocks changed
since the last incremental backup (which could be a full backup for
incremental purposes). The map could have one submap for every tracked
file, to keep it simple.
So, if one bit tracks a chunk of 8 pages (64 KB; a chunk of 8 blocks is
only an example), a map of 1 Mbit (125 KB of memory) could track a
table with a total size of 64 GB. We could probably also compress the
map, because it consists only of 0s and 1s. This is a very simple idea,
but it shows that the map does not need too much memory if we track
groups of blocks, i.e. "chunks". Obviously the problem is more complex,
and there are probably better and more robust solutions.
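The arithmetic of the chunk map can be sketched as follows (a toy
illustration, not real server code; the chunk size is the example value
from this mail, taking 1 Mbit as 2**20 bits):

```python
# Toy sketch of the proposed chunk bitmap: one bit per 64 KB chunk
# (8 pages of 8 KB). A 1 Mbit map (2**20 bits, stored as a bytearray)
# then covers 2**20 * 64 KB = 64 GB of table data.
PAGE_SIZE = 8192
PAGES_PER_CHUNK = 8
CHUNK_SIZE = PAGE_SIZE * PAGES_PER_CHUNK  # 65536 bytes

def mark_dirty(bitmap, page_no):
    """Set the bit for the chunk that contains page_no."""
    chunk = page_no // PAGES_PER_CHUNK
    bitmap[chunk // 8] |= 1 << (chunk % 8)

bitmap = bytearray((1 << 20) // 8)            # 1 Mbit map
mark_dirty(bitmap, 9)                         # page 9 lives in chunk 1
print(len(bitmap) * 8 * CHUNK_SIZE // 2**30)  # -> 64 (GB tracked)
```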
We probably need more space in the map header to track information
about the file, the last backup, and so on.
I think the map should be updated by the bgwriter, i.e. when it flushes
dirty buffers. Fortunately we don't need this map for database
consistency, so we could create and manage it in memory to limit the
impact on performance.
The drawback is that if the database crashes or someone shuts it down,
the next incremental backup will be a full one. We could think about
flushing the map to disk when PostgreSQL receives a shutdown signal or
something similar.
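The flush-on-shutdown idea could look roughly like this (purely
illustrative; the file name and the map layout are invented):

```python
# Hypothetical sketch: persist the in-memory change map when the server
# receives a shutdown signal, so the next incremental backup does not
# have to fall back to a full one. After a crash the file is simply
# missing and a full backup is required.
import signal

MAP_FILE = "change_map.bin"         # invented name
bitmap = bytearray((1 << 20) // 8)  # the in-memory 1 Mbit map

def save_map(path, bm):
    with open(path, "wb") as f:
        f.write(bm)

def on_shutdown(signum, frame):
    save_map(MAP_FILE, bitmap)

signal.signal(signal.SIGTERM, on_shutdown)
```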
In this way we obtain:
1) we read only a small part of the database (the probability of a
changed chunk is lower than that of a changed file)
2) we do not need to calculate checksums, saving CPU
3) we save I/O in reading and writing (we will send only the blocks
changed since the last incremental backup)
4) we save network bandwidth
5) we save time during backup: if we read and write less data, we
reduce the time needed for an incremental backup
6) I think the in-memory bitmap will not impact the performance of the
bgwriter too much.
What do you think about it?
Kind Regards
Mat
On Tue, Jul 29, 2014 at 12:35 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
I agree with much of that. However, I'd question whether we can
really seriously expect to rely on file modification times for
critical data-integrity operations. I wouldn't like it if somebody
ran ntpdate to fix the time while the base backup was running, and it
set the time backward, and the next differential backup consequently
omitted some blocks that had been modified during the base backup.

Our proposal doesn't rely on file modification times for data integrity.
Good.
We are using the file mtime only as a fast indication that the file has
changed, and transfer it again without performing the checksum.
If timestamp and size match we rely on *checksums* to decide if it has
to be sent.
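As a rough sketch (the names and the profile layout are assumptions for
illustration, not the actual BASE_BACKUP code), the per-file decision
just described could look like:

```python
# Minimal sketch of the mtime/size fast path with checksum fallback.
import hashlib
import os

def file_checksum(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def must_send(path, profile_entry):
    """profile_entry: {'mtime', 'size', 'checksum'} from the previous
    backup's profile, or None if the file was not in it."""
    if profile_entry is None:
        return True
    st = os.stat(path)
    # Fast path: mtime or size changed -> resend without checksumming.
    if st.st_mtime != profile_entry["mtime"] or st.st_size != profile_entry["size"]:
        return True
    # Timestamp and size match: decide by checksum.
    return file_checksum(path) != profile_entry["checksum"]
```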
So an incremental backup reads every block in the database and
transfers only those that have changed? (BTW, I'm just asking.
That's OK with me for a first version; we can improve it, shall
we say, incrementally.)
Why checksums (which have an arbitrarily-small chance of indicating a
match that doesn't really exist) rather than LSNs (which have no
chance of making that mistake)?
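For comparison, an LSN-based test would look at each page's header
instead of checksumming file contents. A hedged sketch (the byte layout
below mirrors the pd_lsn field at the start of each page, but the
endianness and helper names are illustrative assumptions):

```python
# Each 8 KB page starts with the LSN of the last WAL record that
# touched it, so a page must be resent iff that LSN is newer than the
# start LSN of the previous backup -- no false matches are possible,
# unlike with checksums.
import struct

PAGE_SIZE = 8192

def page_lsn(page_bytes):
    # pd_lsn is stored as two 32-bit halves; endianness assumed here.
    xlogid, xrecoff = struct.unpack_from("<II", page_bytes, 0)
    return (xlogid << 32) | xrecoff

def pages_to_send(file_bytes, prev_backup_start_lsn):
    for off in range(0, len(file_bytes), PAGE_SIZE):
        if page_lsn(file_bytes[off:off + PAGE_SIZE]) > prev_backup_start_lsn:
            yield off // PAGE_SIZE
```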
In "SMART MODE" we would use the file mtime to skip the checksum check
in some cases, but it wouldn't be the default operation mode and it
would have all the necessary warnings attached. However, the "SMART MODE" isn't
a core part of our proposal, and can be delayed until we agree on the
safest way to bring it to the end user.
That's not a mode I'd feel comfortable calling "smart". More like
"roulette mode".
IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace. Then we can send a precisely
accurate backup without relying on either modification times or
reading the full database. If Heikki's patch to standardize the way
this kind of information is represented in WAL gets committed, this
should get a lot easier to implement.
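In outline (a toy model of the idea, not Robert's actual design; the
record format here is invented), such a WAL-following tracker might
behave like:

```python
# Toy model of a background process that consumes decoded WAL records
# and remembers which (relfilenode, block) pairs changed since the last
# backup, so an incremental backup can be exact without a full scan.
from collections import defaultdict

class BlockTracker:
    def __init__(self):
        self.changed = defaultdict(set)   # relfilenode -> {block numbers}

    def consume(self, wal_records):
        for relfilenode, block_no in wal_records:
            self.changed[relfilenode].add(block_no)

    def blocks_since_last_backup(self, relfilenode):
        return sorted(self.changed.get(relfilenode, ()))

    def reset(self):
        # Called once an incremental backup completes successfully.
        self.changed.clear()

t = BlockTracker()
t.consume([(16384, 0), (16384, 7), (16385, 2), (16384, 0)])
print(t.blocks_since_last_backup(16384))  # -> [0, 7]
```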
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.
Nice idea, however I think to make this happen we need to ensure
that WAL doesn't get deleted/overwritten before this process reads
it (may be by using some existing param or mechanism) and
wal_level has to be archive or more.
One more thing, what will happen for unlogged tables with such a
mechanism?
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 30, 2014 at 7:00 PM, desmodemone <desmodemone@gmail.com> wrote:
Hello,
I think an incremental/differential backup method is very useful;
however, the method has two drawbacks:
1) In a database, even if the percentage of modified rows is small
compared to the total, the probability that only some files/tables
change is small, because rows are normally not ordered inside a table
and updates are "random". If some tables are static, they are probably
lookup tables or something like a registry, and such tables are
normally small.
2) Every changed file requires reading the whole file each time. So if
point 1 is true, you are probably reading a large part of the database
and then sending that part, instead of sending a small part.
In my opinion, to solve these problems we need a different
implementation of incremental backup.
I will try to show my idea about it.
I think we need an in-memory bitmap to track the changed "chunks" of
the tracked files (by "chunk" I mean a group of N pages, dividing every
tracked file into chunks), so we could send only the blocks changed
since the last incremental backup (which could be a full backup for
incremental purposes). The map could have one submap for every tracked
file, to keep it simple.
So, if one bit tracks a chunk of 8 pages (64 KB; a chunk of 8 blocks is
only an example), a map of 1 Mbit (125 KB of memory) could track a
table with a total size of 64 GB. We could probably also compress the
map, because it consists only of 0s and 1s. This is a very simple idea,
but it shows that the map does not need too much memory if we track
groups of blocks, i.e. "chunks". Obviously the problem is more complex,
and there are probably better and more robust solutions.
We probably need more space in the map header to track information
about the file, the last backup, and so on.
I think the map should be updated by the bgwriter, i.e. when it flushes
dirty buffers,
Not only bgwriter, but checkpointer and backends as well, as
those also flush buffers. Also, there are some writes which are
done outside shared buffers; you need to track those separately.
Another point is that to track changes due to hint bit modifications,
you need to enable checksums or wal_log_hints, which will lead to
either more CPU or more I/O.
Fortunately we don't need this map for database consistency, so we
could create and manage it in memory to limit the impact on
performance.
The drawback is that if the database crashes or someone shuts it down,
the next incremental backup will be a full one. We could think about
flushing the map to disk when PostgreSQL receives a shutdown signal or
something similar.
In this way we obtain:
1) we read only a small part of the database (the probability of a
changed chunk is lower than that of a changed file)
2) we do not need to calculate checksums, saving CPU
3) we save I/O in reading and writing (we will send only the blocks
changed since the last incremental backup)
4) we save network bandwidth
5) we save time during backup: if we read and write less data, we
reduce the time needed for an incremental backup
6) I think the in-memory bitmap will not impact the performance of the
bgwriter too much.
What do you think about it?
I think this method has 3 drawbacks compared to the proposed method:
a. it requires enabling either checksums or wal_log_hints, which will
incur extra I/O if you enable wal_log_hints
b. backends also need to update the map, which is a small cost, but
still ...
c. the map is not crash safe, due to which sometimes a full backup
is needed.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 31, 2014 at 3:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
One more thing, what will happen for unlogged tables with such a
mechanism?
I imagine that you can safely bypass them as they are not accessible
during recovery and will start with empty relation files once recovery
ends. The same applies to temporary relations. Also this bgworker will
need access to the catalogs to look at the relation relkind.
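A hedged sketch of that filtering step (the 'p'/'u'/'t' values come
from pg_class.relpersistence; the helper and its inputs are invented
for illustration):

```python
# Skip unlogged ('u') and temporary ('t') relations when building the
# list of files to back up; only permanent ('p') relations matter,
# since the others are reset or dropped at recovery anyway.
def backup_candidates(relations):
    """relations: iterable of (relfilenode, relpersistence) pairs."""
    return [node for node, persistence in relations if persistence == "p"]

rels = [(16384, "p"), (16390, "u"), (16395, "t")]
print(backup_candidates(rels))  # -> [16384]
```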
Regards,
--
Michael
2014-07-31 8:26 GMT+02:00 Amit Kapila <amit.kapila16@gmail.com>:
[...]

I think this method has 3 drawbacks compared to the proposed method:
a. it requires enabling either checksums or wal_log_hints, which will
incur extra I/O if you enable wal_log_hints
b. backends also need to update the map, which is a small cost, but
still ...
c. the map is not crash safe, due to which sometimes a full backup
is needed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi Amit, thank you for your comments.
However, about the drawbacks:
a) It's not clear to me why the method needs checksums enabled. I mean,
if the bgwriter or another process flushes a dirty buffer, it only has
to mark in the map that the blocks changed, by updating the value from
0 to 1. It does not need to verify the checksum of the block; we could
assume that when a dirty buffer is flushed, the block has changed (or
better, in my idea, the chunk of N blocks).
We could think of an advanced setting that verifies the checksum, but I
think it would be heavier.
b) Yes, the backends need to update the map, but it's in memory and, as
I showed, it could be very small if we use chunks of blocks. Even if we
don't compress the map, I don't think it could be a bottleneck.
c) The map is not crash safe by design, because it is needed only by
incremental backup to track which blocks need to be backed up, not for
consistency or recovery of the whole cluster, so maintaining it is not
a heavy cost for the whole cluster. We could think of an option (though
a heavy one) to write it to file at every flush to have a crash-safe
map, but I don't think it's that useful. I think it's acceptable, and
probably better, to force the rule: "if your db crashes, you need a
full backup". It's probably better to do so anyway, as the dba will
want to verify whether something went wrong during the crash, no?
Corrupted blocks or something else.
Kind Regards
Mat
On Thu, Jul 31, 2014 at 11:30:52AM +0530, Amit Kapila wrote:
On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.

Nice idea, however I think to make this happen we need to ensure
that WAL doesn't get deleted/overwritten before this process reads
it (maybe by using some existing param or mechanism) and
wal_level has to be archive or more.
Well, you probably are going to have all the WAL files available because
you have not taken an incremental backup yet, and therefore you would
have no PITR backup at all. Once the incremental backup is done, you
can delete the old WAL files if you don't need fine-grained restore
points.
Robert also suggested reading the block numbers from the WAL as it is
created, so they don't need to be extracted at incremental backup time.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Thu, Jul 31, 2014 at 2:00 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jul 30, 2014 at 11:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.

Nice idea, however I think to make this happen we need to ensure
that WAL doesn't get deleted/overwritten before this process reads
it (maybe by using some existing param or mechanism) and
wal_level has to be archive or more.
That shouldn't be a problem; logical decoding added a mechanism for
retaining WAL until decoding is done with it, and if it needs to be
extended a bit further, so be it.
One more thing, what will happen for unlogged tables with such a
mechanism?
As Michael Paquier points out, it doesn't matter, because that data
will be gone anyway.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company