Incremental backup
On 10/28/21 7:23 AM, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
Using pg_dump? No.
Using pgBackRest? Yes.
--
Angular momentum makes the world go 'round.
On 28.10.2021 15:58, Ron wrote:
On 10/28/21 7:23 AM, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
Using pg_dump? No.
Using pgBackRest? Yes.
https://github.com/postgrespro/pg_probackup
--
On 10/28/21 05:23, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
It would helpful to be more explicit about what you are trying to achieve.
Do you want:
1) A continuous process or scheduled one?
2) Local or remote backups?
3) The backup as a standby?
4) Any other features requests you might have.
Kind regards,
--
Eduardo
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Oct 28, 2021 at 12:36 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
On 10/28/21 05:23, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
It would helpful to be more explicit about what you are trying to achieve.
Do you want:
1) A continuous process or scheduled one?
Scheduled
2) Local or remote backups?
Local. Now I'm doing locally and copying backups to other servers.
3) The backup as a standby?
Now I'have a standby cluster via physical replication. Could do the backup
from standby server too.
4) Any other features requests you might have.
90% of the data in the databse is "static data" (compressed timescaledb
chunks), I want to append only that 10% of recently inserted or updated
data daily to yesterday backup, instead of do a full backup daily.
<adrian.klaver@aklaver.com>
Thanks in advance and sorry for my english
--
Eduardo
On 10/28/21 10:36 AM, Adrian Klaver wrote:
On 10/28/21 05:23, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
It would helpful to be more explicit about what you are trying to achieve.
Do you want:
1) A continuous process or scheduled one?
2) Local or remote backups?
3) The backup as a standby?
Standby is not backup!!!!!
Honestly, these are bizarre questions which no one in the Enterprise world
would ask in response to "Is there any way to make incremental backups in
Oracle, DB2, SQL Server?" since the answer is a simple "yes, of course" (or
maybe "yes, differential backups").
4) Any other features requests you might have.
--
Angular momentum makes the world go 'round.
On 10/28/21 09:25, Ron wrote:
On 10/28/21 10:36 AM, Adrian Klaver wrote:
On 10/28/21 05:23, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
It would helpful to be more explicit about what you are trying to
achieve.Do you want:
1) A continuous process or scheduled one?
2) Local or remote backups?
3) The backup as a standby?
Standby is not backup!!!!!
But we don't know what the OP is doing so it is best to ask.
Honestly, these are bizarre questions which no one in the Enterprise
world would ask in response to "Is there any way to make incremental
backups in Oracle, DB2, SQL Server?" since the answer is a simple "yes,
of course" (or maybe "yes, differential backups").
Except we have no way of knowing what the situation is. I prefer not to
assume a context.
4) Any other features requests you might have.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 10/28/21 5:23 AM, Edu Gargiulo wrote:
Is there any way to make incremental backups in postgres-12?
wal-e and wal-g are useful tools for this and easy to set up. They take
periodic full backups and then also continuously archive WAL files. To
restore they apply the last full backup and then replay the WAL since
then. They have nice support for putting the files on S3 or similar
places, but really they can drop the files anywhere.
Yours,
--
Paul ~{:-)
pj@illuminatedcomputing.com
On 10/28/21 11:40 AM, Adrian Klaver wrote:
On 10/28/21 09:25, Ron wrote:
On 10/28/21 10:36 AM, Adrian Klaver wrote:
On 10/28/21 05:23, Edu Gargiulo wrote:
Hi all,
Is there any way to make incremental backups in postgres-12?
It would helpful to be more explicit about what you are trying to achieve.
Do you want:
1) A continuous process or scheduled one?
2) Local or remote backups?
3) The backup as a standby?
Standby is not backup!!!!!
But we don't know what the OP is doing so it is best to ask.
Honestly, these are bizarre questions which no one in the Enterprise
world would ask in response to "Is there any way to make incremental
backups in Oracle, DB2, SQL Server?" since the answer is a simple "yes,
of course" (or maybe "yes, differential backups").Except we have no way of knowing what the situation is. I prefer not to
assume a context.
You make it sound like incremental (and differential) backups are some
complicated thing that needs context. That's utter and complete rubbish in
every Enterprise RDBMS except Postgresql.
Something like "BACKUP DATABASE (DIFFERENTIAL) foo TO foo.bak;" should be a
fundamental feature of every RDBMS that claims to be enterprise class.
--
Angular momentum makes the world go 'round.
On 10/28/21 10:51, Ron wrote:
Except we have no way of knowing what the situation is. I prefer not
to assume a context.You make it sound like incremental (and differential) backups are some
complicated thing that needs context. That's utter and complete rubbish
in every Enterprise RDBMS except Postgresql.
This is Postgres so we do need context.
Also you have not defined what you consider incremental backup? I for
one would put this:
https://www.postgresql.org/docs/14/continuous-archiving.html
forward as a candidate.
Something like "BACKUP DATABASE (DIFFERENTIAL) foo TO foo.bak;" should
be a fundamental feature of every RDBMS that claims to be enterprise class.
Knock your self out:
https://wiki.postgresql.org/wiki/Developer_FAQ
--
Adrian Klaver
adrian.klaver@aklaver.com
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
Except we have no way of knowing what the situation is. I prefer not to
assume a context.You make it sound like incremental (and differential) backups are some
complicated thing that needs context. That's utter and complete rubbish
in every Enterprise RDBMS except Postgresql.This is Postgres so we do need context.
Also you have not defined what you consider incremental backup? I for one
would put this:https://www.postgresql.org/docs/14/continuous-archiving.html
I define Incremental and Differential backups the way everyone else does:
Incremental Backup: capture all *changed data* since the last incremental
backup. (Not the same as WAL archiving.)
Differential Backup: capture all *changed data* since the last full backup.
Transaction Backup: capture *all transactions* in a log file.
forward as a candidate.
Something like "BACKUP DATABASE (DIFFERENTIAL) foo TO foo.bak;" should be
a fundamental feature of every RDBMS that claims to be enterprise class.Knock your self out:
Not doable in Postgresql because WAL files are global to cluster. I've read
multiple times that will not be changed.
PgBackRest does full and incremental backups, plus captures WAL files, but
globally to a cluster.
--
Angular momentum makes the world go 'round.
On 10/28/21 11:48, Ron wrote:
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
Not doable in Postgresql because WAL files are global to cluster. I've
read multiple times that will not be changed.
Yet somehow logical replication does it:
https://www.postgresql.org/docs/14/logical-replication-architecture.html
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts logical
decoding (described in Chapter 49) of the WAL and loads the standard
logical decoding plugin (pgoutput). The plugin transforms the changes
read from WAL to the logical replication protocol (see Section 53.5) and
filters the data according to the publication specification. The data is
then continuously transferred using the streaming replication protocol
to the apply worker, which maps the data to local tables and applies the
individual changes as they are received, in correct transactional order.
"
https://www.postgresql.org/docs/14/logical-replication.html
"The typical use-cases for logical replication are:
Sending incremental changes in a single database or a subset of a
database to subscribers as they occur.
...
"
PgBackRest does full and incremental backups, plus captures WAL files,
but globally to a cluster.--
Angular momentum makes the world go 'round.
--
Adrian Klaver
adrian.klaver@aklaver.com
On 10/28/21 2:06 PM, Adrian Klaver wrote:
On 10/28/21 11:48, Ron wrote:
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
Not doable in Postgresql because WAL files are global to cluster. I've
read multiple times that will not be changed.Yet somehow logical replication does it:
https://www.postgresql.org/docs/14/logical-replication-architecture.html
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts logical
decoding (described in Chapter 49) of the WAL and loads the
Scans the (global) WAL data for only the that portion from the relevant
database?
If so, definitely not the same as having per-database WAL files.
Just as importantly, replication is not, and never will be, a substitute for
backups.
standard logical decoding plugin (pgoutput). The plugin transforms the
changes read from WAL to the logical replication protocol (see Section
53.5) and filters the data according to the publication specification. The
data is then continuously transferred using the streaming replication
protocol to the apply worker, which maps the data to local tables and
applies the individual changes as they are received, in correct
transactional order.
"https://www.postgresql.org/docs/14/logical-replication.html
"The typical use-cases for logical replication are:
Sending incremental changes in a single database or a subset of a
database to subscribers as they occur.
...
"
--
Angular momentum makes the world go 'round.
On Thu, Oct 28, 2021 at 02:23:39PM -0500, Ron wrote:
On 10/28/21 2:06 PM, Adrian Klaver wrote:
On 10/28/21 11:48, Ron wrote:
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
Not doable in Postgresql because WAL files are global to cluster.
I've read multiple times that will not be changed.Yet somehow logical replication does it:
https://www.postgresql.org/docs/14/logical-replication-architecture.html
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts logical
decoding (described in Chapter 49) of the WAL and loads theScans the (global) WAL data for only the that portion from the relevant
database?If so, definitely not the same as having per-database WAL files.
Just as importantly, replication is not, and never will be, a substitute for
backups.
Uh, for replication slots, we don't send the entire WAL stream to the
subscriber:
https://www.postgresql.org/docs/14/logical-replication.html
Logical replication of a table typically starts with taking a snapshot
of the data on the publisher database and copying that to the subscriber.
Once that is done, the changes on the publisher are sent to the subscriber
as they occur in real-time. The subscriber applies the data in the same
order as the publisher so that transactional consistency is guaranteed for
publications within a single subscription. This method of data replication
is sometimes referred to as transactional replication.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On 10/28/21 12:23, Ron wrote:
On 10/28/21 2:06 PM, Adrian Klaver wrote:
On 10/28/21 11:48, Ron wrote:
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts
logical decoding (described in Chapter 49) of the WAL and loads theScans the (global) WAL data for only the that portion from the relevant
database?If so, definitely not the same as having per-database WAL files.
Just as importantly, replication is not, and never will be, a substitute
for backups.
Who says you have to use the mechanism to replicate to another database,
why not to a file?
Not sure of the plausibility, still it might be interesting to find out?
Also isn't a backup just an interrupted form of replication?
--
Adrian Klaver
adrian.klaver@aklaver.com
On 10/28/21 3:43 PM, Adrian Klaver wrote:
On 10/28/21 12:23, Ron wrote:
On 10/28/21 2:06 PM, Adrian Klaver wrote:
On 10/28/21 11:48, Ron wrote:
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts logical
decoding (described in Chapter 49) of the WAL and loads theScans the (global) WAL data for only the that portion from the relevant
database?If so, definitely not the same as having per-database WAL files.
Just as importantly, replication is not, and never will be, a substitute
for backups.Who says you have to use the mechanism to replicate to another database,
why not to a file?
But WAL files store every transaction, right? Differential and incremental
backups only send the modified pages, even if the page has been updated 1000
times.
Not sure of the plausibility, still it might be interesting to find out?
Also isn't a backup just an interrupted form of replication?
Highly interrupted, and usually never written back to disk in "active" form.
--
Angular momentum makes the world go 'round.
On 10/28/21 3:08 PM, Bruce Momjian wrote:
On Thu, Oct 28, 2021 at 02:23:39PM -0500, Ron wrote:
On 10/28/21 2:06 PM, Adrian Klaver wrote:
On 10/28/21 11:48, Ron wrote:
On 10/28/21 1:00 PM, Adrian Klaver wrote:
On 10/28/21 10:51, Ron wrote:
Not doable in Postgresql because WAL files are global to cluster.
I've read multiple times that will not be changed.Yet somehow logical replication does it:
https://www.postgresql.org/docs/14/logical-replication-architecture.html
"
Logical replication is built with an architecture similar to physical
streaming replication (see Section 27.2.5). It is implemented by
“walsender” and “apply” processes. The walsender process starts logical
decoding (described in Chapter 49) of the WAL and loads theScans the (global) WAL data for only the that portion from the relevant
database?If so, definitely not the same as having per-database WAL files.
Just as importantly, replication is not, and never will be, a substitute for
backups.Uh, for replication slots, we don't send the entire WAL stream to the
subscriber:
I meant scanning at the source.
https://www.postgresql.org/docs/14/logical-replication.html
Logical replication of a table typically starts with taking a snapshot
of the data on the publisher database and copying that to the subscriber.
Once that is done, the changes on the publisher are sent to the subscriber
as they occur in real-time. The subscriber applies the data in the same
order as the publisher so that transactional consistency is guaranteed for
publications within a single subscription. This method of data replication
is sometimes referred to as transactional replication.
It's still a bunch of transaction logs, whereas differential and incremental
backups only backup the changed pages, no matter how many times they've been
changed.
That's a serious reduction in disk space, and time to apply them.
--
Angular momentum makes the world go 'round.
På torsdag 28. oktober 2021 kl. 23:09:19, skrev Ron <ronljohnsonjr@gmail.com
<mailto:ronljohnsonjr@gmail.com>>:
[...]
It's still a bunch of transaction logs, whereas differential and incremental
backups only backup the changed pages, no matter how many times they've been
changed.
That's a serious reduction in disk space, and time to apply them.
I think everybody agrees that incremental backup per database, and not
cluster-wide, is nice, and it would be nice if PG supported it. But, given the
way PG is architectured, having cluster-wide WALs, that's not an easy task to
implement.
Repeating "other databases have it" doesn't change that.
--
Andreas Joseph Krogh
On 10/28/21 5:07 PM, Andreas Joseph Krogh wrote:
På torsdag 28. oktober 2021 kl. 23:09:19, skrev Ron
<ronljohnsonjr@gmail.com <mailto:ronljohnsonjr@gmail.com>>:[...]
It's still a bunch of transaction logs, whereas differential and
incremental
backups only backup the changed pages, no matter how many times
they've been
changed.That's a serious reduction in disk space, and time to apply them.
I think everybody agrees that incremental backup /per database/, and not
cluster-wide, is nice, and it would be nice if PG supported it. But, given
the way PG is architectured, having cluster-wide WALs, that's not an easy
task to implement.
Which is what I said a couple of hours ago.
--
Angular momentum makes the world go 'round.
On 10/28/21 18:07, Andreas Joseph Krogh wrote:
I think everybody agrees that incremental backup /per database/, and
not cluster-wide, is nice, and it would be nice if PG supported it.
But, given the way PG is architectured, having cluster-wide WALs,
that's not an easy task to implement.
Repeating "other databases have it" doesn't change that.
--
Andreas Joseph Krogh
I am not advocating for the database level incremental backups, but all
databases that have it also have cluster wide/instance wide WAL logs.
Cluster wide WAL logs do not make database level incremental backups
hard. Both Oracle and DB2 have database level incremental backups and
both have cluster wide WAL (redo or logs). So does SQL*Server which is a
bit specific because its log is used for both recovery, as WAL, and undo
so its very different from Postgres. Also, SQL Server is, to my
knowledge, the only database which doesn't use MVCC.
The technology that all of those databases employ is a bitmap device
which has one bit per each database block. Full backups set all bits to
0 and whenever block is modified, the corresponding bit is set to 1. The
backup tool in the incremental mode then only copies blocks with the
bitmap value of 1. I am not too thrilled by that implementation. In
particular, with Oracle there were bugs with database restore,
"duplicate database" operation, global cache locks and instance latches.
I've had quite a few headaches with RMAN cumulative incremental backups.
My preferred method of backup is storage snapshot. Snapshots then can
be backed up to other arrays (NetApp SnapVault, Hitachi HUR, EMC SRDF)
or can be backed up to deduplicated offline storage like AWS Glacier or
EMC Data Domain using simple file level utilities. Once snapshot is
taken, it is a read-only file system and the files are no longer opened
by the database processes.
The classic file level backup tools like pg_basebackup or rman are
pretty much unusable once your database hits 50 TB or so. With 50TB
database, your RTO (Recovery Time Objective) will be in days. You can
only push around 3TB/hour down a 10Gbit Ethernet pipe. On the other hand
companies like Facebook. Amazon, Walmart or Target are losing hundreds
of thousands USD per hour of downtime. Downtime of 15 hours or longer is
completely unacceptable. Backup is only the last line of defense. It
should only be used if everything else fails. And if you have to use
file level tool like pg_basebackup and tar to restore your VLDB, you
should also get your resume ready.
Incremental backups are awkward and ungainly. You have to restore the
full backup and all incrementals taken since the last full. After that,
you must apply the remaining WAL files. So, by definition, restoring
incremental backup is, by definition, longer than restoring a full
backup. So, if you have to take a backup, get yourself a deduplicating
backup utility. Deduplication will only save the backup blocks that are
different from the previous backup. In other words, you can do a daily
full backup with the space expenditure of an incremental backup. Also,
if you need backups because of regulatory reasons (HIPAA, SOX),
incremental backups will not do.
Long story short, there are much more important things to do on Postgres
than incremental database level backups. BTW, Chuck Norris can take and
incremental database level backup of a Postgres database and recover
database without restoring the preceding full backup. I am not sure
whether Chuck Norris is a member of Postgres steering committee.
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com