On markers of changed data

Started by Andrey Borodinover 8 years ago20 messages

x4mmm@yandex-team.ru

over 8 years ago

Hi, hackers!

Currently I'm working on page-level incremental backups using WAL-G codebase[0]https://github.com/wal-g/wal-g/pull/29. And I have two questions that I cannot resolve myself.

Incremental backup is a set of changes, that should be applied over preexisting backup. I use page LSN to understand should page be backup`ed or not.

Question 1. FSM and VM.
As you can see here [1]https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR114 FSM and VM files are exempt from incremental tracking and are backuped as whole files. I've done it this way, because sanity checks [2]https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR50 of page headers have indicated a lot of "invalid" pages in FSM and VM files. But seems like in some pages headers are valid with sane LSNs.
Can I use LSNs as history marker on FSM and VM pages? On 1Tb backup I get like 150Mb of FSM+VM, and it's kind of a lot.

Question 2. File dates.
Is it safe to use file modification time to track that file were changes since previous backup? If the file has date before start of previous backup I just add it to "skip list" [3]https://github.com/wal-g/wal-g/pull/29/files#diff-f5c8f0067297f98eb5acc6e2c6b1b234R87.
I have assumption: every time file is changes in filesystem, it's modification date is updated to higher value.
Is this assumption valid for most of used platforms and filesystems? Or can I check this "capacity" of FS?

Thank you for your attention. I'll be glad to receive any information\pointers on this matter.

Best regards, Andrey Borodin, Yandex.

[0]: https://github.com/wal-g/wal-g/pull/29
[1]: https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR114
[2]: https://github.com/wal-g/wal-g/pull/29/files#diff-d77406e827f5f947d4d4a1e6d76c1f4eR50
[3]: https://github.com/wal-g/wal-g/pull/29/files#diff-f5c8f0067297f98eb5acc6e2c6b1b234R87

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andrey Borodin (#1)

Re: On markers of changed data

Andrey Borodin <x4mmm@yandex-team.ru> writes:

Is it safe to use file modification time to track that file were changes
since previous backup?

I'd say no:

1. You don't know the granularity of the filesystem's timestamps, at least
not without making unportable assumptions.

2. There's no guarantee that the system clock can't be set backwards.

3. It's not uncommon for filesystems to have optimizations whereby they
skip or delay some updates of file mtimes. (I think this is usually
optional, but you couldn't know whether it's turned on.)

#2 is probably the worst of these problems.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Tom Lane (#2)

Re: On markers of changed data

On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrey Borodin <x4mmm@yandex-team.ru> writes:

Is it safe to use file modification time to track that file were changes
since previous backup?

I'd say no:

1. You don't know the granularity of the filesystem's timestamps, at least
not without making unportable assumptions.

2. There's no guarantee that the system clock can't be set backwards.

3. It's not uncommon for filesystems to have optimizations whereby they
skip or delay some updates of file mtimes. (I think this is usually
optional, but you couldn't know whether it's turned on.)

#2 is probably the worst of these problems.

Or upwards. A simple example of things depending on clock changes is
for example VM snapshotting. Any logic not depending on monotonic
timestamps, with things like clock_gettime(CLOCK_MONOTONIC) is a lot
of fun to investigate until you know that they are not using any
monotonic logic... So the answer is *no*, do not depend on FS-level
timestamps. The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@alvh.no-ip.org

over 8 years ago

In reply to: Michael Paquier (#3)

Re: On markers of changed data

Michael Paquier wrote:

The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.

I hope the idea is not to have to scan every single page in the
database, because that would be too slow. It should be possible to
build this so that a single summary LSN is kept for a largish group of
pages, allowing large portions of the database to be skipped from even
being read if they are known not to contain any page newer than the
previous backup.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Michael Paquier (#3)

Re: On markers of changed data

Tom, Michael,

* Michael Paquier (michael.paquier@gmail.com) wrote:

On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrey Borodin <x4mmm@yandex-team.ru> writes:

Is it safe to use file modification time to track that file were changes
since previous backup?

I'd say no:

1. You don't know the granularity of the filesystem's timestamps, at least
not without making unportable assumptions.

2. There's no guarantee that the system clock can't be set backwards.

3. It's not uncommon for filesystems to have optimizations whereby they
skip or delay some updates of file mtimes. (I think this is usually
optional, but you couldn't know whether it's turned on.)

#2 is probably the worst of these problems.

Or upwards. A simple example of things depending on clock changes is
for example VM snapshotting. Any logic not depending on monotonic
timestamps, with things like clock_gettime(CLOCK_MONOTONIC) is a lot
of fun to investigate until you know that they are not using any
monotonic logic... So the answer is *no*, do not depend on FS-level
timestamps. The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.

Really, these comments appear, at least to me, to be based on an
incorrect assumption that's only considering how tools like rsync use
mtime.

No, you can't trust rsync-based backups that look at mtime and only copy
if the mtime of the source file is currently 'more recent' than the
mtime of the destination file.

That doesn't mean that mtime can't be used to perform incremental
backups using PG, but care has to be taken when doing so to minimize the
risk of a file getting skipped that should have been copied.

There's a few things to do to minimize that risk:

Use mtime only as an indication of if the file changed from the last
time you looked at it- doesn't matter if the mtime on the file is newer
or older. If the mtime is *different*, then you can't trust that the
contents are the same and you need to include it in the backup. Of
course, combine this with checking the file size has changed, but in PG
there's lots of files of the same size, so that's not a terribly good
indicator.

Further, you have to get the mtime of all the files *before* you start
backing them up. If you take the mtime of the file at the time you
start actually copying it, then it could possibly be modified while you
copy it but without the mtime being updated from when you initially
pulled it (and that's not even talking about the concerns around the
clock time moving back and forth). To address the granularity concern,
you should also be sure to wait after you collect all the mtimes but
before actually starting the backup to the level of granularity. Any
optimization which delays setting the mtime would, certainly, still get
around to updating the mtime before the next backup runs and therefore
that file might get copied even though it hadn't changed, but that's
still technically correct, just slightly more work. Lastly, don't trust
any times which are from after the time that you collected the mtimes-
either during the initial backup or when you are doing the subsequent
incremental. Any file whose mtime is different *or* is from after the
time the mtimes were collected should be copied.

This isn't to say that there isn't some risk to using mtime, there still
is- if a backup is made of a file and its mtime collected, and then time
moves backwards, and the file is modified again at the *exact* same
time, leading the 'new' mtime to be identical to the 'old' mtime while
the file's contents are different, and that file is not subsequently
modified before the next backup happens, then the file might not be
included in the backup even though it should be.

Other risks are just blatent corruption happening in the mtime field, or
a kernel-level bug that doesn't update mtime when it should, or the
kernel somehow resetting the mtime back after the file has been changed,
or someone explicitly setting the mtime back after changing a file, or
perhaps other such attacks, though eliminating all of those risks isn't
possible (regardless of solution- someone could go change the LSN on a
page too, for example, and foil a tool which was based on that).

These are risks which I'd love to remove, but they also strike me as
quite small and ones which practical users are willing to accept for
their incremental and differential backups, though it's a reason to also
take full backups regularly.

As Alvaro notes downthread, it's also the only reasonable option
available today. It'd be great to have a better solution, and perhaps
one which summarizes the LSNs in each file would work and be better, but
that would also only be available for PG11, at the earliest.

Thanks!

Stephen

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Alvaro Herrera (#4)

Re: On markers of changed data

Le 6 oct. 2017 à 23:44, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :

Michael Paquier wrote:

The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.

I hope the idea is not to have to scan every single page in the
database, because that would be too slow. It should be possible to
build this so that a single summary LSN is kept for a largish group of
pages, allowing large portions of the database to be skipped from even
being read if they are known not to contain any page newer than the
previous backup.

That’s actually what pg_rman is doing for what it calls incremental backups (perhaps that would be differential backup in PG terminology?), and the performance is bad as you can imagine. We could have a dedicated LSN map to do such things with 4 bytes per page. I am still not convinced that this much facility and the potential bug risks are worth it though, Postgres already knows about differential backups if you shape it as a delta of WAL segments. I think that, in order to find a LSN map more convincing, we should find first other use cases where it could become useful. Some use cases may pop up with VACUUM, but I have not studied the question hard enough...
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@alvh.no-ip.org

over 8 years ago

In reply to: Michael Paquier (#6)

Re: On markers of changed data

Michael Paquier wrote:

That’s actually what pg_rman is doing for what it calls incremental
backups (perhaps that would be differential backup in PG
terminology?), and the performance is bad as you can imagine. We could
have a dedicated LSN map to do such things with 4 bytes per page. I am
still not convinced that this much facility and the potential bug
risks are worth it though, Postgres already knows about differential
backups if you shape it as a delta of WAL segments. I think that, in
order to find a LSN map more convincing, we should find first other
use cases where it could become useful. Some use cases may pop up with
VACUUM, but I have not studied the question hard enough...

The case I've discussed with barman developers is a large database
(couple dozen of TBs should be enough) where a large fraction (say 95%)
is read-only but there are many changes to the active part of the data,
so that WAL is more massive than size of active data.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Alvaro Herrera (#7)

Re: On markers of changed data

Alvaro, Michael,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

Michael Paquier wrote:

That’s actually what pg_rman is doing for what it calls incremental
backups (perhaps that would be differential backup in PG
terminology?), and the performance is bad as you can imagine. We could
have a dedicated LSN map to do such things with 4 bytes per page. I am
still not convinced that this much facility and the potential bug
risks are worth it though, Postgres already knows about differential
backups if you shape it as a delta of WAL segments. I think that, in
order to find a LSN map more convincing, we should find first other
use cases where it could become useful. Some use cases may pop up with
VACUUM, but I have not studied the question hard enough...

The case I've discussed with barman developers is a large database
(couple dozen of TBs should be enough) where a large fraction (say 95%)
is read-only but there are many changes to the active part of the data,
so that WAL is more massive than size of active data.

Yes, we've seen environments like that also.

Thanks!

Stephen

Andrey Borodin

x4mmm@yandex-team.ru

over 8 years ago

In reply to: Stephen Frost (#8)

Re: On markers of changed data

Tom, Alvaro, Michael, and especially Septhen, thank you for your valuable comments.

I feel enlightened about mtime.
My takeaway is:
1. Any other marker would be better (It can be WAL scan during archiving, some new LSN-based mechanics* et c.)
2. mtime could be used, with precautions described by Stephen are taken.

But my other question still seems unanswered: can I use LSN logic for incrementing FSM and VM? Seems like most of the time there is valid LSN

* I like the idea of using something for both incr(diff) backups and VACUUM, it worth thinking about.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Andrey Borodin (#9)

Re: On markers of changed data

Andrey,

* Andrey Borodin (x4mmm@yandex-team.ru) wrote:

But my other question still seems unanswered: can I use LSN logic for incrementing FSM and VM? Seems like most of the time there is valid LSN

I haven't gone and audited it myself, but I would certainly expect you
to be able to use the LSN for everything which is WAL'd. If you have
cases where that's not the case, it'd be useful to see them.

Thanks!

Stephen

#11

Andrey Borodin

x4mmm@yandex-team.ru

over 8 years ago

In reply to: Stephen Frost (#10)

Re: On markers of changed data

8 окт. 2017 г., в 20:11, Stephen Frost <sfrost@snowman.net> написал(а):
* Andrey Borodin (x4mmm@yandex-team.ru) wrote:

But my other question still seems unanswered: can I use LSN logic for incrementing FSM and VM? Seems like most of the time there is valid LSN

I haven't gone and audited it myself, but I would certainly expect you
to be able to use the LSN for everything which is WAL'd. If you have
cases where that's not the case, it'd be useful to see them.

Thanks, Stephen, this actually pointed what to look for
VM is WAL-logged [0]https://github.com/postgres/postgres/blob/113b0045e20d40f726a0a30e33214455e4f1385e/src/backend/access/heap/visibilitymap.c#L315
FSM is not [1]https://github.com/postgres/postgres/blob/1d25779284fe1ba08ecd57e647292a9deb241376/src/backend/storage/freespace/freespace.c#L593

Now I have everything I wanted, and go back coding :)

Best regards, Andrey Borodin.

[0]: https://github.com/postgres/postgres/blob/113b0045e20d40f726a0a30e33214455e4f1385e/src/backend/access/heap/visibilitymap.c#L315
[1]: https://github.com/postgres/postgres/blob/1d25779284fe1ba08ecd57e647292a9deb241376/src/backend/storage/freespace/freespace.c#L593

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Michael Paquier

michael.paquier@gmail.com

over 8 years ago

In reply to: Andrey Borodin (#11)

Re: On markers of changed data

On Mon, Oct 9, 2017 at 2:23 PM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

I haven't gone and audited it myself, but I would certainly expect you
to be able to use the LSN for everything which is WAL'd. If you have
cases where that's not the case, it'd be useful to see them.

Thanks, Stephen, this actually pointed what to look for
VM is WAL-logged [0]
FSM is not [1]

If you are willing to go down this road, here are my takes on the matter:
- Any LSN map should be in a different file than FSM and VM. The VM
uses 2 bits per blocks now, and the FSM 8 bits. The FSM is wanted to
remain small, so trying to plug into it more data is not welcome. The
VM bits also are dedicated to purely visibility matters. We may not
want to use that for the VM.
- Those should not be logged, as we would end up with tracking down
WAL records for things that themselves track the effects of other
records.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Andrey Borodin

x4mmm@yandex-team.ru

over 8 years ago

In reply to: Michael Paquier (#12)

Re: On markers of changed data

Hi Michael!

9 окт. 2017 г., в 17:28, Michael Paquier <michael.paquier@gmail.com> написал(а):

VM is WAL-logged [0]
FSM is not [1]

If you are willing to go down this road, here are my takes on the matter:
- Any LSN map should be in a different file than FSM and VM. The VM
uses 2 bits per blocks now, and the FSM 8 bits. The FSM is wanted to
remain small, so trying to plug into it more data is not welcome. The
VM bits also are dedicated to purely visibility matters. We may not
want to use that for the VM.
- Those should not be logged, as we would end up with tracking down
WAL records for things that themselves track the effects of other
records.

I was asking about FSM and VM not because I wanted to place something there, but because I was looking for the way to backup FSM and VM efficiently. VM can be copied page-incrementally, FSM cannot.

But the design you are describing resembles PTRACK[0]https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b <https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b>: fork for page changes tracking, not WAL-logged, but crash safe due to recovery from others WALs.

Best regards, Andrey Borodin.

[0]: https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b <https://gist.github.com/stalkerg/ab833d94e2f64df241f1835651e06e4b>

#14

Peter Geoghegan

pg@bowt.ie

over 8 years ago

In reply to: Stephen Frost (#8)

Re: On markers of changed data

On Sat, Oct 7, 2017 at 6:34 AM, Stephen Frost <sfrost@snowman.net> wrote:

That’s actually what pg_rman is doing for what it calls incremental
backups (perhaps that would be differential backup in PG
terminology?), and the performance is bad as you can imagine. We could
have a dedicated LSN map to do such things with 4 bytes per page. I am
still not convinced that this much facility and the potential bug
risks are worth it though, Postgres already knows about differential
backups if you shape it as a delta of WAL segments. I think that, in
order to find a LSN map more convincing, we should find first other
use cases where it could become useful. Some use cases may pop up with
VACUUM, but I have not studied the question hard enough...

The case I've discussed with barman developers is a large database
(couple dozen of TBs should be enough) where a large fraction (say 95%)
is read-only but there are many changes to the active part of the data,
so that WAL is more massive than size of active data.

Yes, we've seen environments like that also.

I'm pretty sure that those cases are cases where there are many more
FPIs than might be expected, due to a lack of locality. (UUID PKs can
make the size of WAL balloon, for example.)

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Greg Stark

stark@mit.edu

over 8 years ago

In reply to: Andrey Borodin (#9)

Re: On markers of changed data

On 8 October 2017 at 08:52, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

1. Any other marker would be better (It can be WAL scan during archiving, some new LSN-based mechanics* et c.)

The general shape of what I would like to see is some log which lists
where each checkpoint starts and ends and what blocks are modified
since the previous checkpoint. Then to generate an incremental backup
from any point in time to the current you union all the block lists
between them and fetch those blocks. There are other ways of using
this aside from incremental backups on disk too -- you could imagine a
replica that has fallen behind requesting the block lists and then
fetching just those blocks instead of needing to receive and apply all
the wal. Or possibly even making a cost-based decision between the two
depending on which would be faster.

It would also be useful for going in the reverse direction: look up
all the records (or just the last record) that modified a given block.
Instead of having to scan all the wal you would only need to scan the
checkpoint eras that had touched that block.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Alvaro Herrera

alvherre@alvh.no-ip.org

over 8 years ago

In reply to: Greg Stark (#15)

Re: On markers of changed data

Greg Stark wrote:

The general shape of what I would like to see is some log which lists
where each checkpoint starts and ends and what blocks are modified
since the previous checkpoint. Then to generate an incremental backup
from any point in time to the current you union all the block lists
between them and fetch those blocks. There are other ways of using
this aside from incremental backups on disk too -- you could imagine a
replica that has fallen behind requesting the block lists and then
fetching just those blocks instead of needing to receive and apply all
the wal.

Hmm, this sounds pretty clever. And we already have the blocks touched
by each record thanks to the work for pg_rewind (so we don't have to do
any nasty tricks like the stuff Suzuki-san did for pg_lesslog, where
each WAL record had to be processed individually to know what blocks it
referenced), so it shouldn't be *too* difficult ...

It would also be useful for going in the reverse direction: look up
all the records (or just the last record) that modified a given block.

Well, a LSN map is what I was suggesting.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Stephen Frost

sfrost@snowman.net

over 8 years ago

In reply to: Alvaro Herrera (#16)

Re: On markers of changed data

Alvaro,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:

Greg Stark wrote:

The general shape of what I would like to see is some log which lists
where each checkpoint starts and ends and what blocks are modified
since the previous checkpoint. Then to generate an incremental backup
from any point in time to the current you union all the block lists
between them and fetch those blocks. There are other ways of using
this aside from incremental backups on disk too -- you could imagine a
replica that has fallen behind requesting the block lists and then
fetching just those blocks instead of needing to receive and apply all
the wal.

Hmm, this sounds pretty clever. And we already have the blocks touched
by each record thanks to the work for pg_rewind (so we don't have to do
any nasty tricks like the stuff Suzuki-san did for pg_lesslog, where
each WAL record had to be processed individually to know what blocks it
referenced), so it shouldn't be *too* difficult ...

Yeah, it sounds interesting, but I was just chatting w/ David about it
and we were thinking about how checkpoints are really rather often done,
so you end up with quite a few of these lists being out there.

Now, if the lists were always kept in a sorted fashion, then perhaps we
would be able to essentially merge-sort them all back together and
de-dup that way, but even then, you're talking about an awful lot if
you're looking at daily incrementals- that's 288 standard 5-minute
checkpoints, each with some 128k pages changed, assuming max_wal_size of
1GB, and I think we can all agree that the default max_wal_size is for
rather small systems. That ends up being something around 2MB per
checkpoint to store the pages in or half a gig per day just to keep
track of the pages which changed in each checkpoint across that day.

There's a bit of hand-waving in there, but I don't think it's all that
much to reach a conclusion that this might not be the best approach.
David and I were kicking around the notion of a 'last LSN' which is kept
on a per-relation basis, but, of course, that ends up not really being
granular enough, and would likely be a source of contention unless we
could work out a way to make it "lazy" updated somehow, or similar.

It would also be useful for going in the reverse direction: look up
all the records (or just the last record) that modified a given block.

Well, a LSN map is what I was suggesting.

Not sure I entirely followed what you were getting at here..?

Thanks!

Stephen

#18

Greg Stark

stark@mit.edu

over 8 years ago

In reply to: Stephen Frost (#17)

Re: On markers of changed data

On 10 October 2017 at 23:50, Stephen Frost <sfrost@snowman.net> wrote:

Yeah, it sounds interesting, but I was just chatting w/ David about it
and we were thinking about how checkpoints are really rather often done,
so you end up with quite a few of these lists being out there.

Now, if the lists were always kept in a sorted fashion, then perhaps we
would be able to essentially merge-sort them all back together and
de-dup that way but even then, you're talking about an awful lot if
you're looking at daily incrementals- that's 288 standard 5-minute
checkpoints, each with some 128k pages changed, assuming max_wal_size of
1GB, and I think we can all agree that the default max_wal_size is for
rather small systems. That ends up being something around 2MB per
checkpoint to store the pages in or half a gig per day just to keep
track of the pages which changed in each checkpoint across that day.

I was actually imagining a bitmap, probably for each 1GB piece of each
table. That's probably how you would maintain this data in memory
anyways. After compression it should be fairly small. You'll probably
be modifying the same blocks frequently or doing bulk loads which will
touch a consecutive range of blocks.

But that's still about the same amount of data. But probably you don't
want to actually keep every checkpoint anyways. The nice thing about
the changelists is that they will tend to reach a maximum size
regardless of how long a time range they span so if you keep one
changelist for every 10 checkpoints or every 100 checkpoints you could
reduce the storage needs and only lose the time precision.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Robert Haas

robertmhaas@gmail.com

over 8 years ago

In reply to: Michael Paquier (#3)

Re: On markers of changed data

On Fri, Oct 6, 2017 at 10:34 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Oct 6, 2017 at 11:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd say no:

1. You don't know the granularity of the filesystem's timestamps, at least
not without making unportable assumptions.

2. There's no guarantee that the system clock can't be set backwards.

3. It's not uncommon for filesystems to have optimizations whereby they
skip or delay some updates of file mtimes. (I think this is usually
optional, but you couldn't know whether it's turned on.)

#2 is probably the worst of these problems.

Or upwards. A simple example of things depending on clock changes is
for example VM snapshotting. Any logic not depending on monotonic
timestamps, with things like clock_gettime(CLOCK_MONOTONIC) is a lot
of fun to investigate until you know that they are not using any
monotonic logic... So the answer is *no*, do not depend on FS-level
timestamps. The only sane method for Postgres is really to scan the
page header LSNs, and of course you already know that.

It might also be worth noting that the FSM is not WAL-logged, so it
probably either doesn't have page LSNs or they are not meaningful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Andrey Borodin

x4mmm@yandex-team.ru

about 8 years ago

In reply to: Andrey Borodin (#11)

Re: On markers of changed data

Hello!

9 окт. 2017 г., в 10:23, Andrey Borodin <x4mmm@yandex-team.ru> написал(а):

Thanks, Stephen, this actually pointed what to look for
VM is WAL-logged [0]
FSM is not [1]

[0] https://github.com/postgres/postgres/blob/113b0045e20d40f726a0a30e33214455e4f1385e/src/backend/access/heap/visibilitymap.c#L315
[1] https://github.com/postgres/postgres/blob/1d25779284fe1ba08ecd57e647292a9deb241376/src/backend/storage/freespace/freespace.c#L593

After tests of binary equivalence before and after backup I've come to conclusion, that Visibility Map cannot be backuped incrementally: it's bits are unset without page LSN bump. This can lead to wrong results of Index Only Scans executed on freshly restored backups.

In my implementation of incremental backup in WAL-G I will disable any FSM, VM and XACT\CLOG incrementation.

Posting this for the record, so that if someone goes this way info will be available. Thank you for your attention.

Best regards, Andrey Borodin.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers