trying again to get incremental backup
A few years ago, I sketched out a design for incremental backup, but
no patch for incremental backup ever got committed. Instead, the whole
thing evolved into a project to add backup manifests, which are nice,
but not as nice as incremental backup would be. So I've decided to
have another go at incremental backup itself. Attached are some WIP
patches. Let me summarize the design and some open questions and
problems with it that I've discovered. I welcome problem reports and
test results from others, as well.
The basic design of this patch set is pretty simple, and there are
three main parts. First, there's a new background process called the
walsummarizer which runs all the time. It reads the WAL and generates
WAL summary files. WAL summary files are extremely small compared to
the original WAL and contain only the minimal amount of information
that we need in order to determine which parts of the database need to
be backed up. They tell us about files getting created, destroyed, or
truncated, and they tell us about modified blocks. Naturally, we don't
find out about blocks that were modified without any write-ahead log
record, e.g. hint bit updates, but those are of necessity not critical
for correctness, so it's OK.

Second, pg_basebackup has a mode where it
can take an incremental backup. You must supply a backup manifest from
a previous full backup. We read the WAL summary files that have been
generated between the start of the previous backup and the start of
this one, and use that to figure out which relation files have changed
and how much. Non-relation files are sent normally, just as they would
be in a full backup. Relation files can either be sent in full or be
replaced by an incremental file, which contains a subset of the blocks
in the file plus a bit of information to handle truncations properly.

Third, there's now a pg_combinebackup utility which takes a full
backup and one or more incremental backups, performs a bunch of sanity
checks, and if everything works out, writes out a new, synthetic full
backup, aka a data directory.
Simple usage example:
pg_basebackup -cfast -Dx
pg_basebackup -cfast -Dy --incremental x/backup_manifest
pg_combinebackup x y -o z
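To make the reconstruction step concrete, here is a toy sketch of what pg_combinebackup conceptually does for one relation file. The {"num_blocks", "changed"} shape is invented for illustration and is not the real incremental file format:

```python
# Toy model: a full backup of one relation file is a list of fixed-size
# blocks; each incremental is the file's block count at backup time (to
# handle truncation) plus a map of the blocks that changed. These
# structures are invented for illustration, not the real on-disk format.

BLOCK_SIZE = 8192

def combine(full_blocks, incrementals):
    """Apply incrementals (oldest first) on top of a full backup."""
    blocks = list(full_blocks)
    for inc in incrementals:
        # First honor the length recorded when the incremental was taken:
        # drop truncated-away blocks, zero-fill if the file grew.
        n = inc["num_blocks"]
        del blocks[n:]
        while len(blocks) < n:
            blocks.append(b"\x00" * BLOCK_SIZE)
        # Then overlay only the blocks that changed since the prior backup.
        for block_no, data in inc["changed"].items():
            blocks[block_no] = data
    return blocks
```

The truncation step has to run before the overlay so that a file which shrank and then grew again ends up the right length before changed blocks are written back.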
The part of all this with which I'm least happy is the WAL
summarization engine. Actually, the core process of summarizing the
WAL seems totally fine, and the file format is very compact thanks to
some nice ideas from my colleague Dilip Kumar. Someone may of course
wish to argue that the information should be represented in some other
file format instead, and that can be done if it's really needed, but I
don't see a lot of value in tinkering with it, either. Where I do
think there's a problem is deciding how much WAL ought to be
summarized in one WAL summary file. Summary files cover a certain
range of WAL records - they have names like
$TLI${START_LSN}${END_LSN}.summary. It's not too hard to figure out
where a file should start - generally, it's wherever the previous file
ended, possibly on a new timeline, but figuring out where the summary
should end is trickier. You always have the option to either read
another WAL record and fold it into the current summary, or end the
current summary where you are, write out the file, and begin a new
one. So how do you decide what to do?
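For reference, composing and parsing names of the $TLI${START_LSN}${END_LSN}.summary shape might look like this. The field widths (8 hex digits for the TLI, 16 for each LSN) are an assumption for illustration, not necessarily what the patch writes:

```python
# Sketch of the WAL summary file naming scheme described above.
# Assumed widths: 8 hex digits for the timeline ID, 16 for each LSN.

def summary_file_name(tli, start_lsn, end_lsn):
    return f"{tli:08X}{start_lsn:016X}{end_lsn:016X}.summary"

def parse_summary_file_name(name):
    assert name.endswith(".summary") and len(name) == 40 + len(".summary")
    tli = int(name[0:8], 16)
    start_lsn = int(name[8:24], 16)
    end_lsn = int(name[24:40], 16)
    return tli, start_lsn, end_lsn
```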
I originally had the idea of summarizing a certain number of MB of WAL
per WAL summary file, and so I added a GUC wal_summarize_mb for that
purpose. But then I realized that actually, you really want WAL
summary file boundaries to line up with possible redo points, because
when you do an incremental backup, you need a summary that stretches
from the redo point of the checkpoint written at the start of the
prior backup to the redo point of the checkpoint written at the start
of the current backup. The block modifications that happen in that
range of WAL records are the ones that need to be included in the
incremental. Unfortunately, there's no indication in the WAL itself
that you've reached a redo point, but I wrote code that tries to
notice when we've reached the redo point stored in shared memory and
stops the summary there. But I eventually realized that's not good
enough either, because if summarization zooms past the redo point
before noticing the updated redo point in shared memory, then the
backup sat around waiting for the next summary file to be generated so
it had enough summaries to proceed with the backup, while the
summarizer was in no hurry to finish up the current file and just sat
there waiting for more WAL to be generated. Eventually the incremental
backup would just time out. I tried to fix that by making it so that
if somebody's waiting for a summary file to be generated, they can let
the summarizer know about that and it can write a summary file ending
at the LSN up to which it has read and then begin a new file from
there. That seems to fix the hangs, but now I've got three
overlapping, interconnected systems for deciding where to end the
current summary file, and maybe that's OK, but I have a feeling there
might be a better way.
Dilip had an interesting potential solution to this problem, which was
to always emit a special WAL record at the redo pointer. That is, when
we fix the redo pointer for the checkpoint record we're about to
write, also insert a WAL record there. That way, when the summarizer
reaches that sentinel record, it knows it should stop the summary just
before. I'm not sure whether this approach is viable, especially from
a performance and concurrency perspective, and I'm not sure whether
people here would like it, but it does seem like it would make things
a whole lot simpler for this patch set.
Another thing that I'm not too sure about is: what happens if we find
a relation file on disk that doesn't appear in the backup_manifest for
the previous backup and isn't mentioned in the WAL summaries either?
The fact that said file isn't mentioned in the WAL summaries seems
like it ought to mean that the file is unchanged, in which case
perhaps this ought to be an error condition. But I'm not too sure
about that treatment. I have a feeling that there might be some subtle
problems here, especially if databases or tablespaces get dropped and
then new ones get created that happen to have the same OIDs. And what
about wal_level=minimal? I'm not at a point where I can say I've gone
through and plugged up these kinds of corner-case holes tightly yet,
and I'm worried that there may be still other scenarios of which I
haven't even thought. Happy to hear your ideas about what the problem
cases are or how any of the problems should be solved.
A related design question is whether we should really be sending the
whole backup manifest to the server at all. If it turns out that we
don't really need anything except for the LSN of the previous backup,
we could send that one piece of information instead of everything. On
the other hand, if we need the list of files from the previous backup,
then sending the whole manifest makes sense.
Another big and rather obvious problem with the patch set is that it
doesn't currently have any automated test cases, or any real
documentation. Those are obviously things that need a lot of work
before there could be any thought of committing this. And probably a
lot of bugs will be found along the way, too.
A few less-serious problems with the patch:
- We don't have an incremental JSON parser, so if you have a
backup_manifest>1GB, pg_basebackup --incremental is going to fail.
That's also true of the existing code in pg_verifybackup, and for the
same reason. I talked to Andrew Dunstan at one point about adapting
our JSON parser to support incremental parsing, and he had a patch for
that, but I think he found some problems with it and I'm not sure what
the current status is.
- The patch does support differential backup, aka an incremental atop
another incremental. There's no particular limit to how long a chain
of backups can be. However, pg_combinebackup currently requires that
the first backup is a full backup and all the later ones are
incremental backups. So if you have a full backup a and an incremental
backup b and a differential backup c, you can combine a, b, and c to get
a full backup equivalent to one you would have gotten if you had taken
a full backup at the time you took c. However, you can't combine b and
c with each other without combining them with a, and that might be
desirable in some situations. You might want to collapse a bunch of
older differential backups into a single one that covers the whole
time range of all of them. I think that the file format can support
that, but the tool is currently too dumb.
- We only know how to operate on directories, not tar files. I thought
about that when working on pg_verifybackup as well, but I didn't do
anything about it. It would be nice to go back and make that tool work
on tar-format backups, and this one, too. I don't think there would be
a whole lot of point trying to operate on compressed tar files because
you need random access and that seems hard on a compressed file, but
on uncompressed files it seems at least theoretically doable. I'm not
sure whether anyone would care that much about this, though, even
though it does sound pretty cool.
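On the chain-collapsing point above: if each incremental records, per file, the new length plus the changed blocks, collapsing two adjacent incrementals is conceptually a dictionary merge. A toy sketch, with an invented structure rather than the real file format, and glossing over subtleties the real tool would have to get right:

```python
# Toy collapse of two consecutive incrementals of one relation file into
# a single incremental covering both WAL ranges, without the full backup.
# The {"num_blocks", "changed"} shape is invented for illustration.

def collapse(older, newer):
    n = newer["num_blocks"]
    changed = {}
    # Blocks from the older incremental survive unless the newer backup
    # truncated them away or overwrote them.
    for block_no, data in older["changed"].items():
        if block_no < n:
            changed[block_no] = data
    changed.update(newer["changed"])
    return {"num_blocks": n, "changed": changed}
```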
In the attached patch series, patches 1 through 6 are various
refactoring patches, patch 7 is the main event, and patch 8 adds a
useful inspection tool.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
- v1-0001-In-basebackup.c-refactor-to-create-verify_page_ch.patch
- v1-0002-In-basebackup.c-refactor-to-create-read_file_data.patch
- v1-0003-Change-struct-tablespaceinfo-s-oid-member-from-ch.patch
- v1-0004-Refactor-parse_filename_for_nontemp_relation-to-p.patch
- v1-0005-Change-how-a-base-backup-decides-which-files-have.patch
- v1-0006-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch
- v1-0007-Prototype-patch-for-incremental-and-differential-.patch
- v1-0008-Add-new-pg_walsummary-tool.patch
Hi,
On 2023-06-14 14:46:48 -0400, Robert Haas wrote:
A few years ago, I sketched out a design for incremental backup, but
no patch for incremental backup ever got committed. Instead, the whole
thing evolved into a project to add backup manifests, which are nice,
but not as nice as incremental backup would be. So I've decided to
have another go at incremental backup itself. Attached are some WIP
patches. Let me summarize the design and some open questions and
problems with it that I've discovered. I welcome problem reports and
test results from others, as well.
Cool!
I originally had the idea of summarizing a certain number of MB of WAL
per WAL summary file, and so I added a GUC wal_summarize_mb for that
purpose. But then I realized that actually, you really want WAL
summary file boundaries to line up with possible redo points, because
when you do an incremental backup, you need a summary that stretches
from the redo point of the checkpoint written at the start of the
prior backup to the redo point of the checkpoint written at the start
of the current backup. The block modifications that happen in that
range of WAL records are the ones that need to be included in the
incremental.
I assume this is "solely" required for keeping the incremental backups as
small as possible, rather than being required for correctness?
Unfortunately, there's no indication in the WAL itself
that you've reached a redo point, but I wrote code that tries to
notice when we've reached the redo point stored in shared memory and
stops the summary there. But I eventually realized that's not good
enough either, because if summarization zooms past the redo point
before noticing the updated redo point in shared memory, then the
backup sat around waiting for the next summary file to be generated so
it had enough summaries to proceed with the backup, while the
summarizer was in no hurry to finish up the current file and just sat
there waiting for more WAL to be generated. Eventually the incremental
backup would just time out. I tried to fix that by making it so that
if somebody's waiting for a summary file to be generated, they can let
the summarizer know about that and it can write a summary file ending
at the LSN up to which it has read and then begin a new file from
there. That seems to fix the hangs, but now I've got three
overlapping, interconnected systems for deciding where to end the
current summary file, and maybe that's OK, but I have a feeling there
might be a better way.
Could we just recompute the WAL summary for the [redo, end of chunk] for the
relevant summary file?
Dilip had an interesting potential solution to this problem, which was
to always emit a special WAL record at the redo pointer. That is, when
we fix the redo pointer for the checkpoint record we're about to
write, also insert a WAL record there. That way, when the summarizer
reaches that sentinel record, it knows it should stop the summary just
before. I'm not sure whether this approach is viable, especially from
a performance and concurrency perspective, and I'm not sure whether
people here would like it, but it does seem like it would make things
a whole lot simpler for this patch set.
FWIW, I like the idea of a special WAL record at that point, independent of
this feature. It wouldn't be a meaningful overhead compared to the cost of a
checkpoint, and it seems like it'd be quite useful for debugging. But I can
see uses going beyond that - we occasionally have been discussing associating
additional data with redo points, and that'd be a lot easier to deal with
during recovery with such a record.
I don't really see a performance and concurrency angle right now - what are
you wondering about?
Another thing that I'm not too sure about is: what happens if we find
a relation file on disk that doesn't appear in the backup_manifest for
the previous backup and isn't mentioned in the WAL summaries either?
Wouldn't that commonly happen for unlogged relations at least?
I suspect there's also other ways to end up with such additional files,
e.g. by crashing during the creation of a new relation.
A few less-serious problems with the patch:
- We don't have an incremental JSON parser, so if you have a
backup_manifest>1GB, pg_basebackup --incremental is going to fail.
That's also true of the existing code in pg_verifybackup, and for the
same reason. I talked to Andrew Dunstan at one point about adapting
our JSON parser to support incremental parsing, and he had a patch for
that, but I think he found some problems with it and I'm not sure what
the current status is.
As a stopgap measure, can't we just use the relevant flag to allow larger
allocations?
- The patch does support differential backup, aka an incremental atop
another incremental. There's no particular limit to how long a chain
of backups can be. However, pg_combinebackup currently requires that
the first backup is a full backup and all the later ones are
incremental backups. So if you have a full backup a and an incremental
backup b and a differential backup c, you can combine a, b, and c to get
a full backup equivalent to one you would have gotten if you had taken
a full backup at the time you took c. However, you can't combine b and
c with each other without combining them with a, and that might be
desirable in some situations. You might want to collapse a bunch of
older differential backups into a single one that covers the whole
time range of all of them. I think that the file format can support
that, but the tool is currently too dumb.
That seems like a feature for the future...
- We only know how to operate on directories, not tar files. I thought
about that when working on pg_verifybackup as well, but I didn't do
anything about it. It would be nice to go back and make that tool work
on tar-format backups, and this one, too. I don't think there would be
a whole lot of point trying to operate on compressed tar files because
you need random access and that seems hard on a compressed file, but
on uncompressed files it seems at least theoretically doable. I'm not
sure whether anyone would care that much about this, though, even
though it does sound pretty cool.
I don't know the tar format well, but my understanding is that it doesn't have
a "central metadata" portion. I.e. doing something like this would entail
scanning the tar file sequentially, skipping file contents? And wouldn't you
have to create an entirely new tar file for the modified output? That kind of
makes it not so incremental ;)
IOW, I'm not sure it's worth bothering about this ever, and certainly doesn't
seem worth bothering about now. But I might just be missing something.
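Since uncompressed tar has no central directory, random access costs one sequential scan of the headers to build an index, after which members can be read by seeking directly to their data. A rough sketch using Python's tarfile, just to illustrate the access pattern being discussed:

```python
import tarfile

# Build a name -> (data offset, size) index with one sequential pass over
# the tar headers (file contents are skipped, not read), then use it for
# random access. Works only on uncompressed archives - which is exactly
# the limitation discussed above.

def build_index(path):
    index = {}
    with tarfile.open(path, "r:") as tf:  # "r:" refuses compressed input
        for member in tf.getmembers():
            if member.isreg():
                index[member.name] = (member.offset_data, member.size)
    return index

def read_member(path, index, name):
    offset, size = index[name]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```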
Greetings,
Andres Freund
On Wed, Jun 14, 2023 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
I assume this is "solely" required for keeping the incremental backups as
small as possible, rather than being required for correctness?
I believe so. I want to spend some more time thinking about this to
make sure I'm not missing anything.
Could we just recompute the WAL summary for the [redo, end of chunk] for the
relevant summary file?
I'm not understanding how that would help. If we were going to compute
a WAL summary on the fly rather than waiting for one to show up on
disk, what we'd want is [end of last WAL summary that does exist on
disk, redo]. But I'm not sure that's a great approach, because that
LSN gap might be large and then we're duplicating a lot of work that
the summarizer has probably already done most of.
FWIW, I like the idea of a special WAL record at that point, independent of
this feature. It wouldn't be a meaningful overhead compared to the cost of a
checkpoint, and it seems like it'd be quite useful for debugging. But I can
see uses going beyond that - we occasionally have been discussing associating
additional data with redo points, and that'd be a lot easier to deal with
during recovery with such a record.

I don't really see a performance and concurrency angle right now - what are
you wondering about?
I'm not really sure. I expect Dilip would be happy to post his patch,
and if you'd be willing to have a look at it and express your concerns
or lack thereof, that would be super valuable.
Another thing that I'm not too sure about is: what happens if we find
a relation file on disk that doesn't appear in the backup_manifest for
the previous backup and isn't mentioned in the WAL summaries either?

Wouldn't that commonly happen for unlogged relations at least?
I suspect there's also other ways to end up with such additional files,
e.g. by crashing during the creation of a new relation.
Yeah, this needs some more careful thought.
A few less-serious problems with the patch:
- We don't have an incremental JSON parser, so if you have a
backup_manifest>1GB, pg_basebackup --incremental is going to fail.
That's also true of the existing code in pg_verifybackup, and for the
same reason. I talked to Andrew Dunstan at one point about adapting
our JSON parser to support incremental parsing, and he had a patch for
that, but I think he found some problems with it and I'm not sure what
the current status is.

As a stopgap measure, can't we just use the relevant flag to allow larger
allocations?
I'm not sure that's a good idea, but theoretically, yes. We can also
just choose to accept the limitation that your data directory can't be
too darn big if you want to use this feature. But getting incremental
JSON parsing would be better.
Not having the manifest in JSON would be an even better solution, but
regrettably I did not win that argument.
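For what incremental parsing of the manifest's file list might buy, here is a toy stream parser. It assumes the input is already positioned inside the "Files" array and that each entry is a JSON object; the real manifest layout, checksums, and a production-grade parser would all need more than this:

```python
import json

# Toy incremental extraction of JSON objects from a stream, holding only
# one entry (plus a read chunk) in memory at a time instead of the whole
# multi-gigabyte document. Assumes the stream is positioned inside an
# array of objects, e.g. the manifest's "Files" list.

def iter_file_entries(stream, chunk_size=65536):
    buf, scanned = "", 0
    depth, in_str, esc, start = 0, False, False, None
    while chunk := stream.read(chunk_size):
        buf += chunk
        i = scanned
        while i < len(buf):
            c = buf[i]
            if in_str:                    # inside a JSON string
                if esc:
                    esc = False
                elif c == "\\":
                    esc = True
                elif c == '"':
                    in_str = False
            elif c == '"':
                in_str = True
            elif c == "{":
                if depth == 0:
                    start = i
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:            # one complete entry
                    yield json.loads(buf[start : i + 1])
                    buf = buf[i + 1 :]    # discard the parsed entry
                    i, start = -1, None
            i += 1
        scanned = len(buf)
        if depth == 0 and start is None:  # nothing pending: drop separators
            buf, scanned = "", 0
```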
That seems like a feature for the future...
Sure.
I don't know the tar format well, but my understanding is that it doesn't have
a "central metadata" portion. I.e. doing something like this would entail
scanning the tar file sequentially, skipping file contents? And wouldn't you
have to create an entirely new tar file for the modified output? That kind of
makes it not so incremental ;)

IOW, I'm not sure it's worth bothering about this ever, and certainly doesn't
seem worth bothering about now. But I might just be missing something.
Oh, yeah, it's just an idle thought. I'll get to it when I get to it,
or else I won't.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, 14 Jun 2023 at 20:47, Robert Haas <robertmhaas@gmail.com> wrote:
A few years ago, I sketched out a design for incremental backup, but
no patch for incremental backup ever got committed. Instead, the whole
thing evolved into a project to add backup manifests, which are nice,
but not as nice as incremental backup would be. So I've decided to
have another go at incremental backup itself. Attached are some WIP
patches.
Nice, I like this idea.
Let me summarize the design and some open questions and
problems with it that I've discovered. I welcome problem reports and
test results from others, as well.
Skimming through the 7th patch, I see claims that FSM is not fully
WAL-logged and thus shouldn't be tracked, and so it indeed doesn't
track those changes.
I disagree with that decision: we now have support for custom resource
managers, which may use the various forks for other purposes than
those used in PostgreSQL right now. It would be a shame if data is
lost because of the backup tool ignoring forks because the PostgreSQL
project itself doesn't have post-recovery consistency guarantees in
that fork. So, unless we document that WAL-logged changes in the FSM
fork are not recoverable from backup, regardless of their contents, we
should either keep tracking changes in the FSM fork and include that
fork in our backups, or exclude only those FSM updates that we know
are safe to ignore.
Kind regards,
Matthias van de Meent
Neon, Inc.
Hi,
On 2023-06-14 16:10:38 -0400, Robert Haas wrote:
On Wed, Jun 14, 2023 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
Could we just recompute the WAL summary for the [redo, end of chunk] for the
relevant summary file?

I'm not understanding how that would help. If we were going to compute
a WAL summary on the fly rather than waiting for one to show up on
disk, what we'd want is [end of last WAL summary that does exist on
disk, redo].
Oh, right.
But I'm not sure that's a great approach, because that LSN gap might be
large and then we're duplicating a lot of work that the summarizer has
probably already done most of.
I guess that really depends on what the summary granularity is. If you create
a separate summary every 32MB or so, recomputing just the required range
shouldn't be too bad.
FWIW, I like the idea of a special WAL record at that point, independent of
this feature. It wouldn't be a meaningful overhead compared to the cost of a
checkpoint, and it seems like it'd be quite useful for debugging. But I can
see uses going beyond that - we occasionally have been discussing associating
additional data with redo points, and that'd be a lot easier to deal with
during recovery with such a record.

I don't really see a performance and concurrency angle right now - what are
you wondering about?

I'm not really sure. I expect Dilip would be happy to post his patch,
and if you'd be willing to have a look at it and express your concerns
or lack thereof, that would be super valuable.
Will do. Adding me to CC: might help, I have a backlog unfortunately :(.
Greetings,
Andres Freund
On Thu, Jun 15, 2023 at 2:11 AM Andres Freund <andres@anarazel.de> wrote:
I'm not really sure. I expect Dilip would be happy to post his patch,
and if you'd be willing to have a look at it and express your concerns
or lack thereof, that would be super valuable.

Will do. Adding me to CC: might help, I have a backlog unfortunately :(.
Thanks, I have posted it here[1]
[1]: /messages/by-id/CAFiTN-s-K=mVA=HPr_VoU-5bvyLQpNeuzjq1ebPJMEfCJZKFsg@mail.gmail.com
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 14, 2023 at 4:40 PM Andres Freund <andres@anarazel.de> wrote:
But I'm not sure that's a great approach, because that LSN gap might be
large and then we're duplicating a lot of work that the summarizer has
probably already done most of.

I guess that really depends on what the summary granularity is. If you create
a separate summary every 32MB or so, recomputing just the required range
shouldn't be too bad.
Yeah, but I don't think that's the right approach, for two reasons.
First, one of the things I'm rather worried about is what happens when
the WAL distance between the prior backup and the incremental backup
is large. It could be a terabyte. If we have a WAL summary for every
32MB of WAL, that's 32k files we have to read, and I'm concerned
that's too many. Maybe it isn't, but it's something that has really
been weighing on my mind as I've been thinking through the design
questions here. The files are really very small, and having to open a
bazillion tiny little files to get the job done sounds lame. Second, I
don't see what problem it actually solves. Why not just signal the
summarizer to write out the accumulated data to a file instead of
re-doing the work ourselves? Or else adopt the
WAL-record-at-the-redo-pointer approach, and then the whole thing is
moot?
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2023-06-19 09:46:12 -0400, Robert Haas wrote:
On Wed, Jun 14, 2023 at 4:40 PM Andres Freund <andres@anarazel.de> wrote:
But I'm not sure that's a great approach, because that LSN gap might be
large and then we're duplicating a lot of work that the summarizer has
probably already done most of.

I guess that really depends on what the summary granularity is. If you create
a separate summary every 32MB or so, recomputing just the required range
shouldn't be too bad.

Yeah, but I don't think that's the right approach, for two reasons.
First, one of the things I'm rather worried about is what happens when
the WAL distance between the prior backup and the incremental backup
is large. It could be a terabyte. If we have a WAL summary for every
32MB of WAL, that's 32k files we have to read, and I'm concerned
that's too many. Maybe it isn't, but it's something that has really
been weighing on my mind as I've been thinking through the design
questions here.
It doesn't have to be a separate file - you could easily summarize ranges
at a higher granularity, storing multiple ranges into a single file with a
coarser naming pattern.
The files are really very small, and having to open a bazillion tiny little
files to get the job done sounds lame. Second, I don't see what problem it
actually solves. Why not just signal the summarizer to write out the
accumulated data to a file instead of re-doing the work ourselves? Or else
adopt the WAL-record-at-the-redo-pointer approach, and then the whole thing
is moot?
The one point for a relatively grainy summarization scheme that I see is that
it would pave the way for using the WAL summary data for other purposes in the
future. That could be done orthogonal to the other solutions to the redo
pointer issues.
Other potential use cases:
- only restore parts of a base backup that aren't going to be overwritten by
WAL replay
- reconstructing database contents from WAL after data loss
- more efficient pg_rewind
- more efficient prefetching during WAL replay
Greetings,
Andres Freund
Hi,
In the limited time that I've had to work on this project lately, I've
been trying to come up with a test case for this feature -- and since
I've gotten completely stuck, I thought it might be time to post and
see if anyone else has a better idea. I thought a reasonable test case
would be: Do a full backup. Change some stuff. Do an incremental
backup. Restore both backups and perform replay to the same LSN. Then
compare the files on disk. But I cannot make this work. The first
problem I ran into was that replay of the full backup does a
restartpoint, while the replay of the incremental backup does not.
That results in, for example, pg_subtrans having different contents.
I'm not sure whether it can also result in data files having different
contents: are changes that we replayed following the last restartpoint
guaranteed to end up on disk when the server is shut down? It wasn't
clear to me that this is the case. I thought maybe I could get both
servers to perform a restartpoint at the same location by shutting
down the primary and then replaying through the shutdown checkpoint,
but that doesn't work because the primary doesn't finish archiving
before shutting down. After some more fiddling I settled (at least for
research purposes) on having the restored backups PITR and promote,
instead of PITR and pause, so that we're guaranteed a checkpoint. But
that just caused me to run into a far worse problem: replay on the
standby doesn't actually create a state that is byte-for-byte
identical to the one that exists on the primary. I quickly discovered
that in my test case, I was ending up with different contents in the
"hole" of a block wherein a tuple got updated. Replay doesn't think
it's important to make the hole end up with the same contents on all
machines that replay the WAL, so I end up with one server that has
more junk in there than the other one and the tests fail.
Unless someone has a brilliant idea that I lack, this suggests to me
that this whole line of testing is a dead end. I can, of course, write
tests that compare clusters *logically* -- do the correct relations
exist, are they accessible, do they have the right contents? But I
feel like it would be easy to have bugs that escape detection in such
a test but would be detected by a physical comparison of the clusters.
However, such a comparison can only be conducted if either (a) there's
some way to set up the test so that byte-for-byte identical clusters
can be expected or (b) there's some way to perform the comparison that
can distinguish between expected, harmless differences and unexpected,
problematic differences. And at the moment my conclusion is that
neither (a) nor (b) exists. Does anyone think otherwise?
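One conceivable ingredient for (b) would be masking the parts of a page that replay is allowed to leave different - in the spirit of the masking done for wal_consistency_checking - before comparing. A rough sketch for ordinary heap pages, assuming the standard page-header layout (pd_lower at byte offset 12, pd_upper at 14) and little-endian storage; a real version would need per-access-method masking rules, which is part of why (b) is hard:

```python
import struct

# Compare two standard pages while ignoring the "hole" between pd_lower
# and pd_upper, whose contents replay does not reproduce byte-for-byte.
# Header offsets assume the usual PageHeaderData layout; little-endian
# storage is an assumption for illustration.

BLCKSZ = 8192

def mask_hole(page):
    pd_lower, pd_upper = struct.unpack_from("<HH", page, 12)
    if not (0 < pd_lower <= pd_upper <= len(page)):
        return page  # not a sane standard page; leave it alone
    return page[:pd_lower] + b"\x00" * (pd_upper - pd_lower) + page[pd_upper:]

def pages_equal_ignoring_hole(a, b):
    return mask_hole(a) == mask_hole(b)
```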
Meanwhile, here's a rebased set of patches. The somewhat-primitive
attempts at writing tests are in 0009, but they don't work, for the
reasons explained above. I think I'd probably like to go ahead and
commit 0001 and 0002 soon if there are no objections, since I think
those are good refactorings independently of the rest of this.
...Robert
Attachments:
- 0001-In-basebackup.c-refactor-to-create-verify_page_check.patch
- 0002-In-basebackup.c-refactor-to-create-read_file_data_in.patch
- 0003-Change-struct-tablespaceinfo-s-oid-member-from-char-.patch
- 0004-Refactor-parse_filename_for_nontemp_relation-to-pars.patch
- 0005-Change-how-a-base-backup-decides-which-files-have-ch.patch
- 0006-Move-src-bin-pg_verifybackup-parse_manifest.c-into-s.patch
- 0007-Prototype-patch-for-incremental-and-differential-bac.patch
- 0008-Add-new-pg_walsummary-tool.patch
- 0009-Add-TAP-tests-this-is-broken-doesn-t-work.patch
Hi Robert,
On 8/30/23 10:49, Robert Haas wrote:
In the limited time that I've had to work on this project lately, I've
been trying to come up with a test case for this feature -- and since
I've gotten completely stuck, I thought it might be time to post and
see if anyone else has a better idea. I thought a reasonable test case
would be: Do a full backup. Change some stuff. Do an incremental
backup. Restore both backups and perform replay to the same LSN. Then
compare the files on disk. But I cannot make this work. The first
problem I ran into was that replay of the full backup does a
restartpoint, while the replay of the incremental backup does not.
That results in, for example, pg_subtrans having different contents.
pg_subtrans, at least, can be ignored since it is excluded from the
backup and not required for recovery.
I'm not sure whether it can also result in data files having different
contents: are changes that we replayed following the last restartpoint
guaranteed to end up on disk when the server is shut down? It wasn't
clear to me that this is the case. I thought maybe I could get both
servers to perform a restartpoint at the same location by shutting
down the primary and then replaying through the shutdown checkpoint,
but that doesn't work because the primary doesn't finish archiving
before shutting down. After some more fiddling I settled (at least for
research purposes) on having the restored backups PITR and promote,
instead of PITR and pause, so that we're guaranteed a checkpoint. But
that just caused me to run into a far worse problem: replay on the
standby doesn't actually create a state that is byte-for-byte
identical to the one that exists on the primary. I quickly discovered
that in my test case, I was ending up with different contents in the
"hole" of a block wherein a tuple got updated. Replay doesn't think
it's important to make the hole end up with the same contents on all
machines that replay the WAL, so I end up with one server that has
more junk in there than the other one and the tests fail.
This is pretty much what I discovered when investigating backup from
standby back in 2016. My (ultimately unsuccessful) efforts to find a
clean delta resulted in [1] as I systematically excluded directories
that are not required for recovery and will not be synced between a
primary and standby.
FWIW Heikki also made similar attempts at this before me (back then I
found the thread but I doubt I could find it again) and arrived at
similar results. We discussed this in person and figured out that we had
come to more or less the same conclusion. Welcome to the club!
Unless someone has a brilliant idea that I lack, this suggests to me
that this whole line of testing is a dead end. I can, of course, write
tests that compare clusters *logically* -- do the correct relations
exist, are they accessible, do they have the right contents? But I
feel like it would be easy to have bugs that escape detection in such
a test but would be detected by a physical comparison of the clusters.
Agreed, though a matching logical result is still very compelling.
However, such a comparison can only be conducted if either (a) there's
some way to set up the test so that byte-for-byte identical clusters
can be expected or (b) there's some way to perform the comparison that
can distinguish between expected, harmless differences and unexpected,
problematic differences. And at the moment my conclusion is that
neither (a) nor (b) exists. Does anyone think otherwise?
I do not. My conclusion back then was that validating a physical
comparison would be nearly impossible without changes to Postgres to
make the primary and standby match via replication. Which, by the way, I
still think would be a great idea. In principle, at least. Replay is
already a major bottleneck and anything that makes it slower will likely
not be very popular.
This would also be great for WAL, since the last time I tested, the
same WAL segment could differ between the primary and the standby
because the unused (recycled) portion at the end is not zeroed on the
standby as it is on the primary (though logically they do match). I
would be very happy if somebody told me that my info is out of date
here and this has been fixed. But when I looked at the code, it seemed
incredibly tricky to do because of how WAL is replicated.
Meanwhile, here's a rebased set of patches. The somewhat-primitive
attempts at writing tests are in 0009, but they don't work, for the
reasons explained above. I think I'd probably like to go ahead and
commit 0001 and 0002 soon if there are no objections, since I think
those are good refactorings independently of the rest of this.
No objections to 0001/0002.
Regards,
-David
[1]: http://git.postgresql.org/pg/commitdiff/6ad8ac6026287e3ccbc4d606b6ab6116ccc0eec8
Hey, thanks for the reply.
On Thu, Aug 31, 2023 at 6:50 PM David Steele <david@pgmasters.net> wrote:
pg_subtrans, at least, can be ignored since it is excluded from the
backup and not required for recovery.
I agree...
Welcome to the club!
Thanks for the welcome, but being a member feels *terrible*. :-)
I do not. My conclusion back then was that validating a physical
comparison would be nearly impossible without changes to Postgres to
make the primary and standby match via replication. Which, by the way, I
still think would be a great idea. In principle, at least. Replay is
already a major bottleneck and anything that makes it slower will likely
not be very popular.
Fair point. But maybe the bigger issue is the work involved. I don't
think zeroing the hole in all cases would likely be that expensive,
but finding everything that can cause the standby to diverge from the
primary and fixing all of it sounds like an unpleasant amount of
effort. Still, it's good to know that I'm not missing something
obvious.
No objections to 0001/0002.
Cool.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 30, 2023 at 9:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
Unless someone has a brilliant idea that I lack, this suggests to me
that this whole line of testing is a dead end. I can, of course, write
tests that compare clusters *logically* -- do the correct relations
exist, are they accessible, do they have the right contents?
Can't we think of comparing at the block level, i.e., compare each
block but ignore the contents of the hole?
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 4, 2023 at 8:42 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
Can't we think of comparing at the block level, i.e., compare each
block but ignore the contents of the hole?
We could do that, but I don't think that's a full solution. I think
I'd end up having to reimplement the equivalent of heap_mask,
btree_mask, et al. in Perl, which doesn't seem very reasonable. It's
fairly complicated logic even written in C, and doing the right thing
in Perl would be more complex, I think, because it wouldn't have
access to all the same #defines which depend on things like word size
and endianness and stuff. If we want to allow this sort of comparison,
I feel we should think of changing the C code in some way to make it
work reliably rather than try to paper over the problems in Perl.
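(For what it's worth, the hole-masking piece by itself is easy enough to sketch; it's all the other masking rules that make a faithful Perl reimplementation unappealing. A rough, illustrative Python sketch, assuming the standard PageHeaderData layout with pd_lower at byte offset 12 and pd_upper at byte offset 14, little-endian:)

```python
import struct

PAGE_SIZE = 8192  # BLCKSZ with default configure options


def mask_page_hole(page: bytes) -> bytes:
    """Zero the unused 'hole' between pd_lower and pd_upper so that two
    pages differing only in hole contents compare equal. Assumes the
    standard PageHeaderData layout (pd_lower at byte offset 12,
    pd_upper at byte offset 14) on a little-endian machine."""
    pd_lower, pd_upper = struct.unpack_from("<HH", page, 12)
    if not (0 < pd_lower <= pd_upper <= PAGE_SIZE):
        return page  # new or corrupt-looking page; leave it untouched
    return page[:pd_lower] + b"\x00" * (pd_upper - pd_lower) + page[pd_upper:]
```

But that's only the easy part; the heap_mask/btree_mask-style rules for hint bits and the like are where the real complexity lives.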
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Aug 30, 2023 at 9:20 PM Robert Haas <robertmhaas@gmail.com> wrote:
Meanwhile, here's a rebased set of patches. The somewhat-primitive
attempts at writing tests are in 0009, but they don't work, for the
reasons explained above. I think I'd probably like to go ahead and
commit 0001 and 0002 soon if there are no objections, since I think
those are good refactorings independently of the rest of this.
I have started reading the patch today, I haven't yet completed one
pass but here are my comments in 0007
1.
+ BlockNumber relative_block_numbers[RELSEG_SIZE];
This is close to 400kB of memory, so I think it is better to palloc it
instead of keeping it on the stack.
2.
/*
* Try to parse the directory name as an unsigned integer.
*
- * Tablespace directories should be positive integers that can
- * be represented in 32 bits, with no leading zeroes or trailing
+ * Tablespace directories should be positive integers that can be
+ * represented in 32 bits, with no leading zeroes or trailing
* garbage. If we come across a name that doesn't meet those
* criteria, skip it.
Unrelated code refactoring hunk
3.
+typedef struct
+{
+ const char *filename;
+ pg_checksum_context *checksum_ctx;
+ bbsink *sink;
+ size_t bytes_sent;
+} FileChunkContext;
This structure is not used anywhere.
4.
+ * If the file is to be set incrementally, then num_incremental_blocks
+ * should be the number of blocks to be sent, and incremental_blocks
/If the file is to be set incrementally/If the file is to be sent incrementally
5.
- while (bytes_done < statbuf->st_size)
+ while (1)
{
- size_t remaining = statbuf->st_size - bytes_done;
+ /*
I do not really like this change, because after removing this you have
put 2 independent checks for sending the full file[1] and sending it
incrementally[2]. Actually, for sending incrementally
'statbuf->st_size' is computed from 'num_incremental_blocks' itself,
so why don't we keep this breaking condition in the while loop itself?
That way we can avoid these two separate conditions.
[1]
+ /*
+ * If we've read the required number of bytes, then it's time to
+ * stop.
+ */
+ if (bytes_done >= statbuf->st_size)
+ break;
[2]
+ /*
+ * If we've read all the blocks, then it's time to stop.
+ */
+ if (ibindex >= num_incremental_blocks)
+ break;
6.
+typedef struct
+{
+ TimeLineID tli;
+ XLogRecPtr start_lsn;
+ XLogRecPtr end_lsn;
+} backup_wal_range;
+
+typedef struct
+{
+ uint32 status;
+ const char *path;
+ size_t size;
+} backup_file_entry;
Better we add some comments for these structures.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 30, 2023 at 4:50 PM Robert Haas <robertmhaas@gmail.com> wrote:
[..]
I've played a bit more with this second batch of patches on
e8d74ad625f7344f6b715254d3869663c1569a51 @ 31Aug (days before wait
events refactor):
test_across_wallevelminimal.sh
test_many_incrementals_dbcreate.sh
test_many_incrementals.sh
test_multixact.sh
test_pending_2pc.sh
test_reindex_and_vacuum_full.sh
test_truncaterollback.sh
test_unlogged_table.sh
all those basic tests had GOOD results. Please find attached. I'll try
to schedule some more realistic (in terms of workload and sizes) tests
in a couple of days + maybe have some fun with cross-backups and
restores across standbys. As per the earlier doubt: the wal_level =
minimal situation shouldn't be a concern, sadly, because it requires
max_wal_senders == 0, while pg_basebackup requires it to be above 0
(due to "FATAL: number of requested standby connections exceeds
max_wal_senders (currently 0)").
I also wanted to introduce corruption into the WAL summary files, but
then saw in the code that they are already covered by CRC32, cool.
In v07:
+#define MINIMUM_VERSION_FOR_WAL_SUMMARIES 160000
170000 ?
A related design question is whether we should really be sending the
whole backup manifest to the server at all. If it turns out that we
don't really need anything except for the LSN of the previous backup,
we could send that one piece of information instead of everything. On
the other hand, if we need the list of files from the previous backup,
then sending the whole manifest makes sense.
If that is still an area open for discussion: wouldn't it be better to
just specify an LSN, since that would allow resyncing a standby across
a major lag where the WAL to replay would be enormous? Given a
primary->standby setup where the standby is stuck on some LSN, right
now it would be:
1) calculate the backup manifest of the desynced 10TB standby (how?
using which tool?) - even if possible, that means reading 10TB of data
instead of just supplying a number, doesn't it?
2) back up the primary with such an incremental backup >= LSN
3) copy the incremental backup to the standby
4) apply it to the impaired standby
5) restart the WAL replay
- We only know how to operate on directories, not tar files. I thought
about that when working on pg_verifybackup as well, but I didn't do
anything about it. It would be nice to go back and make that tool work
on tar-format backups, and this one, too. I don't think there would be
a whole lot of point trying to operate on compressed tar files because
you need random access and that seems hard on a compressed file, but
on uncompressed files it seems at least theoretically doable. I'm not
sure whether anyone would care that much about this, though, even
though it does sound pretty cool.
Also maybe it's too early to ask, but wouldn't it be nice if we could
have a future option in pg_combinebackup to avoid double writes when
it is used from restore hosts? (Right now we first need to reconstruct
the original datadir from the full and incremental backups on the host
hosting the backups, and then transfer it again to the target host.)
Something like this could work well from the restore host:
pg_combinebackup /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar
-o - | ssh dbserver 'tar xvf - -C /path/to/restored/cluster'. The bad
thing is that such a pipe prevents parallelism from day 1, and I'm
afraid I don't have a better easy idea on how to have both at the same
time in the long term.
-J.
Attachments:
On Fri, Sep 1, 2023 at 10:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
No objections to 0001/0002.
Cool.
Nobody else objected either, so I went ahead and committed those. I'll
rebase the rest of the patches on top of the latest master and repost,
hopefully after addressing some of the other review comments from
Dilip and Jakub.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Sep 12, 2023 at 5:56 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
+ BlockNumber relative_block_numbers[RELSEG_SIZE];
This is close to 400kB of memory, so I think it is better to palloc it
instead of keeping it on the stack.
Fixed.
Unrelated code refactoring hunk
Fixed.
This structure is not used anywhere.
Removed.
/If the file is to be set incrementally/If the file is to be sent incrementally
Fixed.
I do not really like this change, because after removing this you have
put 2 independent checks for sending the full file[1] and sending it
incrementally[2]. Actually for sending incrementally
'statbuf->st_size' is computed from the 'num_incremental_blocks'
itself so why don't we keep this breaking condition in the while loop
itself? So that we can avoid these two separate conditions.
I don't think that would be correct. The number of bytes that need to
be read from the original file is not equal to the number of bytes
that will be written to the incremental file. Admittedly, they're
currently different by less than a block, but that could change if we
change the format of the incremental file (e.g. suppose we compressed
the blocks in the incremental file with gzip, or smushed out the holes
in the pages). I wrote the loop as I did precisely so that the two
cases could have different loop exit conditions.
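To illustrate the point, here's a rough structural sketch of that loop (illustrative Python with invented names like send_file/read_block, not the actual basebackup.c code, which is C):

```python
def send_file(read_block, total_size, incremental_blocks=None,
              block_size=8192):
    """Sketch of the loop structure: the full-file case and the
    incremental case have deliberately independent exit conditions,
    because the number of bytes written to an incremental file need not
    equal the number of bytes read from the source file."""
    sent = []
    bytes_done = 0
    ibindex = 0
    while True:
        if incremental_blocks is None:
            # Full-file case: stop once we've read the whole source file.
            if bytes_done >= total_size:
                break
            blkno = bytes_done // block_size
        else:
            # Incremental case: stop once the requested block list is
            # exhausted, regardless of how many bytes get written out.
            if ibindex >= len(incremental_blocks):
                break
            blkno = incremental_blocks[ibindex]
            ibindex += 1
        sent.append(read_block(blkno))
        bytes_done += block_size
    return sent
```

Keeping the two exits separate is what leaves room to change the incremental file format (compression, hole removal) later without touching the full-file path.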
Better we add some comments for these structures.
Done.
Here's a new patch set, also addressing Jakub's observation that
MINIMUM_VERSION_FOR_WAL_SUMMARIES needed updating.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v3-0001-Change-struct-tablespaceinfo-s-oid-member-from-ch.patch
v3-0002-Refactor-parse_filename_for_nontemp_relation-to-p.patch
v3-0003-Change-how-a-base-backup-decides-which-files-have.patch
v3-0004-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch
v3-0005-Prototype-patch-for-incremental-and-differential-.patch
v3-0006-Add-new-pg_walsummary-tool.patch
v3-0007-Add-TAP-tests-this-is-broken-doesn-t-work.patch
On Thu, Sep 28, 2023 at 6:22 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
all those basic tests had GOOD results. Please find attached. I'll try
to schedule some more realistic (in terms of workload and sizes) tests
in a couple of days + maybe have some fun with cross-backups and
restores across standbys.
That's awesome! Thanks for testing! This can definitely benefit from
any amount of beating on it that people wish to do. It's a complex,
delicate area that risks data loss.
If that is still an area open for discussion: wouldn't it be better to
just specify an LSN, since that would allow resyncing a standby across
a major lag where the WAL to replay would be enormous? Given a
primary->standby setup where the standby is stuck on some LSN, right
now it would be:
1) calculate the backup manifest of the desynced 10TB standby (how?
using which tool?) - even if possible, that means reading 10TB of data
instead of just supplying a number, doesn't it?
2) back up the primary with such an incremental backup >= LSN
3) copy the incremental backup to the standby
4) apply it to the impaired standby
5) restart the WAL replay
Hmm. I wonder if this would even be a safe procedure. I admit that I
can't quite see a problem with it, but sometimes I'm kind of dumb.
Also maybe it's too early to ask, but wouldn't it be nice if we could
have a future option in pg_combinebackup to avoid double writes when
it is used from restore hosts? (Right now we first need to reconstruct
the original datadir from the full and incremental backups on the host
hosting the backups, and then transfer it again to the target host.)
Something like this could work well from the restore host:
pg_combinebackup /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar
-o - | ssh dbserver 'tar xvf - -C /path/to/restored/cluster'. The bad
thing is that such a pipe prevents parallelism from day 1, and I'm
afraid I don't have a better easy idea on how to have both at the same
time in the long term.
I don't think it's too early to ask for this, but I do think it's too
early for you to get it. ;-)
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Oct 3, 2023 at 2:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
Here's a new patch set, also addressing Jakub's observation that
MINIMUM_VERSION_FOR_WAL_SUMMARIES needed updating.
Here's yet another new version. In this version, I reversed the order
of the first two patches, with the idea that what's now 0001 seems
fairly reasonable as an independent commit, and could thus perhaps be
committed sometime soon-ish. In the main patch, I added SGML
documentation for pg_combinebackup. I also fixed the broken TAP tests
so that they work, by basing them on pg_dump equivalence rather than
file-level equivalence. I'm sad to give up on testing the latter, but
it seems to be unrealistic. I cleaned up a few other odds and ends,
too. But, what exactly is the bigger picture for this patch in terms
of moving forward? Here's a list of things that are on my mind:
- I'd like to get the patch to mark the redo point in the WAL
committed[1] and then rework this patch set to make use of that
infrastructure. Right now, we make a best effort to end WAL summaries
at redo point boundaries, but it's racey, and sometimes we fail to do
so. In theory that just has the effect of potentially making an
incremental backup contain some extra blocks that it shouldn't really
need to contain, but I think it can actually lead to weird stalls,
because when an incremental backup is taken, we have to wait until a
WAL summary shows up that extends at least up to the start LSN of the
backup we're about to take. I believe all the logic in this area can
be made a good deal simpler and more reliable if that patch gets
committed and this one reworked accordingly.
- I would like some feedback on the generation of WAL summary files.
Right now, I have it enabled by default, and summaries are kept for a
week. That means that, with no additional setup, you can take an
incremental backup as long as the reference backup was taken in the
last week. File removal is governed by mtimes, so if you change the
mtimes of your summary files or whack your system clock around, weird
things might happen. But obviously this might be inconvenient. Some
people might not want WAL summary files to be generated at all because
they don't care about incremental backup, and other people might want
them retained for longer, and still other people might want them to be
not removed automatically or removed automatically based on some
criteria other than mtime. I don't really know what's best here. I
don't think the default policy that the patches implement is
especially terrible, but it's just something that I made up and I
don't have any real confidence that it's wonderful. One point to
consider here is that, if WAL summarization is enabled, checkpoints
can't remove WAL that isn't summarized yet. Mostly that's not a
problem, I think, because the WAL summarizer is pretty fast. But it
could increase disk consumption for some people. I don't think that we
need to worry about the summaries themselves being a problem in terms
of space consumption; at least in all the cases I've tested, they're
just not very big.
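To make the mtime-based policy concrete, here's a rough sketch of the removal side (illustrative Python with an invented prune_wal_summaries helper; the real retention logic would of course live in the server's C code):

```python
import os
import time


def prune_wal_summaries(summary_dir, retention_secs=7 * 24 * 3600):
    """Sketch of the mtime-based retention policy described above: any
    summary file whose mtime is older than the retention window gets
    removed, which is why touching a file's mtime or moving the system
    clock around changes what is kept. Returns the removed filenames."""
    now = time.time()
    removed = []
    for name in sorted(os.listdir(summary_dir)):
        path = os.path.join(summary_dir, name)
        if now - os.path.getmtime(path) > retention_secs:
            os.remove(path)
            removed.append(name)
    return removed
```

Any alternative policy (keep forever, prune by count, prune by LSN coverage) would slot into the same place; the open question is which default to ship.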
- On a related note, I haven't yet tested this on a standby, which is
a thing that I definitely need to do. I don't know of a reason why it
shouldn't be possible for all of this machinery to work on a standby
just as it does on a primary, but then we need the WAL summarizer to
run there too, which could end up being a waste if nobody ever tries
to take an incremental backup. I wonder how that should be reflected
in the configuration. We could do something like what we've done for
archive_mode, where on means "only on if this is a primary" and you
have to say always if you want it to run on standbys as well ... but
I'm not sure if that's a design pattern that we really want to
replicate into more places. I'd be somewhat inclined to just make
whatever configuration parameters we need to configure this thing on
the primary also work on standbys, and you can set each server up as
you please. But I'm open to other suggestions.
- We need to settle the question of whether to send the whole backup
manifest to the server or just the LSN. In a previous attempt at
incremental backup, we decided the whole manifest was necessary,
because flat-copying files could make new data show up with old LSNs.
But that version of the patch set was trying to find modified blocks
by checking their LSNs individually, not by summarizing WAL. And since
the operations that flat-copy files are WAL-logged, the WAL summary
approach seems to eliminate that problem - maybe an LSN (and the
associated TLI) is good enough now. This also relates to Jakub's
question about whether this machinery could be used to fast-forward a
standby, which is not exactly a base backup but ... perhaps close
enough? I'm somewhat inclined to believe that we can simplify to an
LSN and TLI; however, if we do that, then we'll have big problems if
later we realize that we want the manifest for something after all. So
if anybody thinks that there's a reason to keep doing what the patch
does today -- namely, upload the whole manifest to the server --
please speak up.
- It's regrettable that we don't have incremental JSON parsing; I
think that means anyone who has a backup manifest that is bigger than
1GB can't use this feature. However, that's also a problem for the
existing backup manifest feature, and as far as I can see, we have no
complaints about it. So maybe people just don't have databases with
enough relations for that to be much of a live issue yet. I'm inclined
to treat this as a non-blocker, although Andrew Dunstan tells me he
does have a prototype for incremental JSON parsing so maybe that will
land and we can use it here.
- Right now, I have a hard-coded 60 second timeout for WAL
summarization. If you try to take an incremental backup and the WAL
summaries you need don't show up within 60 seconds, the backup times
out. I think that's a reasonable default, but should it be
configurable? If yes, should that be a GUC or, perhaps better, a
pg_basebackup option?
- I'm curious what people think about the pg_walsummary tool that is
included in 0006. I think it's going to be fairly important for
debugging, but it does feel a little bit bad to add a new binary for
something pretty niche. Nevertheless, merging it into any other
utility seems relatively awkward, so I'm inclined to think both that
this should be included in whatever finally gets committed and that it
should be a separate binary. I considered whether it should go in
contrib, but we seem to have moved to a policy that heavily favors
limiting contrib to extensions and loadable modules, rather than
binaries.
Clearly there's a good amount of stuff to sort out here, but we've
still got quite a bit of time left before feature freeze so I'd like
to have a go at it. Please let me know your thoughts, if you have any.
[1]: /messages/by-id/CA+TgmoZAM24Ub=uxP0aWuWstNYTUJQ64j976FYJeVaMJ+qD0uw@mail.gmail.com
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments:
v4-0001-Refactor-parse_filename_for_nontemp_relation-to-p.patch
v4-0002-Change-struct-tablespaceinfo-s-oid-member-from-ch.patch
v4-0003-Change-how-a-base-backup-decides-which-files-have.patch
v4-0004-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch
v4-0005-Prototype-patch-for-incremental-and-differential-.patch
v4-0006-Add-new-pg_walsummary-tool.patch
On Wed, Oct 4, 2023 at 4:08 PM Robert Haas <robertmhaas@gmail.com> wrote:
Clearly there's a good amount of stuff to sort out here, but we've
still got quite a bit of time left before feature freeze so I'd like
to have a go at it. Please let me know your thoughts, if you have any.
Apparently, nobody has any thoughts, but here's an updated patch set
anyway. The main change, other than rebasing, is that I did a bunch
more documentation work on the main patch (0005). I'm much happier
with it now, although I expect it may need more adjustments here and
there as outstanding design questions get settled.
After some thought, I think that it should be fine to commit 0001 and
0002 as independent refactoring patches, and I plan to go ahead and do
that pretty soon unless somebody objects.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com