directory archive format for pg_dump
This is the first of two patches for parallel pg_dump. In particular, this
patch adds a new pg_dump archive type which can save pg_dump data to a
directory, with each table/blob being a file so that several processes can
write to different files in parallel.
Since the compression is currently all done in the custom format backup
code, the first thing I've done was to refactor the compression
functions into a separate file. While at it, I have added support for
liblzf compression.
Writing the backup to a directory brings the disadvantage that your
backup now consists of a bunch of files, and you should make sure not to
lose files or mix files of different backup sets. Therefore, I have
added a -k switch that checks if a directory backup set is complete. To
do this, every backup has a different ID (basically a random md5sum)
which is copied into every file (both TOC and data files). The TOC also
knows about the size of each data file and can check if it has been
truncated for some reason.
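For illustration, the check could look roughly like this; checkDataFile
and DUMP_ID_LEN are hypothetical names, not the patch's actual code, and
the TOC is assumed to record each data file's expected size:

    #include <stdio.h>
    #include <string.h>

    #define DUMP_ID_LEN 32          /* hex md5sum of some random bytes */

    /* Return 1 if "path" belongs to dump "tocId" and is not truncated. */
    static int
    checkDataFile(const char *path, const char *tocId, long sizeFromTOC)
    {
        FILE   *fp = fopen(path, "rb");
        char    fileId[DUMP_ID_LEN];
        long    size;

        if (fp == NULL)
            return 0;               /* missing file => incomplete set */

        if (fread(fileId, 1, DUMP_ID_LEN, fp) != DUMP_ID_LEN ||
            memcmp(fileId, tocId, DUMP_ID_LEN) != 0)
        {
            fclose(fp);
            return 0;               /* file belongs to a different dump */
        }

        if (fseek(fp, 0, SEEK_END) != 0)
        {
            fclose(fp);
            return 0;
        }
        size = ftell(fp);
        fclose(fp);

        return size == sizeFromTOC; /* detect truncation */
    }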
Regarding lzf compression, the last discussion was here:
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00442.php
I have included it so that there are actually multiple compression
algorithms to build a framework for, and so that people can just
compile and run it and see what they get. In my tests, when I run a
backup with lzf compression, the postgres backend is using 100% of one
CPU and pg_dump is using 15% of another CPU. Running with zlib however
gives me 100% zlib and 70% postgres. Specifying the fastest zlib
compression rate of 1 gives me 50% pg_dump and 100% postgres. lzf
compression can be taken out of the code in like two minutes, it's all
in #ifdefs, so please see lzf just as an optional addition to the
directory patch instead of as a main feature.
I am also submitting a WIP patch that shows the parallel version of
pg_dump, which is a patch on top of this one. It is not completely ready
yet, but I am releasing it as a WIP patch so you can see the overall
picture and can already play with it now. And hopefully I can get some
feedback on whether I am going in the right direction.
There is a small shell script included (test.sh) listing some of the
commands, to give people a quick overview of how to call it.
Joachim
Attachments:
pg_dump-directory.diff (text/x-patch; charset=US-ASCII), +3040 -524
Hi,
Sharing some thoughts after a first round of reviewing, where I only had
time to read the patch itself.
Joachim Wieland <joe@mcknight.de> writes:
Since the compression is currently all done in the custom format backup
code, the first thing I've done was to refactor the compression
functions into a separate file. While at it, I have added support for
liblzf compression.
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
And it could be about personal preferences, but the way you added the
liblzf support strikes me as odd, with all those #ifdefs everywhere. Is
it possible to have a specific file for each supported compression
format, then some routing code in src/bin/pg_dump/compress_io.c?
The routing code already exists, but then the file is full of #ifdef
sections to define the right supporting function, when I think having
compress_io_zlib and compress_io_lzf files would be better.
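For illustration, that routing could look roughly like this, with each
algorithm in its own file exporting one vtable; all names here
(CompressorOps, routeCompressor, the *_ops symbols, HAVE_LIBLZF) are
invented for this sketch, not taken from the patch:

    #include <stddef.h>

    typedef enum
    {
        COMPR_ZLIB,
        COMPR_LZF
    } CompressionAlgorithm;

    typedef struct CompressorOps
    {
        void    (*writeData) (void *state, const void *data, size_t len);
        size_t  (*readData) (void *state, void *buf, size_t len);
        void    (*end) (void *state);
    } CompressorOps;

    /* each algorithm lives in its own file and exports one vtable */
    extern const CompressorOps zlib_ops;    /* compress_io_zlib.c */
    extern const CompressorOps lzf_ops;     /* compress_io_lzf.c */

    static const CompressorOps *
    routeCompressor(CompressionAlgorithm alg)
    {
        switch (alg)
        {
    #ifdef HAVE_LIBZ
            case COMPR_ZLIB:
                return &zlib_ops;
    #endif
    #ifdef HAVE_LIBLZF
            case COMPR_LZF:
                return &lzf_ops;
    #endif
            default:
                return NULL;    /* caller reports "not compiled in" */
        }
    }

The #ifdefs would then collapse into this one switch instead of being
scattered through the I/O code.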
Then there's the bulk of the new dump format feature in the other part
of the patch, namely src/bin/pg_dump/pg_backup_directory.c. You have to
update the copyright in the file header there, at least :)
I've yet to devote more time to this part of the patch, but it seems
like it's rewriting the full support without using the existing bits.
That's something I have to check; I didn't have time to read the
existing code for the other archive formats there.
I'm hesitant as far as marking the patch "Waiting on author" to get it
split. Joachim, what do you think?
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Hi Dimitri and Joachim.
I've looked at the patch too, and I want to share some thoughts as well.
I've used http://wiki.postgresql.org/wiki/Reviewing_a_Patch to guide my
review.
Submission review:
I've applied and compiled the patch successfully using the current master.
Usability review:
The dir format generated 60 files in my database, with different
sizes, and it looks very confusing. Is it possible to use the same
trick as pigz and pbzip2, creating a concatenated file of streams?
Feature test:
Just a partial review. I can dump / restore using lzf, but I didn't
stress it hard to check robustness.
Performance review:
Didn't test it hard either, but it looks OK.
Coding review:
Just a shallow review here.
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
Same feeling here; this is the first thing that I noticed.
Reusing md5.c and kwlookup.c via links doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?
--
José Arthur Benetasso Villanova
Excerpts from José Arthur Benetasso Villanova's message of Fri Nov 19 18:28:03 -0300 2010:
Reusing md5.c and kwlookup.c via links doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?
Not sure what you mean here, but kwlookup.c is a symlink without this
patch too. It's just the way it works; the compilation environments
here and in the backend are different, so there is no other option but
to compile twice. I guess md5.c is a new one (I didn't check), but I
would assume it's the same thing.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi Dimitri,
thanks for reviewing my patch!
On Fri, Nov 19, 2010 at 2:44 PM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
I guess it wouldn't be a very big deal, but I also doubt that it makes
the review that much easier. Basically the compression refactor patch
would just touch pg_backup_custom.c (because this is the place where
the libz compression is currently buried) and the two new
compress_io.(c|h) files. Everything else is pretty much the directory
stuff and is on top of these changes.
And it could be about personal preferences, but the way you added the
liblzf support strikes me as odd, with all those #ifdefs everywhere. Is
it possible to have a specific file for each supported compression
format, then some routing code in src/bin/pg_dump/compress_io.c?
Sure we could. But I wanted to hold off on any fancy function pointer
stuff until we have decided whether we want to include liblzf support
at all. The #ifdefs might be a bit ugly, but in case we do not include
liblzf support, it's the easiest way to take it out again. As written
in my introduction, this patch is not really about liblzf; liblzf is
just a proof of concept for factoring out the compression part, and I
have included it so that people can use it and see how much speed
improvement they get.
The routing code already exists, but then the file is full of #ifdef
sections to define the right supporting function, when I think having
compress_io_zlib and compress_io_lzf files would be better.
Sure! I completely agree...
Then there's the bulk of the new dump format feature in the other part
of the patch, namely src/bin/pg_dump/pg_backup_directory.c. You have to
update the copyright in the file header there, at least :)
Well, not sure if we can just change the copyright notice, because in
the end the structure was copied from one of the other files which all
have the copyright notice in them, so my work is based on those other
files...
I'm hesitant as far as marking the patch "Waiting on author" to get it
split. Joachim, what do you think?
I will see if I can split it.
Joachim
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
That part of the patch is likely to get rejected outright anyway,
so I *strongly* recommend splitting it out. We have generally resisted
adding random compression algorithms to pg_dump because of license and
patent considerations, and I see no reason to suppose this one is going
to pass muster.
regards, tom lane
On Fri, Nov 19, 2010 at 11:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
That part of the patch is likely to get rejected outright anyway,
so I *strongly* recommend splitting it out. We have generally resisted
adding random compression algorithms to pg_dump because of license and
patent considerations, and I see no reason to suppose this one is going
to pass muster.
I was already anticipating that possibility and my initial patch
description is along these lines.
However, liblzf is BSD-licensed, so on the license side we should be
fine. Regarding patents, your last comment was that you'd like to see
if it's really worth it, and so I have included support for lzf for
anybody to go ahead and find that out.
Will send an updated, split-up patch this weekend (which would actually
be four patches already...).
Joachim
Hi Jose,
2010/11/19 José Arthur Benetasso Villanova <jose.arthur@gmail.com>:
The dir format generated 60 files in my database, with different
sizes, and it looks very confusing. Is it possible to use the same
trick as pigz and pbzip2, creating a concatenated file of streams?
What pigz is parallelizing is the actual computation of the compressed
data. The directory archive format however is a preparation for a
parallel pg_dump, dumping several tables (especially large tables of
course) in parallel via multiple database connections and multiple
pg_dump frontends. The idea of multiplexing their output into one file
has been rejected on the grounds that it would probably slow down the
whole process.
Nevertheless, pigz could be implemented as an alternative compression
algorithm, and that way both the custom and the directory archive
format could use it. But here as well, license and patent questions
might be in the way, even though it is based on libz.
Reusing md5.c and kwlookup.c via links doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?
No, it isn't. md5.c is used in the same way by e.g. libpq, and there
are other examples of links in core; check out src/bin/psql for
example.
Joachim
On Fri, Nov 19, 2010 at 2:44 PM, Dimitri Fontaine
<dimitri@2ndquadrant.fr> wrote:
I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…
Attached are two patches building on top of each other. The first one
factors out the I/O routines (esp. libz) of pg_backup_custom.c into a
new file compress_io.c. This patch is without liblzf support now.
The second patch on top implements the new archive format of a directory.
Regarding the parallel part, I have been playing with Windows support
this weekend but I am still facing some issues (if anybody wants to
help who knows more about Windows programming than me, just let me
know). I will send the parallel patch and the liblzf part as two other
separate patches in the next few days.
Joachim
On 20.11.2010 06:10, Joachim Wieland wrote:
2010/11/19 José Arthur Benetasso Villanova <jose.arthur@gmail.com>:
Reusing md5.c and kwlookup.c via links doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?
No, it isn't. md5.c is used in the same way by e.g. libpq, and there
are other examples of links in core; check out src/bin/psql for
example.
It seems like overkill to include md5 just for hashing the random bytes
that getRandomData() generates. And if random() doesn't produce unique
values, it's not going to get better by hashing it. How about using a
timestamp instead of the hash?
If you don't initialize random() with srandom(), BTW, it will always
return the same value.
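For illustration only (not the patch's code), a conventional way to
seed it:

    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    /*
     * Seed random() before use, mixing in the pid so two pg_dump runs
     * started within the same second still get different sequences.
     */
    static void
    seedRandom(void)
    {
        srandom((unsigned int) time(NULL) ^ (unsigned int) getpid());
    }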
But I'm not actually sure we should be preventing mix & match of files
from different dumps. It might be very useful to do just that sometimes,
like restoring a recent backup, with the contents of one table replaced
with older data. A warning would be ok, though.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
But I'm not actually sure we should be preventing mix & match of files
from different dumps. It might be very useful to do just that sometimes,
like restoring a recent backup, with the contents of one table replaced
with older data. A warning would be ok, though.
+1. This mechanism seems like a solution in search of a problem.
Just lose the whole thing, and instead fix pg_dump to complain if
the target directory isn't empty. That should be sufficient to guard
against accidental mixing of different dumps, and as Heikki says
there's not a good reason to prevent intentional mixing.
regards, tom lane
On 22.11.2010 19:07, Tom Lane wrote:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
But I'm not actually sure we should be preventing mix & match of files
from different dumps. It might be very useful to do just that sometimes,
like restoring a recent backup, with the contents of one table replaced
with older data. A warning would be ok, though.
+1. This mechanism seems like a solution in search of a problem.
Just lose the whole thing, and instead fix pg_dump to complain if
the target directory isn't empty. That should be sufficient to guard
against accidental mixing of different dumps, and as Heikki says
there's not a good reason to prevent intentional mixing.
Extending that thought a bit, it would be nice if the per-file header
carried the information of whether the file is compressed or not,
instead of just one such flag in the TOC. You could then also mix &
match files from compressed and non-compressed archives.
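A sketch of what such a header could look like; the field names and
magic string are invented here, not taken from the patch:

    /* written at the start of every data file in the directory */
    typedef struct
    {
        char    magic[8];       /* e.g. "PGDMPDIR", identifies the format */
        int     version;        /* data file format version */
        int     compression;    /* compression level, 0 = uncompressed */
    } DataFileHeader;

A restore could then decompress each file according to its own header,
regardless of how the rest of the archive was written.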
Other than the md5 thing, the patch looks fine to me. There are quite a
few levels of indirection; it took me a while to get my head around
call chains like DataDumper->_WriteData->WriteDataToArchive->_WriteBuf,
but I don't have any ideas on how to improve that.
However, docs are missing, so I'm marking this as "Waiting on Author".
There are some cosmetic changes I'd like to have fixed, or to do myself
before committing:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
So here is a new round of patches. It turned out that the feature of
allowing files from a different dump, with a different compression, to
be restored required some changes in the compressor API. And in the end
I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers. The downside now is that I have
created quite a few one-line functions that Heikki doesn't like all
that much, but I assume that they are okay in this case, on the grounds
that the public compressor interface is calling the private
implementation of a certain compressor.
Joachim
On 29.11.2010 07:11, Joachim Wieland wrote:
On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
So here is a new round of patches. It turned out that the feature of
allowing files from a different dump, with a different compression, to
be restored required some changes in the compressor API. And in the end
I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers. The downside now is that I have
created quite a few one-line functions that Heikki doesn't like all
that much, but I assume that they are okay in this case, on the grounds
that the public compressor interface is calling the private
implementation of a certain compressor.
Thanks, I'll take a look.
BTW, I know you wanted to have support for other compression algorithms;
I think the best way to achieve that is to make it possible to specify
an external command to be used for compression. pg_dump would fork() and
exec() that, and pipe the data to be compressed/decompressed to
stdin/stdout of the external command. We're not going to add support for
every new compression algorithm that's in vogue, but generic external
command support should make those who want it happy. I'd be particularly
excited about using something like pbzip2, to speed up the compression
on multi-core systems.
That should be a separate patch, but it's something to keep in mind with
these refactorings.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
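A rough sketch of that idea follows; forkFilter is a made-up name, and
real code would also need error reporting plus care to avoid pipe
deadlocks (e.g. a separate process or non-blocking I/O):

    #include <sys/types.h>
    #include <unistd.h>

    /* Start "cmd", returning fds connected to its stdin and stdout. */
    static pid_t
    forkFilter(const char *cmd, int *toChild, int *fromChild)
    {
        int     in[2],
                out[2];
        pid_t   pid;

        if (pipe(in) < 0 || pipe(out) < 0)
            return -1;

        pid = fork();
        if (pid == 0)
        {
            /* child: wire the pipes to stdin/stdout, exec the filter */
            dup2(in[0], STDIN_FILENO);
            dup2(out[1], STDOUT_FILENO);
            close(in[0]);
            close(in[1]);
            close(out[0]);
            close(out[1]);
            execlp(cmd, cmd, (char *) NULL);
            _exit(127);         /* only reached if exec failed */
        }

        /* parent: keep the ends the child doesn't use */
        close(in[0]);
        close(out[1]);
        *toChild = in[1];       /* write uncompressed data here */
        *fromChild = out[0];    /* read compressed data here */
        return pid;
    }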
On Mon, Nov 29, 2010 at 10:49 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 29.11.2010 07:11, Joachim Wieland wrote:
On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
So here is a new round of patches. It turned out that the feature of
allowing files from a different dump, with a different compression, to
be restored required some changes in the compressor API. And in the end
I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers. The downside now is that I have
created quite a few one-line functions that Heikki doesn't like all
that much, but I assume that they are okay in this case, on the grounds
that the public compressor interface is calling the private
implementation of a certain compressor.
Thanks, I'll take a look.
BTW, I know you wanted to have support for other compression algorithms;
I think the best way to achieve that is to make it possible to specify
an external command to be used for compression. pg_dump would fork() and
exec() that, and pipe the data to be compressed/decompressed to
stdin/stdout of the external command. We're not going to add support for
every new compression algorithm that's in vogue, but generic external
command support should make those who want it happy. I'd be particularly
excited about using something like pbzip2, to speed up the compression
on multi-core systems.
That should be a separate patch, but it's something to keep in mind with
these refactorings.
That would also ease licensing concerns, since we wouldn't have to
redistribute or bundle anything.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 29.11.2010 07:11, Joachim Wieland wrote:
On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
So here is a new round of patches. It turned out that the feature of
allowing files from a different dump, with a different compression, to
be restored required some changes in the compressor API. And in the end
I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers.
Ok. The separate InitCompressorState() and AllocateCompressorState()
functions seem unnecessary. As the code stands, there's little
performance gain from re-using the same CompressorState, just
re-initializing it, and I can't see any other justification for them either.
I combined those, and the Free/Flush steps, and did a bunch of other
editorializations and cleanups. Here's an updated patch, also available
in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"pg_dump-dir". I'm going to continue reviewing this later, tomorrow
hopefully.
The downside now is that I have
created quite a few one-line functions that Heikki doesn't like all
that much, but I assume that they are okay in this case on the grounds
that the public compressor interface is calling the private
implementation of a certain compressor.
You could avoid the wrapper functions by calling the function pointers
directly, but I agree it seems neater the way you did it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
pg_dump-compression-refactor-2.diff (text/x-diff), +886 -364
On 29.11.2010 22:21, Heikki Linnakangas wrote:
On 29.11.2010 07:11, Joachim Wieland wrote:
On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.
So here is a new round of patches. It turned out that the feature of
allowing files from a different dump, with a different compression, to
be restored required some changes in the compressor API. And in the end
I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers.
Ok. The separate InitCompressorState() and AllocateCompressorState()
functions seem unnecessary. As the code stands, there's little
performance gain from re-using the same CompressorState, just
re-initializing it, and I can't see any other justification for them
either.
I combined those, and the Free/Flush steps, and did a bunch of other
editorializations and cleanups. Here's an updated patch, also available
in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"pg_dump-dir". I'm going to continue reviewing this later, tomorrow
hopefully.
Here's another update. I changed things quite heavily. I didn't see the
point of having the Alloc+Free functions for uncompressing, because
ReadDataFromArchive processes the whole input stream in one go anyway.
So the new API consists of four functions: AllocateCompressor,
WriteDataToArchive and EndCompressor for writing, and
ReadDataFromArchive for reading.
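In outline, a writing archive format would drive this API roughly as
follows; the signatures here are guesses for illustration and may
differ from the actual patch:

    static void
    dumpObjectData(ArchiveHandle *AH, const void *data, size_t dLen)
    {
        /* set up zlib (or pass-through) state per AH's compression */
        CompressorState *cs = AllocateCompressor(AH);

        /* compress and hand output to the format's write callback,
         * as many times as the data requires */
        WriteDataToArchive(AH, cs, data, dLen);

        /* flush anything still buffered, then free the state */
        EndCompressor(AH, cs);
    }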
Also, I reverted the zlib buffer size from 64k to 4k. If you want to
raise that, let's discuss that separately.
Please let me know what you think of this version, or if you spot any
bugs. I'll keep working on this, I'm hoping to get this into committable
shape by the end of the week.
The pg_backup_directory patch naturally won't apply over this anymore.
Once we have the compress_io part in shape, that will need to be fixed.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On 01.12.2010 16:03, Heikki Linnakangas wrote:
On 29.11.2010 22:21, Heikki Linnakangas wrote:
I combined those, and the Free/Flush steps, and did a bunch of other
editorializations and cleanups. Here's an updated patch, also available
in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"pg_dump-dir". I'm going to continue reviewing this later, tomorrow
hopefully.
Here's another update.
Forgot attachment. This is also available in the above git repo.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
pg_dump-compression-refactor-3.patch (text/x-diff), +845 -365
On Wed, Dec 1, 2010 at 9:05 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Forgot attachment. This is also available in the above git repo.
I have quickly checked your modifications. On the one hand I like the
reduction of functions; I would have said that we have AH around all
the time, so we could just allocate once, stuff it all into ctx->cs,
and reuse the buffers for every object, but re-allocating them for
every (dumpable) object should be fine as well.
Regarding the function pointers that you removed, you are now putting
back in what Dimitri wanted me to take out, namely switch/case
statements for the algorithms and then #ifdefs for every algorithm.
It's not too many now since we have taken out LZF. Well, I can live
with it either way.
There is one thing however that I am not in favor of, which is the
removal of the "sizeHint" parameter for the read functions. The reason
for this parameter is not very clear now without LZF, but I have tried
to put in a few comments to explain the situation (which you have
taken out as well :-) ).
The point is that zlib is a stream-based compression algorithm: you
just stuff data in, from time to time you get data out, and in the
end you explicitly flush the compressor. The read function can just
return as many bytes as it wants and we can hand it all over to
zlib. Other compression algorithms however are block-based and first
write a block header that contains the information on the next data
block, including uncompressed and compressed sizes. With the
sizeHint parameter I used, the compressor could tell the read function
that it just wants to read the fixed-size header (6 bytes IIRC). In
the header it would look up the compressed size of the next block and
would then ask the read function for exactly this amount of data,
decompress it, and go on with the next block, and so forth...
Of course you could do that memory management inside the
compressor with an extra buffer holding what you got in excess, but
it's a pain. If you removed that part on purpose on the grounds that
there is no block-based compression algorithm in core and probably
never will be, then that's okay :-)
Joachim
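For illustration, the block-based read loop described above could look
roughly like this; ReadFunc, readF(), fatalError(), decompressBlock(),
and the 6-byte header layout are hypothetical stand-ins, not code from
any of the patches:

    #define BLOCK_HDR_LEN 6
    #define MAX_BLOCK     (64 * 1024)

    static void
    readAllBlocks(ArchiveHandle *AH, ReadFunc readF)
    {
        unsigned char hdr[BLOCK_HDR_LEN];
        unsigned char compbuf[MAX_BLOCK];

        for (;;)
        {
            size_t  complen;

            /* first ask for exactly the fixed-size block header */
            if (readF(AH, hdr, BLOCK_HDR_LEN) != BLOCK_HDR_LEN)
                break;          /* end of stream */

            /* the header carries the compressed size of the next block */
            complen = ((size_t) hdr[4] << 8) | hdr[5];
            if (complen > MAX_BLOCK)
                fatalError(AH, "corrupt block header");

            /* then ask for exactly that many bytes and decompress them */
            if (readF(AH, compbuf, complen) != complen)
                fatalError(AH, "unexpected end of data");

            decompressBlock(AH, compbuf, complen);
        }
    }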
On 02.12.2010 04:35, Joachim Wieland wrote:
There is one thing however that I am not in favor of, which is the
removal of the "sizeHint" parameter for the read functions. The reason
for this parameter is not very clear now without LZF, but I have tried
to put in a few comments to explain the situation (which you have
taken out as well :-) ).
The point is that zlib is a stream-based compression algorithm: you
just stuff data in, from time to time you get data out, and in the
end you explicitly flush the compressor. The read function can just
return as many bytes as it wants and we can hand it all over to
zlib. Other compression algorithms however are block-based and first
write a block header that contains the information on the next data
block, including uncompressed and compressed sizes. With the
sizeHint parameter I used, the compressor could tell the read function
that it just wants to read the fixed-size header (6 bytes IIRC). In
the header it would look up the compressed size of the next block and
would then ask the read function for exactly this amount of data,
decompress it, and go on with the next block, and so forth...
Of course you could do that memory management inside the
compressor with an extra buffer holding what you got in excess, but
it's a pain. If you removed that part on purpose on the grounds that
there is no block-based compression algorithm in core and probably
never will be, then that's okay :-)
Yeah, we're not going to have lzf built-in anytime soon. The external
command approach seems like the best way to support additional
compression algorithms, and I don't think it could do anything with
sizeHint. And the custom format didn't obey sizeHint anyway, because it
reads one custom-format block at a time.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com