Re: Add LZ4 compression in pg_dump

Started by Justin Pryzbyover 3 years ago154 messageshackers

pryzby@telsasoft.com

over 3 years ago

On Fri, Aug 05, 2022 at 02:23:45PM +0000, Georgios Kokolatos wrote:

Thank you for your work during commitfest.

The patch is still in development. Given vacation status, expect the next patches to be ready for the November commitfest.
For now it has moved to the September one. Further action will be taken then as needed.

On Sun, Nov 06, 2022 at 02:53:12PM +0000, gkokolatos@pm.me wrote:

On Wed, Nov 2, 2022 at 14:28, Justin Pryzby <pryzby@telsasoft.com> wrote:

Checking if you'll be able to submit new patches soon ?

Thank you for checking up. Expect new versions within this commitfest cycle.

Hi,

I think this patch record should be closed for now. You can re-open the
existing patch record once a patch is ready to be reviewed.

The commitfest is a time for committing/reviewing patches that were
previously submitted, but there's no new patch since July. Making a
patch available for review at the start of the commitfest seems like a
requirement for current patch records (same as for new patch records).

I wrote essentially the same patch as your early patches 2 years ago
(before postgres was ready to consider new compression algorithms), so
I'm happy to review a new patch when it's available, regardless of its
status in the cfapp.

BTW, some of my own review comments from March weren't addressed.
Please check. Also, in February, I asked if you knew how to use
cirrusci to run checks on cirrusci, but the patches still had
compilation errors and warnings on various OS.

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3571

--
Justin

Import Notes

Reply to msg id not found: 165970942594.29780.15623018354659505573.pgcf@coridan.postgresql.org-pU80vtiyXMfSSpn4SWOR97t71MVmDOyHaJtvGsdEfroFsFUDdc-UaqD-8LMPMw2sy-7B2_IVnlrATfVvXwtv8n3kexLePZ7iWgiOnE4jRI@pm.me

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Justin Pryzby (#1)

On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote:

I think this patch record should be closed for now. You can re-open the
existing patch record once a patch is ready to be reviewed.

Indeed. As of things are, this is just a dead entry in the CF which
would be confusing. I have marked it as RwF.
--
Michael

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#2)

------- Original Message -------
On Monday, November 21st, 2022 at 12:13 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Sun, Nov 20, 2022 at 11:26:11AM -0600, Justin Pryzby wrote:

I think this patch record should be closed for now. You can re-open the
existing patch record once a patch is ready to be reviewed.

Indeed. As of things are, this is just a dead entry in the CF which
would be confusing. I have marked it as RwF.

Thank you for closing it.

For the record I am currently working on it simply unsure if I should submit
WIP patches and add noise to the list or wait until it is in a state that I
feel that the comments have been addressed.

A new version that I feel that is in a decent enough state for review should
be ready within this week. I am happy to drop the patch if you think I should
not work on it though.

Cheers,
//Georgios

Show quoted text

--
Michael

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#3)

On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote:

A new version that I feel that is in a decent enough state for review should
be ready within this week. I am happy to drop the patch if you think I should
not work on it though.

If you can post a new version of the patch, that's fine, of course.
I'll be happy to look over it more.
--
Michael

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Georgios Kokolatos (#3)

On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote:

For the record I am currently working on it simply unsure if I should submit
WIP patches and add noise to the list or wait until it is in a state that I
feel that the comments have been addressed.

A new version that I feel that is in a decent enough state for review should
be ready within this week. I am happy to drop the patch if you think I should
not work on it though.

I hope you'll want to continue work on it. The patch record is like a
request for review, so it's closed if there's nothing ready to review.

I think you should re-send patches (and update the CF app) as often as
they're ready for more review. Your 001 commit (which is almost the
same as what I wrote 2 years ago) still needs to account for some review
comments, and the whole patch set ought to pass cirrusci tests. At that
point, you'll be ready for another round of review, even if there's
known TODO/FIXME items in later patches.

BTW I saw that you updated your branch on github. You'll need to make
the corresponding changes to ./meson.build that you made to ./Makefile.
https://wiki.postgresql.org/wiki/Meson_for_patch_authors
https://wiki.postgresql.org/wiki/Meson

--
Justin

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#4)

------- Original Message -------
On Tuesday, November 22nd, 2022 at 11:49 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Nov 22, 2022 at 10:00:47AM +0000, gkokolatos@pm.me wrote:

A new version that I feel that is in a decent enough state for review should
be ready within this week. I am happy to drop the patch if you think I should
not work on it though.

If you can post a new version of the patch, that's fine, of course.
I'll be happy to look over it more.

Thank you Michael (and Justin). Allow me to present v8.

The focus of this version of this series is 0001 and 0002.

Admittedly 0001 could be presented in a separate thread though given its size and
proximity to the topic, I present it here.

In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's
parsing of compression options. However there exists a substantial difference in the
behaviour of the two programs; one treats the lack of support for the requested
algorithm as a fatal error, whereas the other does not. The existing functions in
common/compression.c do not account for the later. 0002 proposes an implementation
for this. It's usefulness is shown in 0003.

Please consider 0003-0005 as work in progress. They are differences from v7 yet they
may contain unaddressed comments for now.

A welcome feedback would be in splitting and/or reordering of 0003-0005. I think
that they now split in coherent units and are presented in a logical order. Let me
know if you disagree and where should the breakpoints be.

Cheers,
//Georgios

Show quoted text

--
Michael

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#6)

On Mon, Nov 28, 2022 at 04:32:43PM +0000, gkokolatos@pm.me wrote:

The focus of this version of this series is 0001 and 0002.

Admittedly 0001 could be presented in a separate thread though given its size and
proximity to the topic, I present it here.

I don't mind. This was a hole in meson.build, so nice catch! I have
noticed a second defect with pg_verifybackup for all the commands, and
applied both at the same time.

In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's
parsing of compression options. However there exists a substantial difference in the
behaviour of the two programs; one treats the lack of support for the requested
algorithm as a fatal error, whereas the other does not. The existing functions in
common/compression.c do not account for the later. 0002 proposes an implementation
for this. It's usefulness is shown in 0003.

In what does it matter? The logic in compression.c provides an error
when looking at a spec or validating it, but the caller is free to
consume it as it wants because this is shared between the frontend and
the backend, and that includes consuming it as a warning rather than a
ahrd failure. If we don't want to issue an error and force
non-compression if attempting to use a compression method not
supported in pg_dump, that's fine by me as a historical behavior, but
I don't see why these routines have any need to be split more as
proposed in 0002.

Saying that, I do agree that it would be nice to remove the
duplication between the option parsing of pg_basebackup and
pg_receivewal. Your patch is very close to that, actually, and it
occured to me that if we move the check on "server-" and "client-" in
pg_basebackup to be just before the integer-only check then we can
consolidate the whole thing.

Attached is an alternative that does not sacrifice the pluggability of
the existing routines while allowing 0003~ to still use them (I don't
really want to move around the checks on the supported build options
now in parse_compress_specification(), that was hard enough to settle
on this location). On top of that, pg_basebackup is able to cope with
the case of --compress=0 already, enforcing "none" (BaseBackup could
be simplified a bit more before StartLogStreamer). This refactoring
shaves a little bit of code.

Please consider 0003-0005 as work in progress. They are differences from v7 yet they
may contain unaddressed comments for now.

Okay.
--
Michael

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#7)

On Tue, Nov 29, 2022 at 03:19:17PM +0900, Michael Paquier wrote:

Attached is an alternative that does not sacrifice the pluggability of
the existing routines while allowing 0003~ to still use them (I don't
really want to move around the checks on the supported build options
now in parse_compress_specification(), that was hard enough to settle
on this location). On top of that, pg_basebackup is able to cope with
the case of --compress=0 already, enforcing "none" (BaseBackup could
be simplified a bit more before StartLogStreamer). This refactoring
shaves a little bit of code.

One thing that I forgot to mention is that this refactoring would
treat things like server-N, client-N as valid grammars (in this case N
takes precedence over an optional detail string), implying that N = 0
is "none" and N > 0 is gzip, so that makes for an extra grammar flavor
without impacting the existing ones. I am not sure that it is worth
documenting, still worth mentioning.
--
Michael

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#7)

------- Original Message -------
On Tuesday, November 29th, 2022 at 7:19 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 28, 2022 at 04:32:43PM +0000, gkokolatos@pm.me wrote:

The focus of this version of this series is 0001 and 0002.

Admittedly 0001 could be presented in a separate thread though given its size and
proximity to the topic, I present it here.

I don't mind. This was a hole in meson.build, so nice catch! I have
noticed a second defect with pg_verifybackup for all the commands, and
applied both at the same time.

Thank you.

In an earlier review you spotted the similarity between pg_dump's and pg_receivewal's
parsing of compression options. However there exists a substantial difference in the
behaviour of the two programs; one treats the lack of support for the requested
algorithm as a fatal error, whereas the other does not. The existing functions in
common/compression.c do not account for the later. 0002 proposes an implementation
for this. It's usefulness is shown in 0003.

In what does it matter? The logic in compression.c provides an error
when looking at a spec or validating it, but the caller is free to
consume it as it wants because this is shared between the frontend and
the backend, and that includes consuming it as a warning rather than a
ahrd failure. If we don't want to issue an error and force
non-compression if attempting to use a compression method not
supported in pg_dump, that's fine by me as a historical behavior, but
I don't see why these routines have any need to be split more as
proposed in 0002.

I understand. The reason for the change in the routines was because it was
impossible to distinguish a genuine parse error from a missing library in
parse_compress_specification(). If the zlib library is missing, then both
'--compress=gzip:garbage' and '--compress=gzip:7' would populate the
parse_error member of the struct and subsequent calls to
validate_compress_specification() would error out, although only one of
the two options is truly an error. Historically the code would fail on
invalid input regardless of whether the library was present or not.

Saying that, I do agree that it would be nice to remove the
duplication between the option parsing of pg_basebackup and
pg_receivewal. Your patch is very close to that, actually, and it
occured to me that if we move the check on "server-" and "client-" in
pg_basebackup to be just before the integer-only check then we can
consolidate the whole thing.

Great. I did notice the possible benefit but chose to not tread too far
off the necessary in my patch.

Attached is an alternative that does not sacrifice the pluggability of
the existing routines while allowing 0003~ to still use them (I don't
really want to move around the checks on the supported build options
now in parse_compress_specification(), that was hard enough to settle
on this location).

Yeah, I thought that it would be a hard sell, hence an "earlier"
version.

The attached version 10, contains verbatim your proposed v9 as 0001.
Then 0002 is switching a bit the parsing order in pg_dump and will not
fail as described above on missing libraries. Now, it will first parse
the algorithm, discard it when unsupported, and only parse the rest of
the option if the algorithm is supported. Granted it is a bit 'uglier'
with the preprocessing blocks, yet it maintains most of the historic
behaviour without altering the common compression interfaces. Now, as
shown in 001_basic.pl, invalid detail will fail only if the algorithm
is supported.

On top of that, pg_basebackup is able to cope with
the case of --compress=0 already, enforcing "none" (BaseBackup could
be simplified a bit more before StartLogStreamer). This refactoring
shaves a little bit of code.

Please consider 0003-0005 as work in progress. They are differences from v7 yet they
may contain unaddressed comments for now.

Okay.

Thank you. Please advice if is preferable to split 0002 in two parts.
I think not but I will happily do so if you think otherwise.

Cheers,
//Georgios

Show quoted text

--
Michael

#10

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#9)

On Tue, Nov 29, 2022 at 12:10:46PM +0000, gkokolatos@pm.me wrote:

Thank you. Please advice if is preferable to split 0002 in two parts.
I think not but I will happily do so if you think otherwise.

This one makes me curious. What kind of split are you talking about?
If it makes the code review and the git history cleaner and easier, I
am usually a lot in favor of such incremental changes. As far as I
can see, there is the switch from the compression integer to
compression specification as one thing. The second thing is the
refactoring of cfclose() and these routines, paving the way for 0003.
Hmm, it may be cleaner to move the switch to the compression spec in
one patch, and move the logic around cfclose() to its own, paving the
way to 0003.

By the way, I think that this 0002 should drop all the default clauses
in the switches for the compression method so as we'd catch any
missing code paths with compiler warnings if a new compression method
is added in the future.

Anyway, I have applied 0001, adding you as a primary author because
you did most of it with only tweaks from me for pg_basebackup. The
docs of pg_basebackup have been amended to mention the slight change
in grammar, affecting the case where we do not have a detail string.
--
Michael

#11

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#10)

------- Original Message -------
On Wednesday, November 30th, 2022 at 1:50 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Nov 29, 2022 at 12:10:46PM +0000, gkokolatos@pm.me wrote:

Thank you. Please advice if is preferable to split 0002 in two parts.
I think not but I will happily do so if you think otherwise.

This one makes me curious. What kind of split are you talking about?
If it makes the code review and the git history cleaner and easier, I
am usually a lot in favor of such incremental changes. As far as I
can see, there is the switch from the compression integer to
compression specification as one thing. The second thing is the
refactoring of cfclose() and these routines, paving the way for 0003.
Hmm, it may be cleaner to move the switch to the compression spec in
one patch, and move the logic around cfclose() to its own, paving the
way to 0003.

Fair enough. The atteched v11 does that. 0001 introduces compression
specification and is using it throughout. 0002 paves the way to the
new interface by homogenizing the use of cfp. 0003 introduces the new
API and stores the compression algorithm in the custom format header
instead of the compression level integer. Finally 0004 adds support for
LZ4.

Besides the version bump in 0003 which can possibly be split out and
as an independent and earlier step, I think that the patchset consists
of coherent units.

By the way, I think that this 0002 should drop all the default clauses
in the switches for the compression method so as we'd catch any
missing code paths with compiler warnings if a new compression method
is added in the future.

Sure.

Anyway, I have applied 0001, adding you as a primary author because
you did most of it with only tweaks from me for pg_basebackup. The
docs of pg_basebackup have been amended to mention the slight change
in grammar, affecting the case where we do not have a detail string.

Very kind of you, thank you.

Cheers,
//Georgios

Show quoted text

--
Michael

#12

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#11)

On Wed, Nov 30, 2022 at 05:11:44PM +0000, gkokolatos@pm.me wrote:

Fair enough. The atteched v11 does that. 0001 introduces compression
specification and is using it throughout. 0002 paves the way to the
new interface by homogenizing the use of cfp. 0003 introduces the new
API and stores the compression algorithm in the custom format header
instead of the compression level integer. Finally 0004 adds support for
LZ4.

I have been looking at 0001, and.. Hmm. I am really wondering
whether it would not be better to just nuke this warning into orbit.
This stuff enforces non-compression even if -Z has been used to a
non-default value. This has been moved to its current location by
cae2bb1 as of this thread:
/messages/by-id/20160526.185551.242041780.horiguchi.kyotaro@lab.ntt.co.jp

However, this is only active if -Z is used when not building with
zlib. At the end, it comes down to whether we want to prioritize the
portability of pg_dump commands specifying a -Z/--compress across
environments knowing that these may or may not be built with zlib,
vs the amount of simplification/uniformity we would get across the
binaries in the tree once we switch everything to use the compression
specifications. Now that pg_basebackup and pg_receivewal are managed
by compression specifications, and that we'd want more compression
options for pg_dump, I would tend to do the latter and from now on
complain if attempting to do a pg_dump -Z under --without-zlib with a
compression level > 0. zlib is also widely available, and we don't
document the fact that non-compression is enforced in this case,
either. (Two TAP tests with the custom format had to be tweaked.)

As per the patch, it is true that we do not need to bump the format of
the dump archives, as we can still store only the compression level
and guess the method from it. I have added some notes about that in
ReadHead and WriteHead to not forget.

Most of the changes are really-straight forward, and it has resisted
my tests, so I think that this is in a rather-commitable shape as-is.
--
Michael

#13

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#12)

------- Original Message -------
On Thursday, December 1st, 2022 at 3:05 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Nov 30, 2022 at 05:11:44PM +0000, gkokolatos@pm.me wrote:

Fair enough. The atteched v11 does that. 0001 introduces compression
specification and is using it throughout. 0002 paves the way to the
new interface by homogenizing the use of cfp. 0003 introduces the new
API and stores the compression algorithm in the custom format header
instead of the compression level integer. Finally 0004 adds support for
LZ4.

I have been looking at 0001, and.. Hmm. I am really wondering
whether it would not be better to just nuke this warning into orbit.
This stuff enforces non-compression even if -Z has been used to a
non-default value. This has been moved to its current location by
cae2bb1 as of this thread:
/messages/by-id/20160526.185551.242041780.horiguchi.kyotaro@lab.ntt.co.jp

However, this is only active if -Z is used when not building with
zlib. At the end, it comes down to whether we want to prioritize the
portability of pg_dump commands specifying a -Z/--compress across
environments knowing that these may or may not be built with zlib,
vs the amount of simplification/uniformity we would get across the
binaries in the tree once we switch everything to use the compression
specifications. Now that pg_basebackup and pg_receivewal are managed
by compression specifications, and that we'd want more compression
options for pg_dump, I would tend to do the latter and from now on
complain if attempting to do a pg_dump -Z under --without-zlib with a
compression level > 0. zlib is also widely available, and we don't
document the fact that non-compression is enforced in this case,
either. (Two TAP tests with the custom format had to be tweaked.)

Fair enough. Thank you for looking. However I have a small comment
on your new patch.

- /* Custom and directory formats are compressed by default, others not */
- if (compressLevel == -1)
- {
-#ifdef HAVE_LIBZ
- if (archiveFormat == archCustom || archiveFormat == archDirectory)
- compressLevel = Z_DEFAULT_COMPRESSION;
- else
-#endif
- compressLevel = 0;
- }

Nuking the warning from orbit and changing the behaviour around disabling
the requested compression when the libraries are not present, should not
mean that we need to change the behaviour of default values for different
formats. Please find v13 attached which reinstates it.

Which in itself it got me looking and wondering why the tests succeeded.
The only existing test covering that path is `defaults_dir_format` in
`002_pg_dump.pl`. However as the test is currently written it does not
check whether the output was compressed. The restore command would succeed
in either case. A simple `gzip -t -r` against the directory will not
suffice to test it, because there exist files which are never compressed
in this format (.toc). A little bit more involved test case would need
to be written, yet before I embark to this journey, I would like to know
if you would agree to reinstate the defaults for those formats.

As per the patch, it is true that we do not need to bump the format of
the dump archives, as we can still store only the compression level
and guess the method from it. I have added some notes about that in
ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.

 #ifndef HAVE_LIBZ
-       if (AH->compression != 0)
+       if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
                pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
 #endif

It would seem a more consistent to error out in this case. We do error
in all other cases where the compression is not available.

Most of the changes are really-straight forward, and it has resisted
my tests, so I think that this is in a rather-commitable shape as-is.

Thank you.

Cheers,
//Georgios

Show quoted text

--
Michael

#14

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#13)

On Thu, Dec 01, 2022 at 02:58:35PM +0000, gkokolatos@pm.me wrote:

Nuking the warning from orbit and changing the behaviour around disabling
the requested compression when the libraries are not present, should not
mean that we need to change the behaviour of default values for different
formats. Please find v13 attached which reinstates it.

Gah, thanks! And this default behavior is documented as dependent on
the compilation as well.

Which in itself it got me looking and wondering why the tests succeeded.
The only existing test covering that path is `defaults_dir_format` in
`002_pg_dump.pl`. However as the test is currently written it does not
check whether the output was compressed. The restore command would succeed
in either case. A simple `gzip -t -r` against the directory will not
suffice to test it, because there exist files which are never compressed
in this format (.toc). A little bit more involved test case would need
to be written, yet before I embark to this journey, I would like to know
if you would agree to reinstate the defaults for those formats.

On top of my mind, I briefly recall that -r is not that portable. And
the toc format makes the files generated non-deterministic as these
use OIDs..

[.. thinks ..]

We are going to need a new thing here, as compress_cmd cannot be
directly used. What if we used only an array of glob()-able elements?
Let's say "expected_contents" that could include a "dir_path/*.gz"
conditional on $supports_gzip? glob() can only be calculated when the
test is run as the file names cannot be known beforehand :/

As per the patch, it is true that we do not need to bump the format of
the dump archives, as we can still store only the compression level
and guess the method from it. I have added some notes about that in
ReadHead and WriteHead to not forget.

Agreed. A minor suggestion if you may.
#ifndef HAVE_LIBZ
-       if (AH->compression != 0)
+       if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
pg_log_warning("archive is compressed, but this installation does not support compression -- no data will be available");
#endif
It would seem a more consistent to error out in this case. We do error
in all other cases where the compression is not available.

Makes sense.

I have gone through the patch again, and applied it. Thanks!
--
Michael

#15

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#14)

------- Original Message -------
On Friday, December 2nd, 2022 at 2:56 AM, Michael Paquier <michael@paquier.xyz> wrote:

On top of my mind, I briefly recall that -r is not that portable. And
the toc format makes the files generated non-deterministic as these
use OIDs..

[.. thinks ..]

We are going to need a new thing here, as compress_cmd cannot be
directly used. What if we used only an array of glob()-able elements?
Let's say "expected_contents" that could include a "dir_path/*.gz"
conditional on $supports_gzip? glob() can only be calculated when the
test is run as the file names cannot be known beforehand :/

You are very correct. However one can glob after the fact. Please find
0001 of the attached v14 which attempts to implement it.

I have gone through the patch again, and applied it. Thanks!

Thank you. Please find the rest of of the patchset series rebased on top
of it. I dare to say that 0002 is in a state worth of your consideration.

Cheers,
//Georgios

Show quoted text

--
Michael

#16

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#15)

On Fri, Dec 02, 2022 at 04:15:10PM +0000, gkokolatos@pm.me wrote:

You are very correct. However one can glob after the fact. Please find
0001 of the attached v14 which attempts to implement it.

+       if ($pgdump_runs{$run}->{glob_pattern})
+       {
+               my $glob_pattern = $pgdump_runs{$run}->{glob_pattern};
+               my @glob_output = glob($glob_pattern);
+               is(scalar(@glob_output) > 0, 1, "glob pattern matched")
+       }

While this is correct in checking that the contents are compressed
under --with-zlib, this also removes the coverage where we make sure
that this command is able to complete under --without-zlib without
compressing any of the table data files. Hence my point from
upthread: this test had better not use compile_option, but change
glob_pattern depending on if the build uses zlib or not.

In order to check this behavior with defaults_custom_format, perhaps
we could just remove the -Z6 from it or add an extra command for its
default behavior?
--
Michael

#17

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#16)

On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote:

While this is correct in checking that the contents are compressed
under --with-zlib, this also removes the coverage where we make sure
that this command is able to complete under --without-zlib without
compressing any of the table data files. Hence my point from
upthread: this test had better not use compile_option, but change
glob_pattern depending on if the build uses zlib or not.

In short, I mean something like the attached. I have named the flag
content_patterns, and switched it to an array so as we can check that
toc.dat is always uncompression and that the other data files are
always uncompressed.

In order to check this behavior with defaults_custom_format, perhaps
we could just remove the -Z6 from it or add an extra command for its
default behavior?

This is slightly more complicated as there is just one file generated
for the compression and non-compression cases, so I have let that as
it is now.
--
Michael

#18

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#17)

------- Original Message -------
On Monday, December 5th, 2022 at 8:05 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Sat, Dec 03, 2022 at 11:45:30AM +0900, Michael Paquier wrote:

While this is correct in checking that the contents are compressed
under --with-zlib, this also removes the coverage where we make sure
that this command is able to complete under --without-zlib without
compressing any of the table data files. Hence my point from
upthread: this test had better not use compile_option, but change
glob_pattern depending on if the build uses zlib or not.

In short, I mean something like the attached. I have named the flag
content_patterns, and switched it to an array so as we can check that
toc.dat is always uncompression and that the other data files are
always uncompressed.

I see. This approach is much better than my proposal, thanks. If you
allow me, I find 'content_patterns' to be slightly ambiguous. While is
true that it refers to the contents of a directory, it is not the
contents of the dump that it is examining. I took the liberty of proposing
an alternative name in the attached v16.

I also took the liberty of applying the test pattern when it the dump
is explicitly compressed.

In order to check this behavior with defaults_custom_format, perhaps
we could just remove the -Z6 from it or add an extra command for its
default behavior?

This is slightly more complicated as there is just one file generated
for the compression and non-compression cases, so I have let that as
it is now.

I was thinking a bit more about this. I think that we can use the list
TOC option of pg_restore. This option will first print out the header
info which contains the compression. Perl utils already support to
parse the generated output of a command. Please find an attempt to do
so in the attached. The benefits of having some testing for this case
become a bit more obvious in 0004 of the patchset, when lz4 is
introduced.

Cheers,
//Georgios

Show quoted text

--
Michael

#19

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Georgios Kokolatos (#18)

On Mon, Dec 05, 2022 at 12:48:28PM +0000, gkokolatos@pm.me wrote:

I also took the liberty of applying the test pattern when it the dump
is explicitly compressed.

Sticking with glob_patterns is fine by me.

I was thinking a bit more about this. I think that we can use the list
TOC option of pg_restore. This option will first print out the header
info which contains the compression. Perl utils already support to
parse the generated output of a command. Please find an attempt to do
so in the attached. The benefits of having some testing for this case
become a bit more obvious in 0004 of the patchset, when lz4 is
introduced.

This is where the fun is. What you are doing here is more complete,
and we would make sure that the custom and data directory would always
see their contents compressed by default. And it would have caught
the bug you mentioned upthread for the custom format.

I have kept things as you proposed at the end, added a few comments,
documented the new command_like and an extra command_like for
defaults_dir_format. Glad to see this addressed, thanks!
--
Michael

#20

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#19)

------- Original Message -------
On Tuesday, December 6th, 2022 at 1:22 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Dec 05, 2022 at 12:48:28PM +0000, gkokolatos@pm.me wrote:

This is where the fun is. What you are doing here is more complete,
and we would make sure that the custom and data directory would always
see their contents compressed by default. And it would have caught
the bug you mentioned upthread for the custom format.

Thank you very much Michael.

I have kept things as you proposed at the end, added a few comments,
documented the new command_like and an extra command_like for
defaults_dir_format. Glad to see this addressed, thanks!

Please find attached v17, which builds on top of what is already
committed. I dare to think 0001 as ready to be reviewed. 0002 is
also complete albeit with some documentation gaps.

Cheers,
//Georgios

Show quoted text

--
Michael

#21

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Georgios Kokolatos (#20)

#22

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Justin Pryzby (#21)

#23

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Michael Paquier (#22)

#24

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Georgios Kokolatos (#23)

#25

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Michael Paquier (#22)

#26

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Justin Pryzby (#24)

#27

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Georgios Kokolatos (#26)

#28

Georgios Kokolatos

gkokolatos@protonmail.com

over 3 years ago

In reply to: Justin Pryzby (#27)

#29

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Georgios Kokolatos (#28)

#30

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Justin Pryzby (#29)

#31

vignesh C

vignesh21@gmail.com

over 3 years ago

In reply to: Georgios Kokolatos (#28)

#32

Justin Pryzby

pryzby@telsasoft.com

over 3 years ago

In reply to: Justin Pryzby (#30)

#33

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Justin Pryzby (#32)

#34

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#32)

#35

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Michael Paquier (#34)

#36

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#35)

#37

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#36)

#38

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#35)

#39

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Michael Paquier (#38)

#40

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#39)

#41

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#40)

#42

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#41)

#43

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#42)

#44

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#43)

#45

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Tomas Vondra (#44)

#46

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#45)

#47

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#45)

#48

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#47)

#49

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#48)

#50

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#49)

#51

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#50)

#52

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#51)

#53

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#51)

#54

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#52)

#55

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#53)

#56

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Georgios Kokolatos (#55)

#57

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#53)

#58

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Michael Paquier (#57)

#59

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#58)

#60

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Georgios Kokolatos (#59)

#61

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Michael Paquier (#60)

#62

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#61)

#63

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#62)

#64

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Michael Paquier (#38)

#65

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Justin Pryzby (#62)

#66

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#65)

#67

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#66)

#68

shiy.fnst@fujitsu.com

about 3 years ago

In reply to: Georgios Kokolatos (#61)

#69

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: shiy.fnst@fujitsu.com (#68)

#70

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#69)

#71

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#70)

#72

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#70)

#73

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#72)

#74

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Tomas Vondra (#73)

#75

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#74)

#76

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#75)

#77

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Michael Paquier (#76)

#78

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#77)

#79

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Justin Pryzby (#78)

#80

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#78)

#81

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Justin Pryzby (#79)

#82

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#80)

#83

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#79)

#84

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#74)

#85

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#84)

#86

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Justin Pryzby (#84)

#87

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Michael Paquier (#85)

#88

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#86)

#89

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#83)

#90

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#79)

#91

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#81)

#92

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#86)

#93

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#90)

#94

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#90)

#95

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#94)

#96

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#88)

#97

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#91)

#98

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#97)

#99

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#98)

#100

Michael Paquier

michael@paquier.xyz

about 3 years ago

In reply to: Justin Pryzby (#99)

#101

Alexander Lakhin

exclusion@gmail.com

about 3 years ago

In reply to: Tomas Vondra (#74)

#102

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Alexander Lakhin (#101)

#103

Alexander Lakhin

exclusion@gmail.com

about 3 years ago

In reply to: Georgios Kokolatos (#102)

#104

Peter Eisentraut

peter_e@gmx.net

about 3 years ago

In reply to: Alexander Lakhin (#101)

#105

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Peter Eisentraut (#104)

#106

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#102)

#107

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#96)

#108

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#107)

#109

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#106)

#110

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#109)

#111

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#108)

#112

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#107)

#113

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#110)

#114

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#113)

#115

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#112)

#116

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#115)

#117

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#116)

#118

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#114)

#119

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#118)

#120

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#118)

#121

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Georgios Kokolatos (#118)

#122

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Justin Pryzby (#121)

#123

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#122)

#124

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#115)

#125

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Georgios Kokolatos (#123)

#126

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#125)

#127

Justin Pryzby

pryzby@telsasoft.com

about 3 years ago

In reply to: Tomas Vondra (#126)

#128

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#125)

#129

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#124)

#130

Georgios Kokolatos

gkokolatos@protonmail.com

about 3 years ago

In reply to: Tomas Vondra (#128)

#131

Tomas Vondra

tomas.vondra@2ndquadrant.com

about 3 years ago

In reply to: Georgios Kokolatos (#130)

#132

Alexander Lakhin

exclusion@gmail.com

almost 3 years ago

In reply to: Tomas Vondra (#122)

#133

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Alexander Lakhin (#132)

#134

Andrew Dunstan

andrew@dunslane.net

almost 3 years ago

In reply to: Georgios Kokolatos (#133)

#135

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Andrew Dunstan (#134)

#136

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Georgios Kokolatos (#135)

#137

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Michael Paquier (#136)

#138

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Georgios Kokolatos (#137)

#139

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Georgios Kokolatos (#135)

#140

Tom Lane

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Michael Paquier (#139)

#141

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Tom Lane (#140)

#142

Tom Lane

tgl@sss.pgh.pa.us

almost 3 years ago

In reply to: Tom Lane (#140)

#143

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Georgios Kokolatos (#137)

#144

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Tom Lane (#142)

#145

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Tom Lane (#142)

#146

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Georgios Kokolatos (#144)

#147

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Tomas Vondra (#146)

#148

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Tomas Vondra (#145)

#149

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Michael Paquier (#148)

#150

Georgios Kokolatos

gkokolatos@protonmail.com

almost 3 years ago

In reply to: Tomas Vondra (#149)

#151

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Georgios Kokolatos (#150)

#152

Michael Paquier

michael@paquier.xyz

almost 3 years ago

In reply to: Tomas Vondra (#149)

#153

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Michael Paquier (#152)

#154

Tomas Vondra

tomas.vondra@2ndquadrant.com

almost 3 years ago

In reply to: Tomas Vondra (#153)

Re: Add LZ4 compression in pg_dump

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments:

Attachments: