Teach pg_receivewal to use lz4 compression

michael@paquier.xyz

almost 5 years ago

In reply to: Georgios Kokolatos (#1)

Re: Teach pg_receivewal to use lz4 compression

On Tue, Jun 29, 2021 at 02:45:17PM +0000, gkokolatos@pm.me wrote:

The program pg_receivewal can use gzip compression to store the received WAL.
This patch teaches it to be able to use lz4 compression if the binary is build
using the -llz4 flag.

Nice.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was requested. This specific behaviour is
maintained. A newly introduced option --compress-program=lz4 can be used to ask
for the logs to be compressed using lz4 instead. In that case, no compression
values can be selected as it does not seem too useful.

Yes, I am not convinced either that we should care about making the
acceleration customizable.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use lz4 compression. Those are used by pg_basebackup. If
is is felt useful, then it is easy to be added in a new patch.

Documentation is missing from the patch.

+   LZ4F_compressionContext_t ctx;
+   size_t      outbufCapacity;
+   void       *outbuf;
It may be cleaner to refer to lz4 in the name of those variables?

+       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
+       outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default preferences */);
Interesting.  So this cannot be done at compilation time because of
the auto-flush mode looking at the LZ4 code.  That looks about right.

getopt_long() is forgotting the new option 'I'.

+   system_or_bail('lz4', '-t', $lz4_wals[0]);
I think that you should just drop this part of the test.  The only
part of LZ4 that we require to be present when Postgres is built with
--with-lz4 is its library liblz4.  Commands associated to it may not
be around, causing this test to fail.  The test checking that one .lz4
file has been created is good to have.  It may be worth adding a test
with a .lz4.partial segment generated and --endpos pointing to a LSN
that does not finish the segment that gets switched.

It seems to me that you are missing some logic in
FindStreamingStart() to handle LZ4-compressed segments, in relation
with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

+ pg_log_error("invalid compress-program \"%s\"", optarg);
"compress-program" sounds weird. Shouldn't that just say "invalid
compression method" or similar?

+ printf(_(" -Z, --compress=0-9 compress logs with given
compression level (available only with compress-program=zlib)\n"));
This line is too long.

Should we have more tests for ZLIB, while on it? That seems like a
good addition as long as we can skip the tests conditionally when
that's not supported.
--
Michael

Dilip Kumar

dilipbalaut@gmail.com

almost 5 years ago

In reply to: Georgios Kokolatos (#1)

Re: Teach pg_receivewal to use lz4 compression

On Tue, Jun 29, 2021 at 8:15 PM <gkokolatos@pm.me> wrote:

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.
This patch teaches it to be able to use lz4 compression if the binary is build
using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

1.
@@ -90,7 +91,8 @@ usage(void)
  printf(_("      --synchronous      flush write-ahead log immediately
after writing\n"));
  printf(_("  -v, --verbose          output verbose messages\n"));
  printf(_("  -V, --version          output version information, then exit\n"));
- printf(_("  -Z, --compress=0-9     compress logs with given
compression level\n"));
+ printf(_("  -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of
compression program?

2.
+ printf(_(" -Z, --compress=0-9 compress logs with given
compression level (available only with compress-program=zlib)\n"));

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

3. Should we also support LZ4 compression using dictionary?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

magnus@hagander.net

almost 5 years ago

In reply to: Dilip Kumar (#3)

Re: Teach pg_receivewal to use lz4 compression

On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Tue, Jun 29, 2021 at 8:15 PM <gkokolatos@pm.me> wrote:

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.
This patch teaches it to be able to use lz4 compression if the binary is build
using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch
1.
@@ -90,7 +91,8 @@ usage(void)
printf(_("      --synchronous      flush write-ahead log immediately
after writing\n"));
printf(_("  -v, --verbose          output verbose messages\n"));
printf(_("  -V, --version          output version information, then exit\n"));
- printf(_("  -Z, --compress=0-9     compress logs with given
compression level\n"));
+ printf(_("  -I, --compress-program use this program for compression\n"));
Wouldn't it be better to call it compression method instead of
compression program?

I came here to say exactly that, just had to think up what I thought
was the better name first. Either method or algorithm, but method
seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Magnus Hagander (#4)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 12:28, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:

On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.

This patch teaches it to be able to use lz4 compression if the binary is build

using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

@@ -90,7 +91,8 @@ usage(void)

printf((" --synchronous flush write-ahead log immediately

after writing\n"));

printf((" -v, --verbose output verbose messages\n"));

printf(_(" -V, --version output version information, then exit\n"));

- printf(_(" -Z, --compress=0-9 compress logs with given

compression level\n"));

- printf(_(" -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of

compression program?

I came here to say exactly that, just had to think up what I thought

was the better name first. Either method or algorithm, but method

seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

Not a problem. To be very transparent, I first looked what was already out there.
For example `tar` is using
-I, --use-compress-program=PROG
yet the 'use-' bit would push the alignment of the --help output, so I removed it.

To me, as a non native English speaker, `--compression-method` does sound better.
I can just re-align the rest of the help output.

Updated patch is on the making.

Show quoted text

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Magnus Hagander

Me: https://www.hagander.net/

Work: https://www.redpill-linpro.com/

magnus@hagander.net

almost 5 years ago

In reply to: Georgios Kokolatos (#5)

Re: Teach pg_receivewal to use lz4 compression

On Thu, Jul 1, 2021 at 3:39 PM <gkokolatos@pm.me> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 12:28, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:

On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.

This patch teaches it to be able to use lz4 compression if the binary is build

using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

@@ -90,7 +91,8 @@ usage(void)

printf((" --synchronous flush write-ahead log immediately

after writing\n"));

printf((" -v, --verbose output verbose messages\n"));

printf(_(" -V, --version output version information, then exit\n"));

- printf(_(" -Z, --compress=0-9 compress logs with given

compression level\n"));

- printf(_(" -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of

compression program?

I came here to say exactly that, just had to think up what I thought

was the better name first. Either method or algorithm, but method

seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

Not a problem. To be very transparent, I first looked what was already out there.
For example `tar` is using
-I, --use-compress-program=PROG
yet the 'use-' bit would push the alignment of the --help output, so I removed it.

I think the difference there is that tar actually calls an external
program to do the work... And we are using the built-in library,
right?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Magnus Hagander (#6)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 15:58, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Jul 1, 2021 at 3:39 PM gkokolatos@pm.me wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 12:28, Magnus Hagander magnus@hagander.net wrote:

On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:

On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.

This patch teaches it to be able to use lz4 compression if the binary is build

using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

@@ -90,7 +91,8 @@ usage(void)

printf((" --synchronous flush write-ahead log immediately

after writing\n"));

printf((" -v, --verbose output verbose messages\n"));

printf(_(" -V, --version output version information, then exit\n"));

- printf(_(" -Z, --compress=0-9 compress logs with given

compression level\n"));

- printf(_(" -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of

compression program?

I came here to say exactly that, just had to think up what I thought

was the better name first. Either method or algorithm, but method

seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

Not a problem. To be very transparent, I first looked what was already out there.

For example `tar` is using

-I, --use-compress-program=PROG

yet the 'use-' bit would push the alignment of the --help output, so I removed it.

I think the difference there is that tar actually calls an external

program to do the work... And we are using the built-in library,

right?

You are very correct :) I am not objecting the change at all. Just let you know
how I chose that. You know, naming is dead easy and all...

On a more serious note, what about the `-I` short flag? Should we keep it or
is there a better one to be used?

Micheal suggested on the same thread to move my entry in the help output so that
the output remains ordered. I would like the options for the compression method and
the already existing compression level to next to each other if possible. Then it
should be either 'X' or 'Y'.

Thoughts?

Show quoted text

------------------------------------------------------------------------------------------------------------------------------------------------

Magnus Hagander

Me: https://www.hagander.net/

Work: https://www.redpill-linpro.com/

michael@paquier.xyz

almost 5 years ago

In reply to: Georgios Kokolatos (#7)

Re: Teach pg_receivewal to use lz4 compression

On Thu, Jul 01, 2021 at 02:10:17PM +0000, gkokolatos@pm.me wrote:

Micheal suggested on the same thread to move my entry in the help output so that
the output remains ordered. I would like the options for the compression method and
the already existing compression level to next to each other if possible. Then it
should be either 'X' or 'Y'.

Hmm. Grouping these together makes sense for the user. One choice
that we have here is to drop the short option, and just use a long
one. What I think is important for the user when it comes to this
option is the consistency of its naming across all the tools
supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
already use most of the short options you could use for pg_receivewal,
having only a long one gives a bit more flexibility.
--
Michael

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Michael Paquier (#8)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 2nd, 2021 at 03:10, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jul 01, 2021 at 02:10:17PM +0000, gkokolatos@pm.me wrote:

Micheal suggested on the same thread to move my entry in the help output so that

the output remains ordered. I would like the options for the compression method and

the already existing compression level to next to each other if possible. Then it

should be either 'X' or 'Y'.

Hmm. Grouping these together makes sense for the user. One choice

that we have here is to drop the short option, and just use a long

one. What I think is important for the user when it comes to this

option is the consistency of its naming across all the tools

supporting it. pg_dump and pg_basebackup, where we could plug LZ4,

already use most of the short options you could use for pg_receivewal,

having only a long one gives a bit more flexibility.

Good point. I am going with that one.

Show quoted text

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Michael

#10

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Georgios Kokolatos (#9)

Re: Teach pg_receivewal to use lz4 compression

Hi,

please find v2 of the patch which tries to address the commends received so far.

Thank you all for your comments.

Michael Paquier wrote:

Documentation is missing from the patch.

It has now been added.

+   LZ4F_compressionContext_t ctx;
+   size_t      outbufCapacity;
+   void       *outbuf;
It may be cleaner to refer to lz4 in the name of those variables?

Agreed and done

+       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
+       outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default preferences */);
Interesting.  So this cannot be done at compilation time because of
the auto-flush mode looking at the LZ4 code.  That looks about right.

This is also my understanding.

+   system_or_bail('lz4', '-t', $lz4_wals[0]);
I think that you should just drop this part of the test.  The only
part of LZ4 that we require to be present when Postgres is built with
--with-lz4 is its library liblz4.  Commands associated to it may not
be around, causing this test to fail.  The test checking that one .lz4
file has been created is good to have.  It may be worth adding a test
with a .lz4.partial segment generated and --endpos pointing to a LSN
that does not finish the segment that gets switched.

I humbly disagree with the need for the test. It is rather easily possible
to generate a file that can not be decoded, thus becoming useless. Having the
test will provide some guarantee for the fact. In the current patch, there
is code to find out if the program lz4 is available in the system. If it is
not available, then that specific test is skipped. The rest remains as it
were. Also `system_or_bail` is not used anymore in favour of the `system_log`
so that the test counted and the execution of tests continues upon failure.

It seems to me that you are missing some logic in
FindStreamingStart() to handle LZ4-compressed segments, in relation
with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill
in the uncompressed size during creation time. Then one can read the
headers and verify the expectations.

Should we have more tests for ZLIB, while on it? That seems like a
good addition as long as we can skip the tests conditionally when
that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.

Dilip Kumar wrote:

Wouldn't it be better to call it compression method instead of
compression program?

Agreed. This is inline with the suggestions of other reviewers.
Find the change in the attached patch.

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

Should we also support LZ4 compression using dictionary?

I would we should not do that. If my understanding is correct,
decompression would require the dictionary to be passed along.
The algorithm seems to be very competitive to the current compression
as is.

Magnus Hagander wrote:

I came here to say exactly that, just had to think up what I thought
was the better name first. Either method or algorithm, but method
seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

Agreed and changed throughout.

Michael Paquier wrote:

What I think is important for the user when it comes to this
option is the consistency of its naming across all the tools
supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
already use most of the short options you could use for pg_receivewal,
having only a long one gives a bit more flexibility.

Done.

Cheers,
//Georgios

#11

michael@paquier.xyz

almost 5 years ago

In reply to: Georgios Kokolatos (#10)

Re: Teach pg_receivewal to use lz4 compression

On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:

please find v2 of the patch which tries to address the commends
received so far.

Thanks!

Michael Paquier wrote:
+   system_or_bail('lz4', '-t', $lz4_wals[0]);
I think that you should just drop this part of the test.  The only
part of LZ4 that we require to be present when Postgres is built with
--with-lz4 is its library liblz4.  Commands associated to it may not
be around, causing this test to fail.  The test checking that one .lz4
file has been created is good to have.  It may be worth adding a test
with a .lz4.partial segment generated and --endpos pointing to a LSN
that does not finish the segment that gets switched.
I humbly disagree with the need for the test. It is rather easily possible
to generate a file that can not be decoded, thus becoming useless. Having the
test will provide some guarantee for the fact. In the current patch, there
is code to find out if the program lz4 is available in the system. If it is
not available, then that specific test is skipped. The rest remains as it
were. Also `system_or_bail` is not used anymore in favour of the `system_log`
so that the test counted and the execution of tests continues upon failure.

Check. I can see what you are using in the new patch. We could live
with that.

It seems to me that you are missing some logic in
FindStreamingStart() to handle LZ4-compressed segments, in relation
with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill
in the uncompressed size during creation time. Then one can read the
headers and verify the expectations.

Yeah, I read that as well with lz4 --list and the kind. That's weird
compared to how ZLIB gives an easy access to it. We may want to do an
effort and tell about more lz4 --content-size/--list, telling that we
add the size in the segment compressed because we have to and LZ4 does
not do it by default?

Should we have more tests for ZLIB, while on it? That seems like a
good addition as long as we can skip the tests conditionally when
that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.

Thanks. Looking forward to seeing it. That may be better on a
separate thread, even if I digressed in this thread :)

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

Agreed.

What I think is important for the user when it comes to this
option is the consistency of its naming across all the tools
supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
already use most of the short options you could use for pg_receivewal,
having only a long one gives a bit more flexibility.

Done.

* http://www.zlib.org/rfc-gzip.html.
+ * For lz4 compressed segments
*/
This comment is incomplete.

+#define IsLZ4CompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated.  Would it be better to change this stuff
and switch to a routine that checks if a segment has a valid name, is
partial, and the type of compression that applied to it?  It seems to
me that we should group iszlibcompress and islz4compress together with
the options available through compression_method.

+   if (compresslevel != 0)
+   {
+       if (compression_method == COMPRESSION_NONE)
+       {
+           compression_method = COMPRESSION_ZLIB;
+       }
+       if (compression_method != COMPRESSION_ZLIB)
+       {
+           pg_log_error("cannot use --compress when "
+                        "--compression-method is not gzip");
+           fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+                   progname);
+           exit(1);
+       }
+   }
For compatibility where --compress enforces the use of zlib that would
work, but this needs a comment explaining the goal of this block.  I
am wondering if it would be better to break the will and just complain
when specifying --compress without --compression-method though.  That
would cause compatibility issues, but this would make folks aware of
the presence of LZ4, which does not sound bad to me either as ZLIB is
slower than LZ4 on all fronts.

+   else if (compression_method == COMPRESSION_ZLIB)
+   {
+       pg_log_error("cannot use --compression-method gzip when "
+                    "--compression is 0");
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }
Hmm.  It would be more natural to enforce a default compression level
in this case?  The user is asking for a compression with zlib here.

+   my $lz4 = $ENV{LZ4};
[...]
+   # Verify that the stored file is readable if program lz4 is available
+   skip "program lz4 is not found in your system", 1
+     if (!defined $lz4 || $lz4 eq '');
Okay, this is acceptable.  Didn't know the existing trick with TAR
either.

+       /*
+        * XXX: this is crap... lz4preferences now does show the uncompressed
+        * size via lz4 --list <filename> but the compression goes down the
+        * window... also it is not very helpfull to have it at the startm, does
+        * it?
+        */
What do you mean here by "the compression goes out the window"?
--
Michael

#12

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Michael Paquier (#11)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:

please find v2 of the patch which tries to address the commends

received so far.

Thanks!

Michael Paquier wrote:

- system_or_bail('lz4', '-t', $lz4_wals[0]);

I think that you should just drop this part of the test. The only

part of LZ4 that we require to be present when Postgres is built with

--with-lz4 is its library liblz4. Commands associated to it may not

be around, causing this test to fail. The test checking that one .lz4

file has been created is good to have. It may be worth adding a test

with a .lz4.partial segment generated and --endpos pointing to a LSN

that does not finish the segment that gets switched.

I humbly disagree with the need for the test. It is rather easily possible

to generate a file that can not be decoded, thus becoming useless. Having the

test will provide some guarantee for the fact. In the current patch, there

is code to find out if the program lz4 is available in the system. If it is

not available, then that specific test is skipped. The rest remains as it

were. Also `system_or_bail` is not used anymore in favour of the `system_log`

so that the test counted and the execution of tests continues upon failure.

Check. I can see what you are using in the new patch. We could live

with that.

Great. Thank you.

It seems to me that you are missing some logic in

FindStreamingStart() to handle LZ4-compressed segments, in relation

with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill

in the uncompressed size during creation time. Then one can read the

headers and verify the expectations.

Yeah, I read that as well with lz4 --list and the kind. That's weird

compared to how ZLIB gives an easy access to it. We may want to do an

effort and tell about more lz4 --content-size/--list, telling that we

add the size in the segment compressed because we have to and LZ4 does

not do it by default?

I am afraid I do not follow. In the patch we do add the uncompressed size
and then, the uncompressed size is checked against a known value WalSegSz.
What the compressed size will be checked against?

Should we have more tests for ZLIB, while on it? That seems like a

good addition as long as we can skip the tests conditionally when

that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.

Thanks. Looking forward to seeing it. That may be better on a

separate thread, even if I digressed in this thread :)

Thank you for the comments. I will sent in a separate thread.

I think we can somehow use "acceleration" parameter of lz4 compression

to map on compression level, It is not direct mapping but

can't we create some internal mapping instead of completely ignoring

this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly

little benefit for added complexity.

Agreed.

What I think is important for the user when it comes to this

option is the consistency of its naming across all the tools

supporting it. pg_dump and pg_basebackup, where we could plug LZ4,

already use most of the short options you could use for pg_receivewal,

having only a long one gives a bit more flexibility.

Done.

* http://www.zlib.org/rfc-gzip.html.

- - For lz4 compressed segments

*/

This comment is incomplete.

It is. I will fix.

+#define IsLZ4CompressXLogFileName(fname) \
-   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
-   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
-   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname) \
-   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
-   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
-   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated. Would it be better to change this stuff

and switch to a routine that checks if a segment has a valid name, is

partial, and the type of compression that applied to it? It seems to

me that we should group iszlibcompress and islz4compress together with

the options available through compression_method.

I agree with you. I will refactor.

- if (compresslevel != 0)
- {
- if (compression_method == COMPRESSION_NONE)

- {

- compression_method = COMPRESSION_ZLIB;

- }

- if (compression_method != COMPRESSION_ZLIB)

- {

- pg_log_error("cannot use --compress when "

- "--compression-method is not gzip");

- fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),

- progname);

- exit(1);

- }

- }

For compatibility where --compress enforces the use of zlib that would

work, but this needs a comment explaining the goal of this block. I

am wondering if it would be better to break the will and just complain

when specifying --compress without --compression-method though. That

would cause compatibility issues, but this would make folks aware of

the presence of LZ4, which does not sound bad to me either as ZLIB is

slower than LZ4 on all fronts.

I would vote to break the compatibility if that is an option. I chose the
less invasive approach thinking that breaking the compatibility would not
be an option.

Unless others object, I will include --compress as a complimentary option
to --compression-method in updated version of the patch.

- else if (compression_method == COMPRESSION_ZLIB)
- {
- pg_log_error("cannot use --compression-method gzip when "

- "--compression is 0");

- fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),

- progname);

- exit(1);

- }

Hmm. It would be more natural to enforce a default compression level

in this case? The user is asking for a compression with zlib here.

You are correct, in the current patch passing --compression-method=gzip alone
is equivalent to passing --compression=0 in the current master version. This
behaviour may be confusing for the user. What should the default compression
be then? I am inclined to say '5' as a compromise between speed and compression
ration.

- my $lz4 = $ENV{LZ4};

[...]
- Verify that the stored file is readable if program lz4 is available
===================================================================

- skip "program lz4 is not found in your system", 1
- if (!defined $lz4 || $lz4 eq '');

Okay, this is acceptable. Didn't know the existing trick with TAR

either.

Thank you.

- /*

- * XXX: this is crap... lz4preferences now does show the uncompressed

- * size via lz4 --list <filename> but the compression goes down the

- * window... also it is not very helpfull to have it at the startm, does

- * it?

- */

What do you mean here by "the compression goes out the window"?

Please consider me adequately embarrassed. This was a personal comment while I was
working on the code. It is not correct and it should have never seen the public
light.

Cheers,
//Georgios

Show quoted text

---------------------------------------------------------------

Michael

#13

Dilip Kumar

dilipbalaut@gmail.com

almost 5 years ago

In reply to: Georgios Kokolatos (#10)

Re: Teach pg_receivewal to use lz4 compression

On Thu, Jul 8, 2021 at 7:48 PM <gkokolatos@pm.me> wrote:

Dilip Kumar wrote:

Wouldn't it be better to call it compression method instead of
compression program?

Agreed. This is inline with the suggestions of other reviewers.
Find the change in the attached patch.

Thanks, I will have a look.

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

I am really not sure what complexity you are talking about, do you
mean since with pglz we were already providing the compression level
so let it be as it is but with the new compression method you don't
see much benefit of providing compression level or speed?

Should we also support LZ4 compression using dictionary?

I would we should not do that. If my understanding is correct,
decompression would require the dictionary to be passed along.
The algorithm seems to be very competitive to the current compression
as is.

I agree, we might not go for a dictionary because we would need to
dictionary to decompress as well. So that will add an extra
complexity for user.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#14

michael@paquier.xyz

almost 5 years ago

In reply to: Dilip Kumar (#13)

Re: Teach pg_receivewal to use lz4 compression

On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:

On Thu, Jul 8, 2021 at 7:48 PM <gkokolatos@pm.me> wrote:

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

I am really not sure what complexity you are talking about, do you
mean since with pglz we were already providing the compression level
so let it be as it is but with the new compression method you don't
see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
in mind, but my opinion stands on the latter: there is little benefit
in making lz4 faster than the default and reduce compression, as the
default is already a rather low CPU user.
--
Michael

#15

gkokolatos@protonmail.com

almost 5 years ago

In reply to: Michael Paquier (#14)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:

On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:

We can, though I am not in favour of doing so. There is seemingly

little benefit for added complexity.

I am really not sure what complexity you are talking about, do you

mean since with pglz we were already providing the compression level

so let it be as it is but with the new compression method you don't

see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has

in mind, but my opinion stands on the latter: there is little benefit

in making lz4 faster than the default and reduce compression, as the

default is already a rather low CPU user.

Thank you all for your comments. I am sitting on the same side as Micheal
on this one. The complexity is not huge, yet there will need to be code to
pass this option to the lz4 api and various test cases to verify for
correctness and integrity. The burden of maintenance of this code vs the
benefit of the option, tilt the scale towards not including the option.

Of course, I will happily provide whatever the community finds beneficial.

Cheers,
//Georgios

Show quoted text

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Michael

#16

magnus@hagander.net

almost 5 years ago

In reply to: Georgios Kokolatos (#15)

Re: Teach pg_receivewal to use lz4 compression

On Mon, Jul 12, 2021 at 11:33 AM <gkokolatos@pm.me> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:

On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:

We can, though I am not in favour of doing so. There is seemingly

little benefit for added complexity.

I am really not sure what complexity you are talking about, do you

mean since with pglz we were already providing the compression level

so let it be as it is but with the new compression method you don't

see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has

in mind, but my opinion stands on the latter: there is little benefit

in making lz4 faster than the default and reduce compression, as the

default is already a rather low CPU user.

Thank you all for your comments. I am sitting on the same side as Micheal
on this one. The complexity is not huge, yet there will need to be code to
pass this option to the lz4 api and various test cases to verify for
correctness and integrity. The burden of maintenance of this code vs the
benefit of the option, tilt the scale towards not including the option.

+1 for skipping that one, at least for now, and sticking to
default-only for lz4.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#17

Dilip Kumar

dilipbalaut@gmail.com

almost 5 years ago

In reply to: Magnus Hagander (#16)

Re: Teach pg_receivewal to use lz4 compression

On Mon, Jul 12, 2021 at 3:15 PM Magnus Hagander <magnus@hagander.net> wrote:

On Mon, Jul 12, 2021 at 11:33 AM <gkokolatos@pm.me> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:

On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:

We can, though I am not in favour of doing so. There is seemingly

little benefit for added complexity.

I am really not sure what complexity you are talking about, do you

mean since with pglz we were already providing the compression level

so let it be as it is but with the new compression method you don't

see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has

in mind, but my opinion stands on the latter: there is little benefit

in making lz4 faster than the default and reduce compression, as the

default is already a rather low CPU user.

Thank you all for your comments. I am sitting on the same side as Micheal
on this one. The complexity is not huge, yet there will need to be code to
pass this option to the lz4 api and various test cases to verify for
correctness and integrity. The burden of maintenance of this code vs the
benefit of the option, tilt the scale towards not including the option.

+1 for skipping that one, at least for now, and sticking to
default-only for lz4.

Okay, fine with me as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#18

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#11)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote:

Hi,

please find v3 of the patch attached, rebased to the current head.

Michael Paquier wrote:

* http://www.zlib.org/rfc-gzip.html.

- - For lz4 compressed segments
*/
This comment is incomplete.

Fixed.

+#define IsLZ4CompressXLogFileName(fname) \
-   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
-   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
-   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname) \
-   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
-   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
-   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated. Would it be better to change this stuff
and switch to a routine that checks if a segment has a valid name, is
partial, and the type of compression that applied to it? It seems to
me that we should group iszlibcompress and islz4compress together with
the options available through compression_method.

Agreed and done.

- if (compresslevel != 0)
- {
- if (compression_method == COMPRESSION_NONE)
- {
- compression_method = COMPRESSION_ZLIB;
- }
- if (compression_method != COMPRESSION_ZLIB)
- {
- pg_log_error("cannot use --compress when "
- "--compression-method is not gzip");
- fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
- progname);
- exit(1);
- }
- }

For compatibility where --compress enforces the use of zlib that would
work, but this needs a comment explaining the goal of this block. I
am wondering if it would be better to break the will and just complain
when specifying --compress without --compression-method though. That
would cause compatibility issues, but this would make folks aware of
the presence of LZ4, which does not sound bad to me either as ZLIB is
slower than LZ4 on all fronts.

Fair point. In v3 of the patch --compress requires --compression-method. Passing
0 as value errors out.

- else if (compression_method == COMPRESSION_ZLIB)
- {
- pg_log_error("cannot use --compression-method gzip when "
- "--compression is 0");
- fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
- progname);
- exit(1);
- }

Hmm. It would be more natural to enforce a default compression level
in this case? The user is asking for a compression with zlib here.

Agreed. A default value of 5, which is in the middle point of options, has been
defined and used.

In addition, the tests have been adjusted to mimic the newly added gzip tests.

Cheers,
//Georgios

Show quoted text

---------------------------------------------------------------

Michael

#19

Jian Guo

gjian@vmware.com

over 4 years ago

In reply to: Georgios Kokolatos (#18)

Re: Teach pg_receivewal to use lz4 compression

@@ -250,14 +302,18 @@ FindStreamingStart(uint32 *tli)
 		/*
 		 * Check that the segment has the right size, if it's supposed to be
 		 * completed.  For non-compressed segments just check the on-disk size
-		 * and see if it matches a completed segment. For compressed segments,
-		 * look at the last 4 bytes of the compressed file, which is where the
-		 * uncompressed size is located for gz files with a size lower than
-		 * 4GB, and then compare it to the size of a completed segment. The 4
-		 * last bytes correspond to the ISIZE member according to
+		 * and see if it matches a completed segment. For zlib compressed
+		 * segments, look at the last 4 bytes of the compressed file, which is
+		 * where the uncompressed size is located for gz files with a size lower
+		 * than 4GB, and then compare it to the size of a completed segment.
+		 * The 4 last bytes correspond to the ISIZE member according to
 		 * http://www.zlib.org/rfc-gzip.html.
+		 *
+		 * For lz4 compressed segments read the header using the exposed API and
+		 * compare the uncompressed file size, stored in
+		 * LZ4F_frameInfo_t{.contentSize}, to that of a completed segment.
 		 */
-		if (!ispartial && !iscompress)
+		if (!ispartial && wal_compression_method == COMPRESSION_NONE)
 		{
 			struct stat statbuf;
 			char		fullpath[MAXPGPATH * 2];
@@ -276,7 +332,7 @@ FindStreamingStart(uint32 *tli)
 				continue;
 			}
 		}
-		else if (!ispartial && iscompress)
+		else if (!ispartial && wal_compression_method == COMPRESSION_ZLIB)
 		{
 			int			fd;
 			char		buf[4];
@@ -322,6 +378,70 @@ FindStreamingStart(uint32 *tli)
 				continue;
 			}
 		}
+		else if (!ispartial && compression_method == COMPRESSION_LZ4)
+		{
+#ifdef HAVE_LIBLZ4
+			int			fd;
+			int			r;
+			size_t		consumed_len = LZ4F_HEADER_SIZE_MAX;
+			char	    buf[LZ4F_HEADER_SIZE_MAX];
+			char		fullpath[MAXPGPATH * 2];
+			LZ4F_frameInfo_t frame_info = { 0 };
+			LZ4F_decompressionContext_t ctx = NULL;
+
+			snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
+
+			fd = open(fullpath, O_RDONLY | PG_BINARY, 0);

Should close the fd before exit or abort.

#20

gkokolatos@protonmail.com

over 4 years ago

In reply to: Jian Guo (#19)

Re: Teach pg_receivewal to use lz4 compression

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, September 11th, 2021 at 07:02, Jian Guo <gjian@vmware.com> wrote:

Hi,

thank you for looking at the patch.

- LZ4F_decompressionContext_t ctx = NULL;
- snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
- fd = open(fullpath, O_RDONLY | PG_BINARY, 0);

Should close the fd before exit or abort.

You are correct. Please find version 4 attached.

Cheers,
//Georgios

#21

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#18)

#22

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#21)

#23

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#22)

#24

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#23)

#25

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#24)

#26

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#25)

#27

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#26)

#28

michael@paquier.xyz

over 4 years ago

In reply to: Michael Paquier (#27)

#29

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#28)

#30

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#29)

#31

michael@paquier.xyz

over 4 years ago

In reply to: Michael Paquier (#30)

#32

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#31)

#33

magnus@hagander.net

over 4 years ago

In reply to: Michael Paquier (#31)

#34

robertmhaas@gmail.com

over 4 years ago

In reply to: Magnus Hagander (#33)

#35

michael@paquier.xyz

over 4 years ago

In reply to: Robert Haas (#34)

#36

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#35)

#37

gkokolatos@protonmail.com

over 4 years ago

In reply to: Georgios Kokolatos (#36)

#38

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#37)

#39

michael@paquier.xyz

over 4 years ago

In reply to: Michael Paquier (#38)

#40

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#39)

#41

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#39)

#42

michael@paquier.xyz

over 4 years ago

In reply to: Georgios Kokolatos (#41)

#43

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#42)

#44

Jeevan Ladhe

jeevan.ladhe@enterprisedb.com

over 4 years ago

In reply to: Georgios Kokolatos (#43)

#45

michael@paquier.xyz

over 4 years ago

In reply to: Jeevan Ladhe (#44)

#46

gkokolatos@protonmail.com

over 4 years ago

In reply to: Michael Paquier (#45)

#47

Jeevan Ladhe

jeevan.ladhe@enterprisedb.com

over 4 years ago

In reply to: Michael Paquier (#45)

#48

robertmhaas@gmail.com

over 4 years ago

In reply to: Jeevan Ladhe (#47)

#49

michael@paquier.xyz

over 4 years ago

In reply to: Robert Haas (#48)

#50

Jeevan Ladhe

jeevan.ladhe@enterprisedb.com

over 4 years ago

In reply to: Michael Paquier (#49)

#51

robertmhaas@gmail.com

over 4 years ago

In reply to: Michael Paquier (#42)

#52

michael@paquier.xyz

over 4 years ago

In reply to: Robert Haas (#51)

#53

robertmhaas@gmail.com

over 4 years ago

In reply to: Michael Paquier (#52)

#54

Justin Pryzby

pryzby@telsasoft.com

about 4 years ago

In reply to: Michael Paquier (#52)

#55