pgsql: Add LZ4 compression to pg_dump

Started by Tomas Vondraalmost 3 years ago11 messages
#1Tomas Vondra
tomas.vondra@postgresql.org

Add LZ4 compression to pg_dump

Expand pg_dump's compression streaming and file APIs to support the lz4
algorithm. The newly added compress_lz4.{c,h} files cover all the
functionality of the aforementioned APIs. Minor changes were necessary
in various pg_backup_* files, where code for the 'lz4' file suffix has
been added, as well as pg_dump's compression option parsing.

Author: Georgios Kokolatos
Reviewed-by: Michael Paquier, Rachel Heaton, Justin Pryzby, Shi Yu, Tomas Vondra
Discussion: /messages/by-id/faUNEOpts9vunEaLnmxmG-DldLSg_ql137OC3JYDmgrOMHm1RvvWY2IdBkv_CRxm5spCCb_OmKNk2T03TMm0fBEWveFF9wA1WizPuAgB7Ss=@protonmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/0da243fed0875932f781aff08df782b56af58d02

Modified Files
--------------
doc/src/sgml/ref/pg_dump.sgml | 13 +-
src/bin/pg_dump/Makefile | 2 +
src/bin/pg_dump/compress_io.c | 26 +-
src/bin/pg_dump/compress_lz4.c | 626 ++++++++++++++++++++++++++++++++++
src/bin/pg_dump/compress_lz4.h | 24 ++
src/bin/pg_dump/meson.build | 8 +-
src/bin/pg_dump/pg_backup_archiver.c | 6 +-
src/bin/pg_dump/pg_backup_directory.c | 9 +-
src/bin/pg_dump/pg_dump.c | 5 +-
src/bin/pg_dump/t/002_pg_dump.pl | 82 ++++-
src/tools/pginclude/cpluspluscheck | 1 +
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 782 insertions(+), 22 deletions(-)

#2Christoph Berg
myon@debian.org
In reply to: Tomas Vondra (#1)
lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

Re: Tomas Vondra

Add LZ4 compression to pg_dump

This broke the TAP tests on Ubuntu 18.04 (bionic):

[17:06:45.513](0.000s) ok 1927 - compression_lz4_custom: should not dump test_table with 4-row INSERTs
# Running: pg_dump --jobs=2 --format=directory --compress=lz4:1 --file=/home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir postgres
[17:06:46.651](1.137s) ok 1928 - compression_lz4_dir: pg_dump runs
# Running: /usr/bin/lz4 -z -f --rm /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/blobs.toc.lz4
Incorrect parameters
Usage :
/usr/bin/lz4 [arg] [input] [output]

input : a filename
with no FILE, or when FILE is - or stdin, read standard input
Arguments :
-1 : Fast compression (default)
-9 : High compression
-d : decompression (default for .lz4 extension)
-z : force compression
-f : overwrite output without prompting
-h/-H : display help/long help and exit
[17:06:46.667](0.016s) not ok 1929 - compression_lz4_dir: compression commands
[17:06:46.668](0.001s)
[17:06:46.668](0.001s) # Failed test 'compression_lz4_dir: compression commands'
[17:06:46.669](0.000s) # at t/002_pg_dump.pl line 4274.
[17:06:46.670](0.001s) ok 1930 - compression_lz4_dir: glob check for /home/myon/projects/postgresql/pg/master/build/src/bin/pg_dump/tmp_check/tmp_test__aAO/compression_lz4_dir/toc.dat

The lz4 binary there doesn't have the --rm option yet.

liblz4-tool 0.0~r131-2ubuntu3

--rm appears in a single place only:

# Give coverage for manually compressed blob.toc files during
# restore.
compress_cmd => {
program => $ENV{'LZ4'},
args => [
'-z', '-f', '--rm',
"$tempdir/compression_lz4_dir/blobs.toc",
"$tempdir/compression_lz4_dir/blobs.toc.lz4",
],
},

18.04 will be EOL in a few weeks so it might be ok to just say it's
not supported, but removing the input file manually after calling lz4
would be an easy fix.

Christoph

#3Daniel Gustafsson
daniel@yesql.se
In reply to: Christoph Berg (#2)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 8 Mar 2023, at 18:55, Christoph Berg <myon@debian.org> wrote:

18.04 will be EOL in a few weeks so it might be ok to just say it's
not supported, but removing the input file manually after calling lz4
would be an easy fix.

Is it reasonable to expect that this version of LZ4 can/will appear on any
other platform outside of archeology? Removing the file manually would be a
trivial way to stabilize but if it's only expected to happen on platforms which
are long since EOL by the time 16 ships then the added complication could be
hard to justify.

--
Daniel Gustafsson

#4Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Daniel Gustafsson (#3)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 3/8/23 20:20, Daniel Gustafsson wrote:

On 8 Mar 2023, at 18:55, Christoph Berg <myon@debian.org> wrote:

18.04 will be EOL in a few weeks so it might be ok to just say it's
not supported, but removing the input file manually after calling lz4
would be an easy fix.

Is it reasonable to expect that this version of LZ4 can/will appear on any
other platform outside of archeology? Removing the file manually would be a
trivial way to stabilize but if it's only expected to happen on platforms which
are long since EOL by the time 16 ships then the added complication could be
hard to justify.

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Michael Paquier
michael@paquier.xyz
In reply to: Tomas Vondra (#4)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.

+1, better to change that, thanks.  Actually, would --rm be OK even on
Windows?  As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.
--
Michael
#6Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Michael Paquier (#5)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 3/9/23 01:30, Michael Paquier wrote:

On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.

+1, better to change that, thanks. Actually, would --rm be OK even on
Windows? As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.

Thanks for noticing that. I'll investigate next week.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Tomas Vondra (#6)
1 attachment(s)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 3/9/23 19:00, Tomas Vondra wrote:

On 3/9/23 01:30, Michael Paquier wrote:

On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.

+1, better to change that, thanks. Actually, would --rm be OK even on
Windows? As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.

Thanks for noticing that. I'll investigate next week.

So, here's a fix that should (I think) replace the 'lz4 --rm' with a
simple 'rm'. I have two doubts about this, though:

1) I haven't found a simple way to inject additional command into the
test. The pg_dump runs have a predefined list of "steps" to run:

-- compress_cmd
-- glob_patterns
-- command_like
-- restore_cmd

and I don't think there's a good place to inject the 'rm' so I ended up
adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
strange / hacky. Maybe there's a better way?

2) I wonder if Windows will know what 'rm' means. I haven't found any
TAP test doing 'rm' and don't see 'rm' in any $ENV either.

That being said, I have no idea how to make this work on our Windows CI.
As mentioned, the environment is missing the lz4 library - there's a

setup_additional_packages_script: |
REM choco install -y --no-progress ...

in the .yml file, but AFAICS the chocolatey does not have lz4 :-/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

lz4-rm-tap-test.patchtext/x-patch; charset=UTF-8; name=lz4-rm-tap-test.patchDownload
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index 9c354213ce..b3dcf6ff6d 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -178,11 +178,18 @@ my %pgdump_runs = (
 		compress_cmd => {
 			program => $ENV{'LZ4'},
 			args    => [
-				'-z', '-f', '--rm',
+				'-z', '-f',
 				"$tempdir/compression_lz4_dir/blobs.toc",
 				"$tempdir/compression_lz4_dir/blobs.toc.lz4",
 			],
 		},
+		# remove the source (uncompressed) .toc file
+		cleanup_cmd => {
+			program => 'rm',
+			args    => [
+				"$tempdir/compression_lz4_dir/blobs.toc",
+			],
+		},
 		# Verify that data files were compressed
 		glob_patterns => [
 			"$tempdir/compression_lz4_dir/toc.dat",
@@ -4274,6 +4281,20 @@ foreach my $run (sort keys %pgdump_runs)
 		command_ok(\@full_compress_cmd, "$run: compression commands");
 	}
 
+	if ($pgdump_runs{$run}->{cleanup_cmd})
+	{
+		my ($cleanup_cmd) = $pgdump_runs{$run}->{cleanup_cmd};
+		my $cleanup_program = $cleanup_cmd->{program};
+
+		# Skip the rest of the test if the compression program is
+		# not defined.
+		next if (!defined($cleanup_cmd) || $cleanup_cmd eq '');
+
+		my @full_cleanup_cmd =
+		  ($cleanup_cmd->{program}, @{ $cleanup_cmd->{args} });
+		command_ok(\@full_cleanup_cmd, "$run: cleanup commands");
+	}
+
 	if ($pgdump_runs{$run}->{glob_patterns})
 	{
 		my $glob_patterns = $pgdump_runs{$run}->{glob_patterns};
#8Christoph Berg
myon@debian.org
In reply to: Tomas Vondra (#7)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

Re: Tomas Vondra

and I don't think there's a good place to inject the 'rm' so I ended up
adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
strange / hacky. Maybe there's a better way?

Does the file need to be removed at all? Could we leave it there and
have "make clean" remove it?

Christoph

#9Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Christoph Berg (#8)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 3/14/23 11:34, Christoph Berg wrote:

Re: Tomas Vondra

and I don't think there's a good place to inject the 'rm' so I ended up
adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
strange / hacky. Maybe there's a better way?

Does the file need to be removed at all? Could we leave it there and
have "make clean" remove it?

I don't think that'd work, because of the automatic "discovery" where we
check if a file exists, and if not we try to append .gz and .lz4. So if
you leave the .toc, we'd not find the .lz4, making the test useless ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#7)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On Tue, Mar 14, 2023 at 12:16:16AM +0100, Tomas Vondra wrote:

On 3/9/23 19:00, Tomas Vondra wrote:

On 3/9/23 01:30, Michael Paquier wrote:

On Thu, Mar 09, 2023 at 12:39:08AM +0100, Tomas Vondra wrote:

IMO we should fix that. We have a bunch of buildfarm members running on
Ubuntu 18.04 (or older) - it's true none of them seems to be running TAP
tests. But considering how trivial the fix is ...

Barring objections, I'll push a fix early next week.

+1, better to change that, thanks. Actually, would --rm be OK even on
Windows? As far as I can see, the CI detects a LZ4 command for the
VS2019 environment but not the liblz4 libraries that would be needed
to trigger the set of tests.

Thanks for noticing that. I'll investigate next week.

So, here's a fix that should (I think) replace the 'lz4 --rm' with a
simple 'rm'. I have two doubts about this, though:

1) I haven't found a simple way to inject additional command into the
test. The pg_dump runs have a predefined list of "steps" to run:

-- compress_cmd
-- glob_patterns
-- command_like
-- restore_cmd

and I don't think there's a good place to inject the 'rm' so I ended up
adding a 'cleanup_cmd' right after 'compress_cmd'. But it seems a bit
strange / hacky. Maybe there's a better way?

I don't know if there's a better way, and I don't think it's worth
complicating the tests by more than about 2 lines to handle this.

2) I wonder if Windows will know what 'rm' means. I haven't found any
TAP test doing 'rm' and don't see 'rm' in any $ENV either.

CI probably will, since it's Andres' image built with git-tools and
other helpful stuff installed. But in general I think it won't; perl is
being used for all the portable stuff.

*If* you wanted to do something to fix this, you could create a key
called files_to_remove_after_loading, and run unlink on those files
rather than running a shell command. Or maybe just remove the file
unconditionally at the start of the script ?

That being said, I have no idea how to make this work on our Windows CI.
As mentioned, the environment is missing the lz4 library - there's a

setup_additional_packages_script: |
REM choco install -y --no-progress ...

in the .yml file, but AFAICS the chocolatey does not have lz4 :-/

I updated what I'd done in the zstd patch to also run with LZ4.
This won't apply directly due to other patches, but you get the idea...

Maybe it'd be good to have a commented-out "wraps" hint like there is
for choco. The downloaded files could be cached, too.

diff --git a/.cirrus.yml b/.cirrus.yml
index a3977a4036e..b4387a739f3 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -644,9 +644,11 @@ task:
     vcvarsall x64
     mkdir subprojects
     meson wrap install zstd
-    meson configure -D zstd:multithread=enabled --force-fallback-for=zstd
+    meson wrap install lz4
+    meson subprojects download
+    meson configure -D zstd:multithread=enabled --force-fallback-for=zstd --force-fallback-for=lz4
     set CC=c:\ProgramData\chocolatey\lib\ccache\tools\ccache-4.8-windows-x86_64\ccache.exe cl.exe
-    meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -Dc_args="/Z7 /MDd" build
+    meson setup --backend ninja --buildtype debug -Dc_link_args=/DEBUG:FASTLINK -Dcassert=true -Db_pch=false -Dextra_lib_dirs=c:\openssl\1.1\lib -Dextra_include_dirs=c:\openssl\1.1\include -DTAR=%TAR% -DPG_TEST_EXTRA="%PG_TEST_EXTRA%" -D zstd=enabled -D lz4=enabled -Dc_args="/Z7 /MDd" build

build_script: |
vcvarsall x64

--
Justin

#11Daniel Gustafsson
daniel@yesql.se
In reply to: Justin Pryzby (#10)
Re: lz4 --rm on Ubuntu 18.04 (Add LZ4 compression to pg_dump)

On 6 Apr 2023, at 18:39, Justin Pryzby <pryzby@telsasoft.com> wrote:

*If* you wanted to do something to fix this, you could create a key
called files_to_remove_after_loading, and run unlink on those files
rather than running a shell command. Or maybe just remove the file
unconditionally at the start of the script ?

Since the test is written in Perl, and Perl has a function for deleting files
which abstracts the platform differences, using it seems like a logical choice?
{cleanup_cmd} can be replaced with {cleanup_files} with an unlink called on
that?

--
Daniel Gustafsson