wal_compression=zstd

Started by Justin Pryzbyalmost 4 years ago13 messages
#1Justin Pryzby
pryzby@telsasoft.com
6 attachment(s)

Since 4035cd5d4, wal_compression=lz4 is supported. I think pg15 should also
support wal_compression=zstd. There are free bits in the WAL header.

The last message on that thread includes a patch doing just that, which I've
rebased.
/messages/by-id/YNqWd2GSMrnqWIfx@paquier.xyz

It might be nice if to also support wal_compression=zstd-level, but that seems
optional and could wait for pg16...

In [0]/messages/by-id/20210614012412.GA31772@telsasoft.com, I showed a case where lz4 is just as fast as uncompressed data, and
writes ~60% as much. zstd writes half as much as LZ4 and 12% slower.
[0]: /messages/by-id/20210614012412.GA31772@telsasoft.com

I am not claiming that zstd is generally better for WAL. Rather, if we're
going to support alternate compression methods, it's nice to give a couple
options (in addition to pglz). Some use cases would certainly suffer from
slower WAL. But providing the option will benefit some others. Note that a
superuser can set wal_compression for a given session - this would probably
benefit bulk-loading in an otherwise OLTP environment.

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

0001 adds support for zstd
0002 makes more efficient our use of bits in the WAL header
0003 makes it the default compression, to exercise on CI (windows will fail).
0004 adds support for zstd-level
0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
0007 (not included) also adds support for zlib. I'm of the impression nobody
cares about this, otherwise it would've been included 5-10 years ago.

Only 0001 should be reviewed for pg15 - the others are optional/future work.

Attachments:

0001-add-wal_compression-zstd.patchtext/x-diff; charset=us-asciiDownload
From 9253013c789ffb121272bfeeaa9dcdebbef79ced Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 18 Feb 2022 22:54:18 -0600
Subject: [PATCH 1/6] add wal_compression=zstd

---
 doc/src/sgml/config.sgml                      |  9 +++---
 doc/src/sgml/installation.sgml                | 10 +++++++
 src/backend/access/transam/xloginsert.c       | 30 ++++++++++++++++++-
 src/backend/access/transam/xlogreader.c       | 20 +++++++++++++
 src/backend/utils/misc/guc.c                  |  3 ++
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_waldump/pg_waldump.c               |  2 ++
 src/include/access/xlog.h                     |  3 +-
 src/include/access/xlogrecord.h               |  5 +++-
 9 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..97740c7b66 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3154,10 +3154,11 @@ include_dir 'conf.d'
         server compresses full page images written to WAL when
         <xref linkend="guc-full-page-writes"/> is on or during a base backup.
         A compressed page image will be decompressed during WAL replay.
-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>, and
+        (if configured when <productname>PostgreSQL</productname> was built) 
+        <literal>lz4</literal> and zstd.
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.
        </para>
 
        <para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 0f74252590..d32767b229 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -271,6 +271,14 @@ su - postgres
      </para>
     </listitem>
 
+    <listitem>
+     <para>
+      The <productname>ZSTD</productname> library can be used to enable
+      compression using that method; see
+      <xref linkend="guc-wal-compression"/>.
+     </para>
+    </listitem>
+
     <listitem>
      <para>
       To build the <productname>PostgreSQL</productname> documentation,
@@ -988,6 +996,8 @@ build-postgresql:
        <listitem>
         <para>
          Build with <productname>ZSTD</productname> compression support.
+         This enables use of <productname>ZSTD</productname> for
+         compression of WAL data.
         </para>
        </listitem>
       </varlistentry>
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c260310c4c..847ce0c3fc 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -44,9 +44,17 @@
 #define LZ4_MAX_BLCKSZ		0
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#define ZSTD_MAX_BLCKSZ		ZSTD_COMPRESSBOUND(BLCKSZ)
+#else
+#define ZSTD_MAX_BLCKSZ		0
+#endif
+
+/* Buffer size required to store a compressed version of backup block image */
 #define PGLZ_MAX_BLCKSZ		PGLZ_MAX_OUTPUT(BLCKSZ)
 
-#define COMPRESS_BUFSIZE	Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ)
+#define COMPRESS_BUFSIZE	Max(Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ), ZSTD_MAX_BLCKSZ)
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -695,6 +703,14 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 #endif
 						break;
 
+					case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+						bimg.bimg_info |= BKPIMAGE_COMPRESS_ZSTD;
+#else
+						elog(ERROR, "ZSTD is not supported by this build");
+#endif
+						break;
+
 					case WAL_COMPRESSION_NONE:
 						Assert(false);	/* cannot happen */
 						break;
@@ -903,6 +919,18 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 #endif
 			break;
 
+		case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+			/* Uses level=1, not ZSTD_CLEVEL_DEFAULT */
+			len = ZSTD_compress(dest, COMPRESS_BUFSIZE, source, orig_len,
+								1);
+			if (ZSTD_isError(len))
+				len = -1;
+#else
+			elog(ERROR, "ZSTD is not supported by this build");
+#endif
+			break;
+
 		case WAL_COMPRESSION_NONE:
 			Assert(false);		/* cannot happen */
 			break;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..d60e4cbaf6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -21,6 +21,9 @@
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
@@ -1618,6 +1621,23 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 								  "LZ4",
 								  block_id);
 			return false;
+#endif
+		}
+		else if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+		{
+#ifdef USE_ZSTD
+			size_t decomp_result = ZSTD_decompress(tmp.data,
+												   BLCKSZ-bkpb->hole_length,
+												   ptr, bkpb->bimg_len);
+
+			if (ZSTD_isError(decomp_result))
+				decomp_success = false;
+#else
+			report_invalid_record(record, "image at %X/%X compressed with %s not supported by build, block %d",
+								  LSN_FORMAT_ARGS(record->ReadRecPtr),
+								  "zstd",
+								  block_id);
+			return false;
 #endif
 		}
 		else
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e4afd07bfe..66e6d664b0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -550,6 +550,9 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{"pglz", WAL_COMPRESSION_PGLZ, false},
 #ifdef USE_LZ4
 	{"lz4", WAL_COMPRESSION_LZ4, false},
+#endif
+#ifdef USE_ZSTD
+	{"zstd", WAL_COMPRESSION_ZSTD, false},
 #endif
 	{"on", WAL_COMPRESSION_PGLZ, false},
 	{"off", WAL_COMPRESSION_NONE, false},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4a094bb38b..4cf5b26a36 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -220,7 +220,7 @@
 #wal_log_hints = off			# also do full page writes of non-critical updates
 					# (change requires restart)
 #wal_compression = off			# enables compression of full-page writes;
-					# off, pglz, lz4, or on
+					# off, pglz, lz4, zstd, or on
 #wal_init_zero = on			# zero-fill new WAL files
 #wal_recycle = on			# recycle WAL files
 #wal_buffers = -1			# min 32kB, -1 sets based on shared_buffers
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index a6251e1a96..c448658ea2 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -565,6 +565,8 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 						method = "pglz";
 					else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
 						method = "lz4";
+					else if ((bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+						method = "zstd";
 					else
 						method = "unknown";
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..09f6464331 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -75,7 +75,8 @@ typedef enum WalCompression
 {
 	WAL_COMPRESSION_NONE = 0,
 	WAL_COMPRESSION_PGLZ,
-	WAL_COMPRESSION_LZ4
+	WAL_COMPRESSION_LZ4,
+	WAL_COMPRESSION_ZSTD
 } WalCompression;
 
 /* Recovery states */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index c1b1137aa7..052ac6817a 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -149,8 +149,11 @@ typedef struct XLogRecordBlockImageHeader
 /* compression methods supported */
 #define BKPIMAGE_COMPRESS_PGLZ	0x04
 #define BKPIMAGE_COMPRESS_LZ4	0x08
+#define BKPIMAGE_COMPRESS_ZSTD	0x10
+
 #define	BKPIMAGE_COMPRESSED(info) \
-	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4)) != 0)
+	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4 | \
+			  BKPIMAGE_COMPRESS_ZSTD)) != 0)
 
 /*
  * Extra header information used when page image has "hole" and
-- 
2.17.1

0002-Save-bits-in-xlog-FPI-record.patchtext/x-diff; charset=us-asciiDownload
From 75a3fa4e68235f956f8761e43adf25be2fc9147c Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 23 Oct 2021 10:41:19 -0500
Subject: [PATCH 2/6] Save bits in xlog FPI record..

2 bits can support 4 methods (including 'none'), and 3 bits can support 8

See also:
https://www.postgresql.org/message-id/20220131222800.GY23027@telsasoft.com
---
 src/backend/access/transam/xloginsert.c | 29 +++----------------------
 src/backend/access/transam/xlogreader.c |  7 +++---
 src/bin/pg_waldump/pg_waldump.c         | 16 +++++---------
 src/include/access/xlogrecord.h         | 17 ++++++++++-----
 4 files changed, 23 insertions(+), 46 deletions(-)

diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 847ce0c3fc..5f9d07156a 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -689,33 +689,10 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 				bimg.length = compressed_len;
 
 				/* Set the compression method used for this block */
-				switch ((WalCompression) wal_compression)
-				{
-					case WAL_COMPRESSION_PGLZ:
-						bimg.bimg_info |= BKPIMAGE_COMPRESS_PGLZ;
-						break;
-
-					case WAL_COMPRESSION_LZ4:
-#ifdef USE_LZ4
-						bimg.bimg_info |= BKPIMAGE_COMPRESS_LZ4;
-#else
-						elog(ERROR, "LZ4 is not supported by this build");
-#endif
-						break;
+				Assert(wal_compression < (1 << BKPIMAGE_COMPRESS_BITS));
 
-					case WAL_COMPRESSION_ZSTD:
-#ifdef USE_ZSTD
-						bimg.bimg_info |= BKPIMAGE_COMPRESS_ZSTD;
-#else
-						elog(ERROR, "ZSTD is not supported by this build");
-#endif
-						break;
-
-					case WAL_COMPRESSION_NONE:
-						Assert(false);	/* cannot happen */
-						break;
-						/* no default case, so that compiler will warn */
-				}
+				bimg.bimg_info |=
+					wal_compression << BKPIMAGE_COMPRESS_OFFSET_BITS;
 
 				rdt_datas_last->data = regbuf->compressed_page;
 				rdt_datas_last->len = compressed_len;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index d60e4cbaf6..d2d882d0d1 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1602,14 +1602,15 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 	{
 		/* If a backup block image is compressed, decompress it */
 		bool		decomp_success = true;
+		int			compressid = BKPIMAGE_COMPRESSION(bkpb->bimg_info);
 
-		if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_PGLZ) != 0)
+		if (compressid == WAL_COMPRESSION_PGLZ)
 		{
 			if (pglz_decompress(ptr, bkpb->bimg_len, tmp.data,
 								BLCKSZ - bkpb->hole_length, true) < 0)
 				decomp_success = false;
 		}
-		else if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
+		else if (compressid == WAL_COMPRESSION_LZ4)
 		{
 #ifdef USE_LZ4
 			if (LZ4_decompress_safe(ptr, tmp.data,
@@ -1623,7 +1624,7 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 			return false;
 #endif
 		}
-		else if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+		else if (compressid == WAL_COMPRESSION_ZSTD)
 		{
 #ifdef USE_ZSTD
 			size_t decomp_result = ZSTD_decompress(tmp.data,
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index c448658ea2..bec3fda2aa 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -18,6 +18,7 @@
 #include <unistd.h>
 
 #include "access/transam.h"
+#include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogreader.h"
 #include "access/xlogrecord.h"
@@ -559,17 +560,10 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 
 				if (BKPIMAGE_COMPRESSED(bimg_info))
 				{
-					const char *method;
-
-					if ((bimg_info & BKPIMAGE_COMPRESS_PGLZ) != 0)
-						method = "pglz";
-					else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
-						method = "lz4";
-					else if ((bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
-						method = "zstd";
-					else
-						method = "unknown";
+					int compressid = BKPIMAGE_COMPRESSION(bimg_info);
+					const char *methods[] = {"none", "pglz", "lz4", "zstd"};
 
+					Assert(compressid < lengthof(methods));
 					printf(" (FPW%s); hole: offset: %u, length: %u, "
 						   "compression saved: %u, method: %s",
 						   XLogRecBlockImageApply(record, block_id) ?
@@ -579,7 +573,7 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 						   BLCKSZ -
 						   record->blocks[block_id].hole_length -
 						   record->blocks[block_id].bimg_len,
-						   method);
+						   methods[compressid]);
 				}
 				else
 				{
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index 052ac6817a..2080fb707f 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -147,13 +147,18 @@ typedef struct XLogRecordBlockImageHeader
 #define BKPIMAGE_APPLY			0x02	/* page image should be restored
 										 * during replay */
 /* compression methods supported */
-#define BKPIMAGE_COMPRESS_PGLZ	0x04
-#define BKPIMAGE_COMPRESS_LZ4	0x08
-#define BKPIMAGE_COMPRESS_ZSTD	0x10
-
+#define BKPIMAGE_COMPRESS_METHOD1	0x04	/* bits to encode compression method */
+#define BKPIMAGE_COMPRESS_METHOD2	0x08	/* 0=none, 1=pglz, 2=lz4, 3=zstd */
+
+/* How many bits to shift to extract compression */
+#define	BKPIMAGE_COMPRESS_OFFSET_BITS	2
+/* How many bits are for compression */
+#define	BKPIMAGE_COMPRESS_BITS			2
+/* Extract the compression from the bimg_info */
+#define	BKPIMAGE_COMPRESSION(info)		((info >> BKPIMAGE_COMPRESS_OFFSET_BITS) & ((1<<BKPIMAGE_COMPRESS_BITS) - 1))
+// #define	BKPIMAGE_COMPRESSED(info)	(BKPIMAGE_COMPRESSION(info) != 0)
 #define	BKPIMAGE_COMPRESSED(info) \
-	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4 | \
-			  BKPIMAGE_COMPRESS_ZSTD)) != 0)
+	((info & (BKPIMAGE_COMPRESS_METHOD1 | BKPIMAGE_COMPRESS_METHOD2)) != 0)
 
 /*
  * Extra header information used when page image has "hole" and
-- 
2.17.1

0003-Default-to-zstd.patchtext/x-diff; charset=us-asciiDownload
From 118dbf303c39412d38c4105bc323927bffa97ca1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 12 Mar 2021 15:35:53 -0600
Subject: [PATCH 3/6] Default to zstd..

for CI, not for merge
---
 .cirrus.yml                          | 14 ++++++++++++++
 configure                            |  6 ++++--
 configure.ac                         |  2 +-
 src/backend/access/transam/xlog.c    |  2 +-
 src/backend/utils/misc/guc.c         |  2 +-
 src/tools/ci/windows_build_config.pl |  1 +
 6 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index d10b0a82f9..4735c67608 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -77,6 +77,7 @@ task:
     mkdir -m 770 /tmp/cores
     chown root:postgres /tmp/cores
     sysctl kern.corefile='/tmp/cores/%N.%P.core'
+    pkg install -y zstd
 
   # NB: Intentionally build without --with-llvm. The freebsd image size is
   # already large enough to make VM startup slow, and even without llvm
@@ -184,6 +185,8 @@ task:
     mkdir -m 770 /tmp/cores
     chown root:postgres /tmp/cores
     sysctl kernel.core_pattern='/tmp/cores/%e-%s-%p.core'
+    apt-get update
+    apt-get -y install libzstd-dev
 
   configure_script: |
     su postgres <<-EOF
@@ -265,6 +268,7 @@ task:
       openldap \
       openssl \
       python \
+      zstd \
       tcl-tk
 
     brew cleanup -s # to reduce cache size
@@ -385,6 +389,12 @@ task:
     set
 
   configure_script:
+    # XXX: not working
+    - choco install -y cmake
+    - curl -L -o zstd.zip https://github.com/facebook/zstd/archive/refs/tags/v1.5.2.zip
+    - unzip zstd.zip
+    - cd zstd-1.5.2 && vcvarsall x64 && build\VS_scripts\build.VS2017.cmd && cd ..
+    #- cp zstd-1.5.2/build/VS_scripts/build.VS2017.cmd
     # copy errors out when using forward slashes
     - copy src\tools\ci\windows_build_config.pl src\tools\msvc\config.pl
     - vcvarsall x64
@@ -467,6 +477,10 @@ task:
     image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
     cpu: $CPUS
 
+  setup_os_script: |
+    apt-get update
+    apt-get -y install libzstd-dev
+
   sysinfo_script: |
     id
     uname -a
diff --git a/configure b/configure
index f3cb5c2b51..b88917fcb5 100755
--- a/configure
+++ b/configure
@@ -1584,7 +1584,7 @@ Optional Packages:
                           use system time zone data in DIR
   --without-zlib          do not use Zlib
   --with-lz4              build with LZ4 support
-  --with-zstd             build with ZSTD support
+  --without-zstd          build without Zstd support
   --with-gnu-ld           assume the C compiler uses GNU ld [default=no]
   --with-ssl=LIB          use LIB for SSL/TLS support (openssl)
   --with-openssl          obsolete spelling of --with-ssl=openssl
@@ -9070,7 +9070,9 @@ $as_echo "#define USE_ZSTD 1" >>confdefs.h
   esac
 
 else
-  with_zstd=no
+  with_zstd=yes
+
+$as_echo "#define USE_ZSTD 1" >>confdefs.h
 
 fi
 
diff --git a/configure.ac b/configure.ac
index 19d1a80367..4ae681d8bf 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1060,7 +1060,7 @@ fi
 # ZSTD
 #
 AC_MSG_CHECKING([whether to build with ZSTD support])
-PGAC_ARG_BOOL(with, zstd, no, [build with ZSTD support],
+PGAC_ARG_BOOL(with, zstd, yes, [build with ZSTD support],
               [AC_DEFINE([USE_ZSTD], 1, [Define to 1 to build with ZSTD support. (--with-zstd)])])
 AC_MSG_RESULT([$with_zstd])
 AC_SUBST(with_zstd)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..9aa4d3eaea 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -120,7 +120,7 @@ char	   *XLogArchiveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
-int			wal_compression = WAL_COMPRESSION_NONE;
+int			wal_compression = WAL_COMPRESSION_ZSTD;
 char	   *wal_consistency_checking_string = NULL;
 bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 66e6d664b0..96854b6a2b 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4900,7 +4900,7 @@ static struct config_enum ConfigureNamesEnum[] =
 			NULL
 		},
 		&wal_compression,
-		WAL_COMPRESSION_NONE, wal_compression_options,
+		WAL_COMPRESSION_ZSTD, wal_compression_options,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/tools/ci/windows_build_config.pl b/src/tools/ci/windows_build_config.pl
index b0d4360c74..97ce79ea7f 100644
--- a/src/tools/ci/windows_build_config.pl
+++ b/src/tools/ci/windows_build_config.pl
@@ -9,5 +9,6 @@ $config->{"asserts"} = 1;
 $config->{"openssl"} = "c:/openssl/1.1/";
 $config->{"perl"} = "c:/strawberry/$ENV{DEFAULT_PERL_VERSION}/perl/";
 $config->{"python"} = "c:/python/";
+$config->{"zstd"} = "c:/zstd/";
 
 1;
-- 
2.17.1

0004-Use-GUC-hooks-to-support-compression-level.patchtext/x-diff; charset=us-asciiDownload
From dc145ccbe47146bdf9dc86910d3135e7e64873ea Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 14 Mar 2021 17:12:07 -0500
Subject: [PATCH 4/6] Use GUC hooks to support compression 'level'..

..which is useful for zstd and zlib, but less so for lz4.
---
 src/backend/access/transam/xlog.c       |  20 ++++
 src/backend/access/transam/xloginsert.c |   7 +-
 src/backend/access/transam/xlogreader.c | 129 ++++++++++++++++++++++++
 src/backend/utils/misc/guc.c            |  39 ++-----
 src/include/access/xlog.h               |  11 ++
 src/include/access/xlogreader.h         |   2 +
 6 files changed, 175 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9aa4d3eaea..198e6906ee 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -121,6 +121,8 @@ bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		wal_log_hints = false;
 int			wal_compression = WAL_COMPRESSION_ZSTD;
+char		*wal_compression_string = ""; /* Overwritten by GUC */
+int			wal_compression_level = 1;
 char	   *wal_consistency_checking_string = NULL;
 bool	   *wal_consistency_checking = NULL;
 bool		wal_init_zero = true;
@@ -7923,6 +7925,24 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 	}
 }
 
+bool
+check_wal_compression(char **newval, void **extra, GucSource source)
+{
+	int tmp;
+	if (get_compression_level(*newval, &tmp) != -1)
+		return true;
+
+	return false;
+}
+
+/* Parse the GUC into integers for wal_compression and wal_compression_level */
+void
+assign_wal_compression(const char *newval, void *extra)
+{
+	wal_compression = get_compression_level(newval, &wal_compression_level);
+	Assert(wal_compression >= 0);
+}
+
 
 /*
  * Issue appropriate kind of fsync (if any) for an XLOG output file.
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 5f9d07156a..0e0cfc199c 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -887,8 +887,8 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 
 		case WAL_COMPRESSION_LZ4:
 #ifdef USE_LZ4
-			len = LZ4_compress_default(source, dest, orig_len,
-									   COMPRESS_BUFSIZE);
+			len = LZ4_compress_fast(source, dest, orig_len,
+									   COMPRESS_BUFSIZE, wal_compression_level);
 			if (len <= 0)
 				len = -1;		/* failure */
 #else
@@ -898,9 +898,8 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 
 		case WAL_COMPRESSION_ZSTD:
 #ifdef USE_ZSTD
-			/* Uses level=1, not ZSTD_CLEVEL_DEFAULT */
 			len = ZSTD_compress(dest, COMPRESS_BUFSIZE, source, orig_len,
-								1);
+								wal_compression_level);
 			if (ZSTD_isError(len))
 				len = -1;
 #else
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index d2d882d0d1..c548898eb0 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -18,6 +18,8 @@
 #include "postgres.h"
 
 #include <unistd.h>
+#include <limits.h>
+
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
@@ -53,6 +55,36 @@ static void ResetDecoder(XLogReaderState *state);
 static void WALOpenSegmentInit(WALOpenSegment *seg, WALSegmentContext *segcxt,
 							   int segsize, const char *waldir);
 
+static const struct {
+	char *name;
+	enum WalCompression compress_id; /* The internal ID */
+	bool has_level; /* If it accepts a numeric "level" */
+	int min_level, dfl_level, max_level;
+} wal_compression_options[] = {
+	{"pglz", WAL_COMPRESSION_PGLZ, false},
+
+#ifdef USE_LZ4
+	{"lz4", WAL_COMPRESSION_LZ4, false}, // XXX
+#endif
+
+#ifdef USE_ZSTD
+	/* XXX: the minimum level depends on the version */
+	/* Must be first */
+	{"zstd-fast", WAL_COMPRESSION_ZSTD, true, -7, -1, 0},
+
+	{"zstd", WAL_COMPRESSION_ZSTD, true, -7, 1, 22},
+#endif
+
+	{"on", WAL_COMPRESSION_PGLZ, false},
+	{"off", WAL_COMPRESSION_NONE, false},
+	{"true", WAL_COMPRESSION_PGLZ, false},
+	{"false", WAL_COMPRESSION_NONE, false},
+	{"yes", WAL_COMPRESSION_PGLZ, false},
+	{"no", WAL_COMPRESSION_NONE, false},
+	{"1", WAL_COMPRESSION_PGLZ, false},
+	{"0", WAL_COMPRESSION_NONE, false},
+};
+
 /* size of the buffer allocated for error message. */
 #define MAX_ERRORMSG_LEN 1000
 
@@ -1578,6 +1610,103 @@ XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len)
 	}
 }
 
+/*
+ * Return the wal compression ID, or -1 if the input is
+ * invalid/unrecognized/unsupported.
+ * The compression level is stored in *level.
+ */
+int
+get_compression_level(const char *in, int *level)
+{
+	for (int idx=0; idx < lengthof(wal_compression_options); ++idx)
+	{
+		int len;
+		long tmp;
+		char *end;
+
+		if (strcmp(in, wal_compression_options[idx].name) == 0)
+		{
+			/* it has no -level suffix */
+			*level = wal_compression_options[idx].dfl_level;
+			return wal_compression_options[idx].compress_id;
+		}
+
+		len = strlen(wal_compression_options[idx].name);
+		if (strncmp(in, wal_compression_options[idx].name, len) != 0)
+			continue;
+		if (in[len] != '-')
+			continue;
+
+		/* it has a -level suffix, but level is not allowed */
+		if (!wal_compression_options[idx].has_level)
+		{
+#ifndef FRONTEND
+			GUC_check_errdetail("Compression method does not accept a compression level");
+#endif
+			return -1;
+		}
+
+		in += len + 1;
+		len = strlen(in);
+		errno = 0;
+		/* pg_strtoint16 throws an error, which we don't want */
+		/* option_parse_int is frontend only */
+		tmp = strtol(in, &end, 0);
+		if (end != in+len || end == in ||
+				(errno != 0 && tmp == 0) ||
+				(errno == ERANGE && (tmp == LONG_MIN || tmp == LONG_MAX)))
+		{
+#ifndef FRONTEND
+			GUC_check_errdetail("Could not parse compression level: %s", in);
+#endif
+			return -1;
+		}
+
+		/*
+		 * For convenience, allow specification of zstd-fast-N, which is
+		 * interpretted as a negative compression level.
+		 */
+		if (strncmp(wal_compression_options[idx].name, "zstd-fast", 9) == 0 &&
+			tmp > 0)
+			tmp = -tmp;
+
+		if (tmp < wal_compression_options[idx].min_level ||
+				tmp > wal_compression_options[idx].max_level)
+		{
+#ifndef FRONTEND
+			GUC_check_errdetail("Compression level is outside of allowed range: %d...%d",
+					wal_compression_options[idx].min_level,
+					wal_compression_options[idx].max_level);
+#endif
+			return -1;
+		}
+
+		*level = tmp;
+		return wal_compression_options[idx].compress_id;
+	}
+
+#ifndef FRONTEND
+	// XXX: this is trying to distinguish between invalid an unsupported algorithms?
+	if (strcmp(in, "zlib") == 0 && false)
+		GUC_check_errdetail("Compression method is not supported by this build.");
+	else {
+		StringInfoData all_methods;
+
+		initStringInfo(&all_methods);
+		for (int idx=0; idx < lengthof(wal_compression_options); ++idx)
+		{
+			if (idx > 0)
+				appendStringInfoString(&all_methods, ", ");
+			appendStringInfoString(&all_methods, wal_compression_options[idx].name);
+		}
+
+		GUC_check_errdetail("Supported compression methods are: %s", all_methods.data);
+		pfree(all_methods.data);
+	}
+#endif
+	return -1;
+}
+
 /*
  * Restore a full-page image from a backup block attached to an XLOG record.
  *
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 96854b6a2b..9d3fb67fad 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -546,25 +546,6 @@ static struct config_enum_entry default_toast_compression_options[] = {
 	{NULL, 0, false}
 };
 
-static const struct config_enum_entry wal_compression_options[] = {
-	{"pglz", WAL_COMPRESSION_PGLZ, false},
-#ifdef USE_LZ4
-	{"lz4", WAL_COMPRESSION_LZ4, false},
-#endif
-#ifdef USE_ZSTD
-	{"zstd", WAL_COMPRESSION_ZSTD, false},
-#endif
-	{"on", WAL_COMPRESSION_PGLZ, false},
-	{"off", WAL_COMPRESSION_NONE, false},
-	{"true", WAL_COMPRESSION_PGLZ, true},
-	{"false", WAL_COMPRESSION_NONE, true},
-	{"yes", WAL_COMPRESSION_PGLZ, true},
-	{"no", WAL_COMPRESSION_NONE, true},
-	{"1", WAL_COMPRESSION_PGLZ, true},
-	{"0", WAL_COMPRESSION_NONE, true},
-	{NULL, 0, false}
-};
-
 /*
  * Options for enum values stored in other modules
  */
@@ -4643,6 +4624,16 @@ static struct config_string ConfigureNamesString[] =
 		check_wal_consistency_checking, assign_wal_consistency_checking, NULL
 	},
 
+	{
+		{"wal_compression", PGC_SUSET, WAL_SETTINGS,
+			gettext_noop("Compresses full-page writes written in WAL file with specified method."),
+			NULL
+		},
+		&wal_compression_string,
+		"zstd-6",
+		check_wal_compression, assign_wal_compression, NULL
+	},
+
 	{
 		{"jit_provider", PGC_POSTMASTER, CLIENT_CONN_PRELOAD,
 			gettext_noop("JIT provider to use."),
@@ -4894,16 +4885,6 @@ static struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
-	{
-		{"wal_compression", PGC_SUSET, WAL_SETTINGS,
-			gettext_noop("Compresses full-page writes written in WAL file with specified method."),
-			NULL
-		},
-		&wal_compression,
-		WAL_COMPRESSION_ZSTD, wal_compression_options,
-		NULL, NULL, NULL
-	},
-
 	{
 		{"wal_level", PGC_POSTMASTER, WAL_SETTINGS,
 			gettext_noop("Sets the level of information written to the WAL."),
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 09f6464331..521aac2ce4 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -16,6 +16,8 @@
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
 #include "nodes/pg_list.h"
+#include "storage/fd.h"
+#include "utils/guc.h"
 
 
 /* Sync methods */
@@ -51,6 +53,10 @@ extern char *wal_consistency_checking_string;
 extern bool log_checkpoints;
 extern bool track_wal_io_timing;
 
+extern char *wal_compression_string;
+extern int wal_compression;
+extern int wal_compression_level;
+
 extern int	CheckPointSegments;
 
 /* Archive modes */
@@ -248,6 +254,11 @@ extern void SetWalWriterSleeping(bool sleeping);
 extern void assign_max_wal_size(int newval, void *extra);
 extern void assign_checkpoint_completion_target(double newval, void *extra);
 
+/* GUC */
+extern bool check_wal_compression(char **newval, void **extra, GucSource source);
+extern void assign_wal_compression(const char *newval, void *extra);
+
+
 /*
  * Routines used by xlogrecovery.c to call back into xlog.c during recovery.
  */
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 477f0efe26..dd6c117ef7 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -337,4 +337,6 @@ extern bool XLogRecGetBlockTag(XLogReaderState *record, uint8 block_id,
 							   RelFileNode *rnode, ForkNumber *forknum,
 							   BlockNumber *blknum);
 
+extern int get_compression_level(const char *in, int *level);
+
 #endif							/* XLOGREADER_H */
-- 
2.17.1

0005-Run-011_crash_recovery.pl-with-wal_level-minimal.patchtext/x-diff; charset=us-asciiDownload
From a238e5b6b4868d256b58fb631beace2e9c1f2fc6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 8 Mar 2021 15:32:30 +0900
Subject: [PATCH 5/6] Run 011_crash_recovery.pl with wal_level=minimal

The test doesn't need that feature and pg_current_xact_id() is better
exercised by turning off the feature.

Copied from: https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com
---
 src/test/recovery/t/011_crash_recovery.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 14154d1ce0..4b11e40544 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -12,7 +12,7 @@ use Test::More;
 use Config;
 
 my $node = PostgreSQL::Test::Cluster->new('primary');
-$node->init(allows_streaming => 1);
+$node->init(allows_streaming => 0);
 $node->start;
 
 my ($stdin, $stdout, $stderr) = ('', '', '');
-- 
2.17.1

0006-Make-sure-published-XIDs-are-persistent.patchtext/x-diff; charset=us-asciiDownload
From 81044d628a20f5b2ee0d4d36b013d59d0bc33643 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 8 Mar 2021 15:43:01 +0900
Subject: [PATCH 6/6] Make sure published XIDs are persistent

pg_xact_status() premises that XIDs obtained by
pg_current_xact_id(_if_assigned)() are persistent beyond a crash. But
XIDs are not guaranteed to go beyond WAL buffers before commit and
thus XIDs may vanish if server crashes before commit. This patch
guarantees the XID shown by the functions to be flushed out to disk.

Copied from: https://www.postgresql.org/message-id/20210308.173242.463790587797836129.horikyota.ntt%40gmail.com
---
 src/backend/access/transam/xact.c | 56 +++++++++++++++++++++++++------
 src/backend/access/transam/xlog.c |  2 +-
 src/backend/utils/adt/xid8funcs.c | 12 ++++++-
 src/include/access/xact.h         |  3 +-
 4 files changed, 60 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bb1f106946..f0ba9c33ed 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -204,7 +204,7 @@ typedef struct TransactionStateData
 	int			prevSecContext; /* previous SecurityRestrictionContext */
 	bool		prevXactReadOnly;	/* entry-time xact r/o state */
 	bool		startedInRecovery;	/* did we start in recovery? */
-	bool		didLogXid;		/* has xid been included in WAL record? */
+	XLogRecPtr 	minLSN;			/* LSN needed to reach to record the xid */
 	int			parallelModeLevel;	/* Enter/ExitParallelMode counter */
 	bool		chain;			/* start a new block after this one */
 	bool		topXidLogged;	/* for a subxact: is top-level XID logged? */
@@ -523,12 +523,16 @@ GetCurrentFullTransactionIdIfAny(void)
  *	MarkCurrentTransactionIdLoggedIfAny
  *
  * Remember that the current xid - if it is assigned - now has been wal logged.
+ *
+ * upto is the LSN up to which we need to flush WAL to ensure the current xid
+ * to be persistent. See EnsureCurrentTransactionIdLogged().
  */
 void
-MarkCurrentTransactionIdLoggedIfAny(void)
+MarkCurrentTransactionIdLoggedIfAny(XLogRecPtr upto)
 {
-	if (FullTransactionIdIsValid(CurrentTransactionState->fullTransactionId))
-		CurrentTransactionState->didLogXid = true;
+	if (FullTransactionIdIsValid(CurrentTransactionState->fullTransactionId) &&
+		XLogRecPtrIsInvalid(CurrentTransactionState->minLSN))
+		CurrentTransactionState->minLSN = upto;
 }
 
 /*
@@ -582,6 +586,35 @@ MarkSubxactTopXidLogged(void)
 	CurrentTransactionState->topXidLogged = true;
 }
 
+/*
+ *	EnsureCurrentTransactionIdLogged
+ *
+ * Make sure that the current top XID is WAL-logged.
+ */
+void
+EnsureTopTransactionIdLogged(void)
+{
+	/*
+	 * We need at least one WAL record for the current top transaction to be
+	 * flushed out.  Write one if we don't have one yet.
+	 */
+	if (XLogRecPtrIsInvalid(TopTransactionStateData.minLSN))
+	{
+		xl_xact_assignment xlrec;
+
+		xlrec.xtop = XidFromFullTransactionId(XactTopFullTransactionId);
+		Assert(TransactionIdIsValid(xlrec.xtop));
+		xlrec.nsubxacts = 0;
+
+		XLogBeginInsert();
+		XLogRegisterData((char *) &xlrec, MinSizeOfXactAssignment);
+		TopTransactionStateData.minLSN =
+			XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT);
+	}
+
+	XLogFlush(TopTransactionStateData.minLSN);
+}
+
 /*
  *	GetStableLatestTransactionId
  *
@@ -669,14 +702,14 @@ AssignTransactionId(TransactionState s)
 	 * When wal_level=logical, guarantee that a subtransaction's xid can only
 	 * be seen in the WAL stream if its toplevel xid has been logged before.
 	 * If necessary we log an xact_assignment record with fewer than
-	 * PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if didLogXid isn't set
+	 * PGPROC_MAX_CACHED_SUBXIDS. Note that it is fine if minLSN isn't set
 	 * for a transaction even though it appears in a WAL record, we just might
 	 * superfluously log something. That can happen when an xid is included
 	 * somewhere inside a wal record, but not in XLogRecord->xl_xid, like in
 	 * xl_standby_locks.
 	 */
 	if (isSubXact && XLogLogicalInfoActive() &&
-		!TopTransactionStateData.didLogXid)
+		XLogRecPtrIsInvalid(TopTransactionStateData.minLSN))
 		log_unknown_top = true;
 
 	/*
@@ -746,6 +779,7 @@ AssignTransactionId(TransactionState s)
 			log_unknown_top)
 		{
 			xl_xact_assignment xlrec;
+			XLogRecPtr		   endptr;
 
 			/*
 			 * xtop is always set by now because we recurse up transaction
@@ -760,11 +794,13 @@ AssignTransactionId(TransactionState s)
 			XLogRegisterData((char *) unreportedXids,
 							 nUnreportedXids * sizeof(TransactionId));
 
-			(void) XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT);
+			endptr = XLogInsert(RM_XACT_ID, XLOG_XACT_ASSIGNMENT);
 
 			nUnreportedXids = 0;
-			/* mark top, not current xact as having been logged */
-			TopTransactionStateData.didLogXid = true;
+
+			/* set minLSN of top, not of current xact if not yet */
+			if (XLogRecPtrIsInvalid(TopTransactionStateData.minLSN))
+				TopTransactionStateData.minLSN = endptr;
 		}
 	}
 }
@@ -2049,7 +2085,7 @@ StartTransaction(void)
 	 * initialize reported xid accounting
 	 */
 	nUnreportedXids = 0;
-	s->didLogXid = false;
+	s->minLSN = InvalidXLogRecPtr;
 
 	/*
 	 * must initialize resource-management stuff first
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 198e6906ee..7c7bea768c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -915,7 +915,7 @@ XLogInsertRecord(XLogRecData *rdata,
 
 	END_CRIT_SECTION();
 
-	MarkCurrentTransactionIdLoggedIfAny();
+	MarkCurrentTransactionIdLoggedIfAny(EndPos);
 
 	/*
 	 * Mark top transaction id is logged (if needed) so that we should not try
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 60e57876de..47bae59898 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -357,6 +357,8 @@ bad_format:
 Datum
 pg_current_xact_id(PG_FUNCTION_ARGS)
 {
+	FullTransactionId xid;
+
 	/*
 	 * Must prevent during recovery because if an xid is not assigned we try
 	 * to assign one, which would fail. Programs already rely on this function
@@ -365,7 +367,12 @@ pg_current_xact_id(PG_FUNCTION_ARGS)
 	 */
 	PreventCommandDuringRecovery("pg_current_xact_id()");
 
-	PG_RETURN_FULLTRANSACTIONID(GetTopFullTransactionId());
+	xid = GetTopFullTransactionId();
+
+	/* the XID is going to be published, make sure it is psersistent */
+	EnsureTopTransactionIdLogged();
+
+	PG_RETURN_FULLTRANSACTIONID(xid);
 }
 
 /*
@@ -380,6 +387,9 @@ pg_current_xact_id_if_assigned(PG_FUNCTION_ARGS)
 	if (!FullTransactionIdIsValid(topfxid))
 		PG_RETURN_NULL();
 
+	/* the XID is going to be published, make sure it is psersistent */
+	EnsureTopTransactionIdLogged();
+
 	PG_RETURN_FULLTRANSACTIONID(topfxid);
 }
 
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 17a6fa4abd..33d03e1913 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -386,7 +386,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
 extern FullTransactionId GetTopFullTransactionIdIfAny(void);
 extern FullTransactionId GetCurrentFullTransactionId(void);
 extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
-extern void MarkCurrentTransactionIdLoggedIfAny(void);
+extern void MarkCurrentTransactionIdLoggedIfAny(XLogRecPtr upto);
+extern void EnsureTopTransactionIdLogged(void);
 extern bool SubTransactionIsActive(SubTransactionId subxid);
 extern CommandId GetCurrentCommandId(bool used);
 extern void SetParallelStartTimestamps(TimestampTz xact_ts, TimestampTz stmt_ts);
-- 
2.17.1

#2Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#1)
Re: wal_compression=zstd

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:

I am not claiming that zstd is generally better for WAL. Rather, if we're
going to support alternate compression methods, it's nice to give a couple
options (in addition to pglz). Some use cases would certainly suffer from
slower WAL. But providing the option will benefit some others. Note that a
superuser can set wal_compression for a given session - this would probably
benefit bulk-loading in an otherwise OLTP environment.

Well, I cannot say which one is better either as it depends on your
workload, mostly, but we know for sure that both have benefits, so I
don't mind adding it now. What you are proposing is built on top of
the existing code, making it a very simple change.

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

Why? ZSTD using this default has its reasons, no? And it would be
consistent to do the same for ZSTD as for the other two methods.

-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>, and
+        (if configured when <productname>PostgreSQL</productname>was built)
+        <literal>lz4</literal> and zstd.
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.

This is making the docs less verbose. I think that you should keep
the mention to respectively --with-lz4 and --with-zstd after each
value.

       <para>
        Build with <productname>ZSTD</productname> compression support.
+       This enables use of <productname>ZSTD</productname> for
+       compression of WAL data.

This addition is not necessary, see d7a9786.

Not related to your patch, but we have more reasons to add an
check in the block of BKPIMAGE_COMPRESSED() in RestoreBlockImage() of
xlogreader.c to make sure that only one bit is set for the compression
type. We could use a pg_popcount() == 1 for that, returning
report_invalid_record() when we see some corrupted data.

As a whole, 0001 looks pretty good to me after a detailed read (not
tested, though).

Only 0001 should be reviewed for pg15 - the others are optional/future work.

That's wiser for v15. FWIW, I have the impression that we don't need
most of what's proposed in 0002~.

 /* compression methods supported */
-#define BKPIMAGE_COMPRESS_PGLZ 0x04
-#define BKPIMAGE_COMPRESS_LZ4  0x08
-#define BKPIMAGE_COMPRESS_ZSTD 0x10
-
+#define BKPIMAGE_COMPRESS_METHOD1      0x04    /* bits to encode  compression method */
+#define BKPIMAGE_COMPRESS_METHOD2      0x08    /* 0=none, 1=pglz, 2=lz4, 3=zstd */

As of 0002. We still have some room for this data and this makes the
code harder to follow, so I'd live this part of the code as it is
proposed in 0001.

0003, defaulting to zstd, and 0004 to extend compression to support a
level would require a lot of benchmarking to be justified. I have
argued against making the code more complicated for such things in the
past, with reloptions. The footprint on the code is much smaller,
here, still..

0007, for ZLIB, does not make sense once one can choose between LZ4
and ZSTD.
--
Michael

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#2)
Re: wal_compression=zstd

On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

Why? ZSTD using this default has its reasons, no? And it would be
consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

postgres=# SET wal_compression= 'zstd-6';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Duraci�n: 2729,017 ms (00:02,729)
wal_bytes | 6102403

postgres=# SET wal_compression= 'zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Duraci�n: 2090,459 ms (00:02,090)
wal_bytes | 6330269

It may be relevant that we're only compressing 8k [0].
It would probably pay off to preserve a compression "dictionary" to be re-used
across FPIs.

-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>, and
+        (if configured when <productname>PostgreSQL</productname>was built)
+        <literal>lz4</literal> and zstd.
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.

This is making the docs less verbose. I think that you should keep
the mention to respectively --with-lz4 and --with-zstd after each
value.

I changed this relative to your latest zstd patch in July since it reads better
to me. YMMV.

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:

0001 adds support for zstd
0002 makes more efficient our use of bits in the WAL header
0003 makes it the default compression, to exercise on CI (windows will fail).
0004 adds support for zstd-level
0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
0007 (not included) also adds support for zlib. I'm of the impression nobody
cares about this, otherwise it would've been included 5-10 years ago.

0003, defaulting to zstd, would require a lot of benchmarking to be
justified.

Maybe you didn't see that I wrote that it was for CI ?
(In any case, it's impossible to do that without first taking care of 005-6).

and 0004 to extend compression to support a level
I have argued against making the code more complicated for such things in the
past, with reloptions.

I suppose that you're referring to reloptions for toast compression, which was
about controlling LZ4 compression level.

/messages/by-id/CAFiTN-vMHU_yakMgNo90SUih_6LtvmqZ5hpQb2iTgQtVyOjyFA@mail.gmail.com

I think you're right that it's not interesting to control the compression level
of LZ4 - but there's no reason to say the same thing of zstd. We're already
debating which is the "right" level to use (1 or 6). I don't think there is a
"right" level - some people will want to trade more CPU for disk writes and
some people won't.

#4Robert Haas
robertmhaas@gmail.com
In reply to: Justin Pryzby (#3)
Re: wal_compression=zstd

On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Why? ZSTD using this default has its reasons, no? And it would be
consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

I agree with Michael. Your 1-off test is exactly that, and the results
will have depended on the data you used for the test. I'm not saying
we could never decide to default to a compression level other than the
library's default, but I do not think we should do it casually or as
the result of any small number of tests. There should be a strong
presumption that the authors of the library have a good idea what is
sensible in general unless we can explain compellingly why our use
case is different from typical ones.

There's an ease-of-use concern here too. It's not going to make things
any easier for users to grok if zstd is available in different parts
of the system but has different defaults in each place. It wouldn't be
the end of the world if that happened, but neither would it be ideal.

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Justin Pryzby (#1)
Re: wal_compression=zstd

On 2/22/22 18:19, Justin Pryzby wrote:

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

I think this choice needs to be supported by some benchmarks.

0001 adds support for zstd
0002 makes more efficient our use of bits in the WAL header
0003 makes it the default compression, to exercise on CI (windows will fail).
0004 adds support for zstd-level
0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
0007 (not included) also adds support for zlib. I'm of the impression nobody
cares about this, otherwise it would've been included 5-10 years ago.

Only 0001 should be reviewed for pg15 - the others are optional/future work.

I don't see why patch 5 shouldn't be applied forthwith.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#6Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#4)
Re: wal_compression=zstd

On Fri, Mar 04, 2022 at 08:08:03AM -0500, Robert Haas wrote:

On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

Hmm, it may be good to start afresh and compile numbers in a single
chart. I did that here with some numbers on the user and system CPU:
/messages/by-id/YMmlvyVyAFlxZ+/H@paquier.xyz

For this test, regarding ZSTD, the lowest level did not have much
difference with the default level, and at the highest level the user
CPU spiked for little gain in compression. All of them compressed
more than LZ4, with more CPU used in each case, but the default or a
level value lower than the default gives me the impression that it
won't matter much in terms of compression gains and CPU usage.

I agree with Michael. Your 1-off test is exactly that, and the results
will have depended on the data you used for the test. I'm not saying
we could never decide to default to a compression level other than the
library's default, but I do not think we should do it casually or as
the result of any small number of tests. There should be a strong
presumption that the authors of the library have a good idea what is
sensible in general unless we can explain compellingly why our use
case is different from typical ones.

There's an ease-of-use concern here too. It's not going to make things
any easier for users to grok if zstd is available in different parts
of the system but has different defaults in each place. It wouldn't be
the end of the world if that happened, but neither would it be ideal.

I'd like to believe that anybody who writes his/her own compression
algorithm have a good idea of the default behavior they want to show,
so we could remain simple, and believe in them. Now, I would not
object to see some fresh numbers, and assuming that all FPIs have the
same page size, we could go down to designing a couple of test cases
that produce a fixed number of FPIs and measure the compressability in
a single session.

Repeatability and randomness of data counts, we could have for example
one case with a set of 5~7 int attributes, a second with text values
that include random data, up to 10~12 bytes each to count on the tuple
header to be able to compress some data, and a third with more
repeatable data, like one attribute with an int column populate
with generate_series(). Just to give an idea.
--
Michael

#7Michael Paquier
michael@paquier.xyz
In reply to: Andrew Dunstan (#5)
Re: wal_compression=zstd

On Fri, Mar 04, 2022 at 08:10:35AM -0500, Andrew Dunstan wrote:

I don't see why patch 5 shouldn't be applied forthwith.

Only applying 0005 would result in a failure in the TAP test for a
problem whose fix is attempted in 0006. This is an issue unrelated to
this thread.

FWIW, I am a bit disturbed by EnsureTopTransactionIdLogged() and its
design in 0006, where we'd finish by using a XLogFlush() call within
two SQL functions, but I have not really looked at the problem to see
if it is a viable solution or not.
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#6)
3 attachment(s)
Re: wal_compression=zstd

On Sat, Mar 05, 2022 at 07:26:39PM +0900, Michael Paquier wrote:

Repeatability and randomness of data counts, we could have for example
one case with a set of 5~7 int attributes, a second with text values
that include random data, up to 10~12 bytes each to count on the tuple
header to be able to compress some data, and a third with more
repeatable data, like one attribute with an int column populate
with generate_series(). Just to give an idea.

And that's what I did as of the attached set of test:
- Cluster on tmpfs.
- max_wal_size, min_wal_size at 2GB and shared_buffers at 1GB, aka
large enough to include the full data set in memory.
- Rather than using Justin's full patch set, I have just patched the
code in xloginsert.c to switch the level.
- One case with table that uses one int attribute, with rather
repetitive data worth 484MB.
- One case with table using (int, text), where the text data is made
of 10~11 bytes of random data, worth ~340MB.
- Use pg_prewarm to load the data into shared buffers. With the
cluster mounted on a tmpfs that should not matter though.
- Both tables have a fillfactor at 50 to give room to the updates.

I have measured the CPU usage with a toy extension, also attached,
called pg_rusage() that is a simple wrapper to upstream's pg_rusage.c
to initialize a rusage snapshot and print its data with two SQL
functions that are called just before and after the FPIs are generated
(aka the UPDATE query that rewrites the whole table in the script
attached).

The quickly-hacked test script and the results are in test.tar.gz, for
reference. The toy extension is pg_rusage.tar.gz.

Here are the results I compiled, as of results_format.sql in the
tarball attached:
descr | rel_size | fpi_size | time_s
-------------------------------+----------+----------+--------
int column no compression | 429 MB | 727 MB | 13.15
int column ztsd default level | 429 MB | 523 MB | 14.23
int column zstd level 1 | 429 MB | 524 MB | 13.94
int column zstd level 10 | 429 MB | 523 MB | 23.46
int column zstd level 19 | 429 MB | 523 MB | 103.71
int column lz4 default level | 429 MB | 575 MB | 13.37
int/text no compression | 344 MB | 558 MB | 10.08
int/text lz4 default level | 344 MB | 463 MB | 10.29
int/text zstd default level | 344 MB | 415 MB | 11.48
int/text zstd level 1 | 344 MB | 418 MB | 11.25
int/text zstd level 10 | 344 MB | 415 MB | 20.59
int/text zstd level 19 | 344 MB | 413 MB | 62.64
(12 rows)

I did not expect zstd to be this slow at a level of 10~ actually. The
runtime (elapsed CPU time) got severely impacted at level 19, that I
ran just for fun to see how that it would become compared to a level
of 10. There is a slight difference between the default level and a
level of 1, and the compression size does not change much, nor does
the CPU usage really change.

Attached is an updated patch, while on it, that I have tweaked before
running my own tests.

At the end, I'd still like to think that we'd better stick with the
default level for this parameter, and that's the suggestion of
upstream. So I would like to move on with that for this patch.
--
Michael

Attachments:

test.tar.gzapplication/gzipDownload
pg_rusage.tar.gzapplication/gzipDownload
v2-0001-add-wal_compression-zstd.patchtext/x-diff; charset=us-asciiDownload
From 254ddbf4223c35a7990e301e53d6ddbffcf210c0 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 18 Feb 2022 22:54:18 -0600
Subject: [PATCH v2] add wal_compression=zstd

---
 src/include/access/xlog.h                     |  3 +-
 src/include/access/xlogrecord.h               |  5 +++-
 src/backend/access/transam/xloginsert.c       | 30 ++++++++++++++++++-
 src/backend/access/transam/xlogreader.c       | 20 +++++++++++++
 src/backend/utils/misc/guc.c                  |  3 ++
 src/backend/utils/misc/postgresql.conf.sample |  2 +-
 src/bin/pg_waldump/pg_waldump.c               |  2 ++
 doc/src/sgml/config.sgml                      | 11 ++++---
 doc/src/sgml/installation.sgml                |  8 +++++
 9 files changed, 76 insertions(+), 8 deletions(-)

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4b45ac64db..09f6464331 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -75,7 +75,8 @@ typedef enum WalCompression
 {
 	WAL_COMPRESSION_NONE = 0,
 	WAL_COMPRESSION_PGLZ,
-	WAL_COMPRESSION_LZ4
+	WAL_COMPRESSION_LZ4,
+	WAL_COMPRESSION_ZSTD
 } WalCompression;
 
 /* Recovery states */
diff --git a/src/include/access/xlogrecord.h b/src/include/access/xlogrecord.h
index c1b1137aa7..052ac6817a 100644
--- a/src/include/access/xlogrecord.h
+++ b/src/include/access/xlogrecord.h
@@ -149,8 +149,11 @@ typedef struct XLogRecordBlockImageHeader
 /* compression methods supported */
 #define BKPIMAGE_COMPRESS_PGLZ	0x04
 #define BKPIMAGE_COMPRESS_LZ4	0x08
+#define BKPIMAGE_COMPRESS_ZSTD	0x10
+
 #define	BKPIMAGE_COMPRESSED(info) \
-	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4)) != 0)
+	((info & (BKPIMAGE_COMPRESS_PGLZ | BKPIMAGE_COMPRESS_LZ4 | \
+			  BKPIMAGE_COMPRESS_ZSTD)) != 0)
 
 /*
  * Extra header information used when page image has "hole" and
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index c260310c4c..b61c08e586 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -44,9 +44,17 @@
 #define LZ4_MAX_BLCKSZ		0
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#define ZSTD_MAX_BLCKSZ		ZSTD_COMPRESSBOUND(BLCKSZ)
+#else
+#define ZSTD_MAX_BLCKSZ		0
+#endif
+
+/* Buffer size required to store a compressed version of backup block image */
 #define PGLZ_MAX_BLCKSZ		PGLZ_MAX_OUTPUT(BLCKSZ)
 
-#define COMPRESS_BUFSIZE	Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ)
+#define COMPRESS_BUFSIZE	Max(Max(PGLZ_MAX_BLCKSZ, LZ4_MAX_BLCKSZ), ZSTD_MAX_BLCKSZ)
 
 /*
  * For each block reference registered with XLogRegisterBuffer, we fill in
@@ -695,6 +703,14 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
 #endif
 						break;
 
+					case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+						bimg.bimg_info |= BKPIMAGE_COMPRESS_ZSTD;
+#else
+						elog(ERROR, "ZSTD is not supported by this build");
+#endif
+						break;
+
 					case WAL_COMPRESSION_NONE:
 						Assert(false);	/* cannot happen */
 						break;
@@ -903,6 +919,18 @@ XLogCompressBackupBlock(char *page, uint16 hole_offset, uint16 hole_length,
 #endif
 			break;
 
+		case WAL_COMPRESSION_ZSTD:
+#ifdef USE_ZSTD
+			/* Uses level=1, not ZSTD_CLEVEL_DEFAULT */
+			len = ZSTD_compress(dest, COMPRESS_BUFSIZE, source, orig_len,
+								ZSTD_CLEVEL_DEFAULT);
+			if (ZSTD_isError(len))
+				len = -1;		/* failure */
+#else
+			elog(ERROR, "ZSTD is not supported by this build");
+#endif
+			break;
+
 		case WAL_COMPRESSION_NONE:
 			Assert(false);		/* cannot happen */
 			break;
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..d60e4cbaf6 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -21,6 +21,9 @@
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
 
 #include "access/transam.h"
 #include "access/xlog_internal.h"
@@ -1618,6 +1621,23 @@ RestoreBlockImage(XLogReaderState *record, uint8 block_id, char *page)
 								  "LZ4",
 								  block_id);
 			return false;
+#endif
+		}
+		else if ((bkpb->bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+		{
+#ifdef USE_ZSTD
+			size_t decomp_result = ZSTD_decompress(tmp.data,
+												   BLCKSZ-bkpb->hole_length,
+												   ptr, bkpb->bimg_len);
+
+			if (ZSTD_isError(decomp_result))
+				decomp_success = false;
+#else
+			report_invalid_record(record, "image at %X/%X compressed with %s not supported by build, block %d",
+								  LSN_FORMAT_ARGS(record->ReadRecPtr),
+								  "zstd",
+								  block_id);
+			return false;
 #endif
 		}
 		else
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 6d11f9c71b..e7f0a380e6 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -550,6 +550,9 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{"pglz", WAL_COMPRESSION_PGLZ, false},
 #ifdef USE_LZ4
 	{"lz4", WAL_COMPRESSION_LZ4, false},
+#endif
+#ifdef USE_ZSTD
+	{"zstd", WAL_COMPRESSION_ZSTD, false},
 #endif
 	{"on", WAL_COMPRESSION_PGLZ, false},
 	{"off", WAL_COMPRESSION_NONE, false},
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 4a094bb38b..4cf5b26a36 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -220,7 +220,7 @@
 #wal_log_hints = off			# also do full page writes of non-critical updates
 					# (change requires restart)
 #wal_compression = off			# enables compression of full-page writes;
-					# off, pglz, lz4, or on
+					# off, pglz, lz4, zstd, or on
 #wal_init_zero = on			# zero-fill new WAL files
 #wal_recycle = on			# recycle WAL files
 #wal_buffers = -1			# min 32kB, -1 sets based on shared_buffers
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2340dc247b..f128050b4e 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -562,6 +562,8 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogReaderState *record)
 						method = "pglz";
 					else if ((bimg_info & BKPIMAGE_COMPRESS_LZ4) != 0)
 						method = "lz4";
+					else if ((bimg_info & BKPIMAGE_COMPRESS_ZSTD) != 0)
+						method = "zstd";
 					else
 						method = "unknown";
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..5612e80453 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3154,10 +3154,13 @@ include_dir 'conf.d'
         server compresses full page images written to WAL when
         <xref linkend="guc-full-page-writes"/> is on or during a base backup.
         A compressed page image will be decompressed during WAL replay.
-        The supported methods are <literal>pglz</literal> and
-        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
-        compiled with <option>--with-lz4</option>). The default value is
-        <literal>off</literal>. Only superusers can change this setting.
+        The supported methods are <literal>pglz</literal>,
+        <literal>lz4</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-lz4</option>) and
+        <literal>zstd</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-zstd</option>) and
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.
        </para>
 
        <para>
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 0f74252590..bd09003d59 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -271,6 +271,14 @@ su - postgres
      </para>
     </listitem>
 
+    <listitem>
+     <para>
+      You need <productname>ZSTD</productname>, if you want to support
+      compression of data with this method; see
+      <xref linkend="guc-wal-compression"/>.
+     </para>
+    </listitem>
+
     <listitem>
      <para>
       To build the <productname>PostgreSQL</productname> documentation,
-- 
2.35.1

#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#3)
Re: wal_compression=zstd

On Fri, Mar 04, 2022 at 05:44:06AM -0600, Justin Pryzby wrote:

On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:

As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
level is 6).

Why? ZSTD using this default has its reasons, no? And it would be
consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

Actually, my test used zstd-6, rather than the correct default of 3.

The comparison should have been:

postgres=# SET wal_compression='zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Time: 2074.046 ms (00:02.074)
2763 | 2758 | 6343591 | 0 | 5 | 5 | 0 | 0 | 2022-03-05 05:04:08.599867-06

vs

postgres=# SET wal_compression='zstd-3';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
Time: 2471.552 ms (00:02.472)
wal_records | wal_fpi | wal_bytes | wal_buffers_full | wal_write | wal_sync | wal_write_time | wal_sync_time | stats_reset
-------------+---------+-----------+------------------+-----------+----------+----------------+---------------+-------------------------------
2762 | 2746 | 6396890 | 274 | 274 | 0 | 0 | 0 | 2022-03-05 05:04:31.283432-06

=> zstd-1 actually wrote less than zstd-3 (which is odd) but by an
insignificant amount. It's no surprise that zstd-1 is faster than zstd-3, but
(of course) by a smaller amount than zstd-6.

Anyway there's no compelling reason to not use the default. If we were to use
a non-default default, we'd have to choose between 1 and 2 (or some negative
compression level). My thinking was that zstd-1 would give the lowest-hanging
fruits for zstd, while minimizing performance tradeoff, since WAL affects
interactivity. But choosing between 1 and 2 seems like bikeshedding.

#10Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#9)
Re: wal_compression=zstd

On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:

Anyway there's no compelling reason to not use the default. If we were to use
a non-default default, we'd have to choose between 1 and 2 (or some negative
compression level). My thinking was that zstd-1 would give the lowest-hanging
fruits for zstd, while minimizing performance tradeoff, since WAL affects
interactivity. But choosing between 1 and 2 seems like bikeshedding.

Yeah, I have looked again at the patch today, and I saw no reason to
not apply it to give more options to the user as zstd or lz4 are both
good in their own ways. So done, with the default level used.
--
Michael

#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#10)
Re: wal_compression=zstd

On Fri, Mar 11, 2022 at 12:23:59PM +0900, Michael Paquier wrote:

On Wed, Mar 09, 2022 at 07:14:11AM -0600, Justin Pryzby wrote:

Anyway there's no compelling reason to not use the default. If we were to use
a non-default default, we'd have to choose between 1 and 2 (or some negative
compression level). My thinking was that zstd-1 would give the lowest-hanging
fruits for zstd, while minimizing performance tradeoff, since WAL affects
interactivity. But choosing between 1 and 2 seems like bikeshedding.

Yeah, I have looked again at the patch today, and I saw no reason to
not apply it to give more options to the user as zstd or lz4 are both
good in their own ways. So done, with the default level used.

It's great news - thanks.

--
Justin

#12Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#11)
Re: wal_compression=zstd

While rebasing, I realized this should have bumped XLOG_PAGE_MAGIC.

Also, there's a dangling "and".

+        The supported methods are <literal>pglz</literal>,
+        <literal>lz4</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-lz4</option>) and
+        <literal>zstd</literal> (if <productname>PostgreSQL</productname>
+        was compiled with <option>--with-zstd</option>) and
+        The default value is <literal>off</literal>.
+        Only superusers can change this setting.

--
Justin

#13Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#12)
Re: wal_compression=zstd

On Fri, Mar 11, 2022 at 03:49:00PM -0600, Justin Pryzby wrote:

While rebasing, I realized this should have bumped XLOG_PAGE_MAGIC.

Also, there's a dangling "and".

Right. I'll address that a bit later today. Thanks!
--
Michael