Compress ReorderBuffer spill files using LZ4

Started by Julien Tachoiresover 1 year ago28 messages
#1Julien Tachoires
julmon@gmail.com
1 attachment(s)

Hi,

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

3 different compression strategies are implemented:

1. LZ4 streaming compression is the preferred one and works
efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for using
the streaming API.
3. No compression when compression fails, the change is then stored
not compressed.

When not using compression, the following case generates 1590MB of
spill files:

CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
INSERT INTO t
SELECT i, 'Hello number n°'||i::TEXT
FROM generate_series(1, 10000000) as i;

With LZ4 compression, it creates 653MB of spill files: 58.9% less
disk space usage.

Open items:

1. The spill_bytes column from pg_stat_get_replication_slot() still returns
plain data size, not the compressed data size. Should we expose the
compressed data size when compression occurs?

2. Do we want a GUC to switch compression on/off?

Regards,

JT

Attachments:

v1-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchapplication/octet-stream; name=v1-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchDownload
From 6d907203daf20c4923c3c744d25f8ff4c42802de Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 6 Jun 2024 00:57:38 -0700
Subject: [PATCH] Compress ReorderBuffer spill files using LZ4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

3 different compression strategies are implemented:

1. LZ4 streaming compression is the preferred one and works
   efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for using
   the streaming API.
3. No compression when compression fails, the change is then stored
   not compressed.

When not using compression, the following case generates 1590MB of
spill files:

  CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
  INSERT INTO t
    SELECT i, 'Hello number n°'||i::TEXT
    FROM generate_series(1, 10000000) as i;

With LZ4 compression, it creates 653MB of spill files: 58.9% less
disk space usage.
---
 .../replication/logical/reorderbuffer.c       | 526 ++++++++++++++++--
 src/include/replication/reorderbuffer.h       |  22 +
 2 files changed, 496 insertions(+), 52 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e77..8ac216a9c8 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -88,6 +88,9 @@
 
 #include <unistd.h>
 #include <sys/stat.h>
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
 
 #include "access/detoast.h"
 #include "access/heapam.h"
@@ -181,6 +184,24 @@ typedef struct ReorderBufferDiskChange
 	/* data follows */
 } ReorderBufferDiskChange;
 
+#ifdef USE_LZ4
+/* Possible reorder buffer ondisk strategies */
+typedef enum ReorderBufferCompressStrat
+{
+	REORDER_BUFFER_NO_COMPRESSION,
+	REORDER_BUFFER_LZ4_STREAMING,
+	REORDER_BUFFER_LZ4,
+} ReorderBufferCompressStrat;
+
+typedef struct ReorderBufferCompressDiskChange
+{
+	ReorderBufferCompressStrat strat;	/* Ondisk compression strategy */
+	Size		size;					/* Ondisk data size */
+	Size		orig_size;				/* Original data size */
+	/* data follows */
+} ReorderBufferCompressDiskChange;
+#endif
+
 #define IsSpecInsert(action) \
 ( \
 	((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT) \
@@ -255,6 +276,13 @@ static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
+#ifdef USE_LZ4
+static bool LZ4_ReadOndiskBufferChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+									   TXNEntryFile *file, XLogSegNo *segno);
+#else
+static bool ReadOndiskBufferChange(ReorderBuffer *rb, TXNEntryFile *file,
+								   XLogSegNo *segno);
+#endif
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -428,6 +456,20 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
 
+#ifdef USE_LZ4
+	/*
+	 * We do not allocate LZ4 resources at this point because we have no
+	 * guarantee that we will need them later. Let's allocate only when we
+	 * are about to use them.
+	 */
+	txn->lz4_in_buf = NULL;
+	txn->lz4_out_buf = NULL;
+	txn->lz4_in_buf_offset = 0;
+	txn->lz4_out_buf_offset = 0;
+	txn->lz4_stream = NULL;
+	txn->lz4_stream_decode = NULL;
+#endif
+
 	return txn;
 }
 
@@ -464,6 +506,31 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+#ifdef USE_LZ4
+	if (txn->lz4_in_buf != NULL)
+	{
+		MemoryContext oldcontext = MemoryContextSwitchTo(rb->context);
+
+		pfree(txn->lz4_in_buf);
+		LZ4_freeStream(txn->lz4_stream);
+		txn->lz4_in_buf = NULL;
+		txn->lz4_stream = NULL;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+	if (txn->lz4_out_buf != NULL)
+	{
+		MemoryContext oldcontext = MemoryContextSwitchTo(rb->context);
+
+		pfree(txn->lz4_out_buf);
+		LZ4_freeStreamDecode(txn->lz4_stream_decode);
+		txn->lz4_out_buf = NULL;
+		txn->lz4_stream_decode = NULL;
+
+		MemoryContextSwitchTo(oldcontext);
+	}
+#endif
+
 	/* Reset the toast hash */
 	ReorderBufferToastReset(rb, txn);
 
@@ -3778,6 +3845,15 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 {
 	ReorderBufferDiskChange *ondisk;
 	Size		sz = sizeof(ReorderBufferDiskChange);
+#ifdef USE_LZ4
+	char	   *buf;				/* LZ4/plain buffer */
+	Size		buf_size;			/* LZ4/plain buffer size */
+	char	   *writePtr;
+	Size		write_size;
+	int			lz4_cmp_size = 0;	/* compressed size */
+	ReorderBufferCompressDiskChange *cmp_ondisk;
+	char	   *lz4_in_bufPtr = NULL;
+#endif
 
 	ReorderBufferSerializeReserve(rb, sz);
 
@@ -3957,7 +4033,167 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
+
+#ifdef USE_LZ4
+	/*
+	 * Use LZ4 streaming compression iff we can keep at least 2 plain changes
+	 * into the LZ4 input ring buffer. If plain data size is too large, let's
+	 * use regular LZ4 compression.
+	 */
+	if (sz < (LZ4_RING_BUFFER_SIZE / 2))
+	{
+
+		/*
+		 * Allocate LZ4 resources in ReorderBuffer's memory context.
+		 */
+		if (txn->lz4_in_buf == NULL)
+		{
+			MemoryContext oldcontext = MemoryContextSwitchTo(rb->context);
+
+			txn->lz4_in_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+			txn->lz4_stream = LZ4_createStream();
+
+			MemoryContextSwitchTo(oldcontext);
+		}
+
+		/* Ring buffer offset wraparound */
+		if ((txn->lz4_in_buf_offset + sz) > LZ4_RING_BUFFER_SIZE)
+			txn->lz4_in_buf_offset = 0;
+
+		/* Get the pointer of the current entry in the ring buffer */
+		lz4_in_bufPtr = txn->lz4_in_buf + txn->lz4_in_buf_offset;
+
+		/* Copy data that should be compressed into LZ4 input ring buffer */
+		memcpy(lz4_in_bufPtr, rb->outbuf, sz);
+
+		/*
+		 * Allocate space for storing the compressed content of the reorder
+		 * buffer output buffer. What we need to write on disk is formed by a
+		 * ReorderBufferCompressDiskChange structure followed by compressed
+		 * data.
+		 */
+		buf_size = LZ4_COMPRESSBOUND(sz) + sizeof(ReorderBufferCompressDiskChange);
+		buf = (char *) palloc0(buf_size);
+
+		/* Use LZ4 streaming compression API */
+		lz4_cmp_size = LZ4_compress_fast_continue(txn->lz4_stream,
+												  lz4_in_bufPtr,
+												  buf + sizeof(ReorderBufferCompressDiskChange),
+												  sz,
+												  buf_size - sizeof(ReorderBufferCompressDiskChange),
+												  1);
+
+		if (lz4_cmp_size > 0)
+		{
+			/* Move the input ring buffer offset */
+			txn->lz4_in_buf_offset += sz;
+
+			cmp_ondisk = (ReorderBufferCompressDiskChange *) buf;
+			cmp_ondisk->strat = REORDER_BUFFER_LZ4_STREAMING;
+			/* Store the original data size (before compression) */
+			cmp_ondisk->orig_size = sz;
+			/*
+			 * Store the ondisk size: compressed size + size of
+			 * ReorderBufferCompressDiskChange.
+			 */
+			cmp_ondisk->size = (Size) lz4_cmp_size + sizeof(ReorderBufferCompressDiskChange);
+
+			/*
+			 * Pointing write pointer and size to buf. buf contains
+			 * ReorderBufferCompressDiskChange followed by compressed data.
+			 */
+			writePtr = buf;
+			write_size = cmp_ondisk->size;
+		}
+		else
+		{
+			/*
+			 * LZ4 streaming compression failed, let's store the change not
+			 * compressed.
+			 */
+			cmp_ondisk = (ReorderBufferCompressDiskChange *) buf;
+			cmp_ondisk->strat = REORDER_BUFFER_NO_COMPRESSION;
+			cmp_ondisk->orig_size = sz;
+			cmp_ondisk->size = sz + sizeof(ReorderBufferCompressDiskChange);
+
+			/*
+			 * Write ReorderBufferCompressDiskChange only, later we will write
+			 * the reorder buffer output content.
+			 */
+			if (write(fd, buf, sizeof(ReorderBufferCompressDiskChange)) != sizeof(ReorderBufferCompressDiskChange))
+			{
+				int			save_errno = errno;
+
+				CloseTransientFile(fd);
+
+				/* if write didn't set errno, assume problem is no disk space */
+				errno = save_errno ? save_errno : ENOSPC;
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not write to data file for XID %u: %m",
+								txn->xid)));
+			}
+
+			/* Pointing write pointer and size to the reorder buffer output */
+			writePtr = rb->outbuf;
+			write_size = sz;
+		}
+	}
+	else
+	/* Regular LZ4 compression */
+	{
+		buf_size = LZ4_COMPRESSBOUND(sz) + sizeof(ReorderBufferCompressDiskChange);
+		buf = (char *) palloc0(buf_size);
+
+		/* Use LZ4 regular compression API */
+		lz4_cmp_size = LZ4_compress_default(rb->outbuf,
+											buf + sizeof(ReorderBufferCompressDiskChange),
+											sz,
+											buf_size - sizeof(ReorderBufferCompressDiskChange));
+
+		if (lz4_cmp_size > 0)
+		{
+			cmp_ondisk = (ReorderBufferCompressDiskChange *) buf;
+			cmp_ondisk->strat = REORDER_BUFFER_LZ4;
+			cmp_ondisk->orig_size = sz;
+			cmp_ondisk->size = (Size) lz4_cmp_size + sizeof(ReorderBufferCompressDiskChange);
+
+			writePtr = buf;
+			write_size = cmp_ondisk->size;
+		}
+		else
+		{
+			/*
+			 * LZ4 regular compression failed, let's store the change not
+			 * compressed.
+			 */
+			cmp_ondisk = (ReorderBufferCompressDiskChange *) buf;
+			cmp_ondisk->strat = REORDER_BUFFER_NO_COMPRESSION;
+			cmp_ondisk->orig_size = sz;
+			cmp_ondisk->size = sz + sizeof(ReorderBufferCompressDiskChange);
+
+			if (write(fd, buf, sizeof(ReorderBufferCompressDiskChange)) != sizeof(ReorderBufferCompressDiskChange))
+			{
+				int			save_errno = errno;
+
+				CloseTransientFile(fd);
+
+				errno = save_errno ? save_errno : ENOSPC;
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not write to data file for XID %u: %m",
+								txn->xid)));
+			}
+
+			writePtr = rb->outbuf;
+			write_size = sz;
+		}
+	}
+
+	if (write(fd, writePtr, write_size) != write_size)
+#else
 	if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+#endif
 	{
 		int			save_errno = errno;
 
@@ -3984,6 +4220,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		txn->final_lsn = change->lsn;
 
 	Assert(ondisk->change.action == change->action);
+
+#ifdef USE_LZ4
+	pfree(buf);
+#endif
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
@@ -4252,9 +4492,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	while (restored < max_changes_in_memory && *segno <= last_segno)
 	{
-		int			readBytes;
-		ReorderBufferDiskChange *ondisk;
-
 		CHECK_FOR_INTERRUPTS();
 
 		if (*fd == -1)
@@ -4293,60 +4530,19 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the statically sized part of a change which has information
-		 * about the total size. If we couldn't read a record, we're at the
-		 * end of this file.
+		 * Read the full change from disk.
+		 * If ReadOndiskBufferChange returns false, then we are at the eof, so,
+		 * move the next segment.
 		 */
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
-
-		/* eof */
-		if (readBytes == 0)
+#ifdef USE_LZ4
+		if (!LZ4_ReadOndiskBufferChange(rb, txn, file, segno))
+#else
+		if (!ReadOndiskBufferChange(rb, file, segno))
+#endif
 		{
-			FileClose(*fd);
 			*fd = -1;
-			(*segno)++;
 			continue;
 		}
-		else if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) sizeof(ReorderBufferDiskChange))));
-
-		file->curOffset += readBytes;
-
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		ReorderBufferSerializeReserve(rb,
-									  sizeof(ReorderBufferDiskChange) + ondisk->size);
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		readBytes = FileRead(file->vfd,
-							 rb->outbuf + sizeof(ReorderBufferDiskChange),
-							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
-							 WAIT_EVENT_REORDER_BUFFER_READ);
-
-		if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
-
-		file->curOffset += readBytes;
 
 		/*
 		 * ok, read a full change from disk, now restore it into proper
@@ -4359,6 +4555,232 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	return restored;
 }
 
+#ifdef USE_LZ4
+static bool
+LZ4_ReadOndiskBufferChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+						   TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferCompressDiskChange *cmp_ondisk;
+	char	   *cmp_header;		/* compressed data header */
+	char	   *cmp_data;		/* compressed data */
+	int			decBytes;		/* decompressed data size */
+	char	   *lz4_out_bufPtr;	/* LZ4 ring buffer entry pointer */
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size and compression method. If we couldn't read a record,
+	 * we're at the end of this file.
+	 */
+	cmp_header = (char *) palloc0(sizeof(ReorderBufferCompressDiskChange));
+	readBytes = FileRead(file->vfd, cmp_header,
+						 sizeof(ReorderBufferCompressDiskChange),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+
+		FileClose(file->vfd);
+		(*segno)++;
+		pfree(cmp_header);
+
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferCompressDiskChange))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferCompressDiskChange))));
+
+	file->curOffset += readBytes;
+
+	cmp_ondisk = (ReorderBufferCompressDiskChange *) cmp_header;
+
+	/* Read ondisk data */
+	cmp_data = (char *) palloc0(cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange));
+	readBytes = FileRead(file->vfd,
+						 cmp_data,
+						 cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange)))));
+
+	switch (cmp_ondisk->strat)
+	{
+		case REORDER_BUFFER_NO_COMPRESSION:
+			/*
+			 * No compression: make a copy of what was read on disk into the
+			 * reorder buffer.
+			 */
+			ReorderBufferSerializeReserve(rb, cmp_ondisk->orig_size);
+
+			memcpy(rb->outbuf, cmp_data, cmp_ondisk->orig_size);
+			break;
+		case REORDER_BUFFER_LZ4:
+			/* LZ4 regular decompression */
+
+			/*
+			 * Make sure the output reorder buffer has enough space to store
+			 * decompressed data.
+			 */
+			ReorderBufferSerializeReserve(rb, cmp_ondisk->orig_size);
+
+			decBytes = LZ4_decompress_safe(cmp_data,
+										   rb->outbuf,
+										   cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange),
+										   cmp_ondisk->orig_size);
+
+			Assert(decBytes == cmp_ondisk->orig_size);
+
+			if (decBytes < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("compressed LZ4 data is corrupt")));
+			else if (decBytes != cmp_ondisk->orig_size)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("decompressed LZ4 data size differs from original size")));
+			break;
+		case REORDER_BUFFER_LZ4_STREAMING:
+			/* LZ4 streaming decompression */
+			/*
+			 * Allocate LZ4 resources in ReorderBuffer's memory context.
+			 */
+			if (txn->lz4_out_buf == NULL)
+			{
+				MemoryContext oldcontext = MemoryContextSwitchTo(rb->context);
+
+				txn->lz4_out_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+				txn->lz4_stream_decode = LZ4_createStreamDecode();
+
+				MemoryContextSwitchTo(oldcontext);
+			}
+
+			/* Ring buffer offset wraparound */
+			if ((txn->lz4_out_buf_offset + cmp_ondisk->orig_size) > LZ4_RING_BUFFER_SIZE)
+				txn->lz4_out_buf_offset = 0;
+
+			/* Get the pointer of the current entry in the ring buffer */
+			lz4_out_bufPtr = txn->lz4_out_buf + txn->lz4_out_buf_offset;
+
+			decBytes = LZ4_decompress_safe_continue(txn->lz4_stream_decode,
+													cmp_data,
+													lz4_out_bufPtr,
+													cmp_ondisk->size - sizeof(ReorderBufferCompressDiskChange),
+													cmp_ondisk->orig_size);
+
+			Assert(decBytes == cmp_ondisk->orig_size);
+
+			if (decBytes < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("compressed LZ4 data is corrupt")));
+			else if (decBytes != cmp_ondisk->orig_size)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("decompressed LZ4 data size differs from original size")));
+			/*
+			 * Make sure the output reorder buffer has enough space to store
+			 * decompressed data.
+			 */
+			ReorderBufferSerializeReserve(rb, cmp_ondisk->orig_size);
+
+			memcpy(rb->outbuf, lz4_out_bufPtr, decBytes);
+
+			/* Move the output ring buffer offset */
+			txn->lz4_out_buf_offset += decBytes;
+			break;
+	}
+
+	pfree(cmp_data);
+	pfree(cmp_header);
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+#else
+static bool
+ReadOndiskBufferChange(ReorderBuffer *rb, TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferDiskChange *ondisk;
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size. If we couldn't read a record, we're at the end of this
+	 * file.
+	 */
+	ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
+	readBytes = FileRead(file->vfd, rb->outbuf,
+						 sizeof(ReorderBufferDiskChange),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+		FileClose(file->vfd);
+		(*segno)++;
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferDiskChange))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferDiskChange))));
+
+	file->curOffset += readBytes;
+
+	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+	ReorderBufferSerializeReserve(rb,
+								  sizeof(ReorderBufferDiskChange) + ondisk->size);
+	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+
+	readBytes = FileRead(file->vfd,
+						 rb->outbuf + sizeof(ReorderBufferDiskChange),
+						 ondisk->size - sizeof(ReorderBufferDiskChange),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+#endif
+
 /*
  * Convert change from its on-disk format to in-memory format and queue it onto
  * the TXN's ->changes list.
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 851a001c8b..2ee35e5f7a 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -9,6 +9,10 @@
 #ifndef REORDERBUFFER_H
 #define REORDERBUFFER_H
 
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
 #include "access/htup_details.h"
 #include "lib/ilist.h"
 #include "lib/pairingheap.h"
@@ -422,8 +426,26 @@ typedef struct ReorderBufferTXN
 	 * Private data pointer of the output plugin.
 	 */
 	void	   *output_plugin_private;
+#ifdef USE_LZ4
+	LZ4_stream_t *lz4_stream;
+	LZ4_streamDecode_t *lz4_stream_decode;
+	/* LZ4 in/out ring buffers used for streaming compression */
+	char	   *lz4_in_buf;
+	int			lz4_in_buf_offset;
+	char	   *lz4_out_buf;
+	int			lz4_out_buf_offset;
+#endif
 } ReorderBufferTXN;
 
+#ifdef USE_LZ4
+/*
+ * We use a fairly small LZ4 ring buffer size (64kB). Using a larger buffer
+ * size provide better compression ratio, but as long as we have to allocate
+ * two LZ4 ring buffers per ReorderBufferTXN created, we should keep it small.
+ */
+#define LZ4_RING_BUFFER_SIZE (64 * 1024)
+#endif
+
 /* so we can define the callbacks used inside struct ReorderBuffer itself */
 typedef struct ReorderBuffer ReorderBuffer;
 
-- 
2.43.0

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Tachoires (#1)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

--
With Regards,
Amit Kapila.

#3Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

I think it depends on the trade-off between the I/O savings from
reducing the data size and the performance cost of compressing and
decompressing the data. This balance is highly dependent on the
hardware. For example, if you have a very slow disk and a powerful
processor, compression could be advantageous. Conversely, if the disk
is very fast, the I/O savings might be minimal, and the compression
overhead could outweigh the benefits. Additionally, the effectiveness
of compression also depends on the compression ratio, which varies
with the type of data being compressed.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#4Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit :

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves
the problem away from the publisher to the subscribers. This patch
tries to improve the default situation when 'streaming' is set to
'off'.

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Table DDL:
CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);

Data generated with:
INSERT INTO t SELECT i, 'Text number n°'||i::TEXT FROM
generate_series(1, 10000000) as i;

Restoration duration measured using timestamps of log messages:
"DEBUG: restored XXXX/YYYY changes from disk"

HEAD: 25.54s, 25.94s, 25.516s, 26.267s, 26.11s / avg=25.874s
Patch: 26.872s, 26.311s, 25.753s, 26.003, 25.843s / avg=26.156s

Regards,

JT

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Julien Tachoires (#4)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:

Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit :

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves
the problem away from the publisher to the subscribers. This patch
tries to improve the default situation when 'streaming' is set to
'off'.

Can we think of changing the default to 'parallel'? BTW, it would be
better to use 'parallel' for the 'streaming' option, if the workload
has large transactions. Is there a reason to use a default value in
this case?

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Thanks. We probably need different types of data (say random data in
bytea column, etc.) for this.

--
With Regards,
Amit Kapila.

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Amit Kapila (#2)
Re: Compress ReorderBuffer spill files using LZ4

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/

#7Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#5)
Re: Compress ReorderBuffer spill files using LZ4

Le jeu. 6 juin 2024 à 06:40, Amit Kapila <amit.kapila16@gmail.com> a écrit :

On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:

Le jeu. 6 juin 2024 à 04:13, Amit Kapila <amit.kapila16@gmail.com> a écrit :

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right, setting subscription's option 'streaming' to 'on' moves
the problem away from the publisher to the subscribers. This patch
tries to improve the default situation when 'streaming' is set to
'off'.

Can we think of changing the default to 'parallel'? BTW, it would be
better to use 'parallel' for the 'streaming' option, if the workload
has large transactions. Is there a reason to use a default value in
this case?

You're certainly right, if using the streaming API helps to avoid bad
situations and there is no downside, it could be used by default.

2. Do we want a GUC to switch compression on/off?

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

Quick benchmarking executed on my laptop shows 1% overhead.

Thanks. We probably need different types of data (say random data in
bytea column, etc.) for this.

Yes, good idea, will run new tests in that sense.

Thank you!

Regards,

JT

#8Julien Tachoires
julmon@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

Le jeu. 6 juin 2024 à 07:24, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

Interesting idea, will try to evaluate how to compress/decompress data
transiting via streaming and how good the compression ratio would be.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

I agree, if the server was compiled with support of multiple
compression libraries, users should be able to choose which one they
want to use.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)

Makes sense, will rework this patch in that way.

Thank you!

Regards,

JT

#9Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

+1

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#10Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dilip Kumar (#9)
Re: Compress ReorderBuffer spill files using LZ4

On 2024-Jun-07, Dilip Kumar wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

True. (I think we have some options that are in GUCs for the general
behavior and can be overridden by per-subscription options for specific
tailoring; would that make sense here? I think it does, considering
that what we mostly want is to save disk space in the publisher when
spilling to disk.)

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

#11Dilip Kumar
dilipbalaut@gmail.com
In reply to: Alvaro Herrera (#10)
Re: Compress ReorderBuffer spill files using LZ4

On Fri, Jun 7, 2024 at 2:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-07, Dilip Kumar wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

True. (I think we have some options that are in GUCs for the general
behavior and can be overridden by per-subscription options for specific
tailoring; would that make sense here? I think it does, considering
that what we mostly want is to save disk space in the publisher when
spilling to disk.)

Yeah, that makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

Fair enough. it would be an interesting feature if we see the wider
usefulness of compression/decompression of logical changes. For
example, if this can improve the performance of applying large
transactions (aka reduce the apply lag for them) even when the
'streaming' option is 'parallel' then it would have a much wider
impact.

--
With Regards,
Amit Kapila.

#13Amit Kapila
amit.kapila16@gmail.com
In reply to: Dilip Kumar (#9)
Re: Compress ReorderBuffer spill files using LZ4

On Fri, Jun 7, 2024 at 2:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.

Yes, that makes sense. However, we then need to provide this option
via SQL APIs as well for other plugins.

--
With Regards,
Amit Kapila.

#14Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Alvaro Herrera (#6)
Re: Compress ReorderBuffer spill files using LZ4

On 6/6/24 16:24, Alvaro Herrera wrote:

On 2024-Jun-06, Amit Kapila wrote:

On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

I think a GUC would be a good idea. Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters? Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there. With the current #ifdef you propose, it's hard
to do that. Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)

I haven't been following the "libpq compression" thread, but wouldn't
that also do compression for the streaming case? That was my assumption,
at least, and it seems like the right way - we probably don't want to
patch every place that sends data over network independently, right?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Julien Tachoires (#1)
Re: Compress ReorderBuffer spill files using LZ4

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#15)
Re: Compress ReorderBuffer spill files using LZ4

Le ven. 7 juin 2024 à 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Regards,

JT

#17Julien Tachoires
julmon@gmail.com
In reply to: Julien Tachoires (#16)
7 attachment(s)
Re: Compress ReorderBuffer spill files using LZ4

Hi,

Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit :

Le ven. 7 juin 2024 à 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thought ?

Regards,

JT

Attachments:

v2-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchapplication/octet-stream; name=v2-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchDownload
From 2423e35294425d2415d456b5ad920e1dea22279d Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 6 Jun 2024 00:57:38 -0700
Subject: [PATCH 1/7] Compress ReorderBuffer spill files using LZ4

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.

This behavior happens only when the subscriber's option "streaming"
is set to "off", which is the default value.

In this case, large transactions decoding by multiple replication
slots can lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk is now compressed and wrapped in
a new structure named ReorderBufferDiskHeader.

3 different compression strategies are currently implemented:

1. LZ4 streaming compression is the preferred one and works
   efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for using
   LZ4 streaming API.
3. No compression.
---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       | 201 ++++---
 .../logical/reorderbuffer_compression.c       | 502 ++++++++++++++++++
 src/include/replication/reorderbuffer.h       |   6 +
 .../replication/reorderbuffer_compression.h   |  95 ++++
 6 files changed, 723 insertions(+), 83 deletions(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_compression.c
 create mode 100644 src/include/replication/reorderbuffer_compression.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index ba03eeff1c..88bf698a53 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -25,6 +25,7 @@ OBJS = \
 	proto.o \
 	relation.o \
 	reorderbuffer.o \
+	reorderbuffer_compression.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index 3dec36a6de..f0dd82bae2 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -11,6 +11,7 @@ backend_sources += files(
   'proto.c',
   'relation.c',
   'reorderbuffer.c',
+  'reorderbuffer_compression.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e77..4e08167d03 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -102,6 +102,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/snapbuild.h"	/* just for SnapBuildSnapDecRefcount */
 #include "storage/bufmgr.h"
@@ -112,6 +113,13 @@
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
 
+/* GUC */
+#ifdef USE_LZ4
+int			logical_decoding_spill_compression = REORDER_BUFFER_LZ4_COMPRESSION;
+#else
+int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
+#endif
+
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
 {
@@ -173,14 +181,6 @@ typedef struct ReorderBufferToastEnt
 									 * main tup */
 } ReorderBufferToastEnt;
 
-/* Disk serialization support datastructures */
-typedef struct ReorderBufferDiskChange
-{
-	Size		size;
-	ReorderBufferChange change;
-	/* data follows */
-} ReorderBufferDiskChange;
-
 #define IsSpecInsert(action) \
 ( \
 	((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT) \
@@ -255,6 +255,8 @@ static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
+static bool ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+								   TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -427,6 +429,8 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	/* InvalidCommandId is not zero, so set it explicitly */
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
+	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
+															logical_decoding_spill_compression);
 
 	return txn;
 }
@@ -464,6 +468,10 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	ReorderBufferFreeCompressorState(rb->context,
+									 logical_decoding_spill_compression,
+									 txn->compressor_state);
+
 	/* Reset the toast hash */
 	ReorderBufferToastReset(rb, txn);
 
@@ -3776,13 +3784,13 @@ static void
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
-	ReorderBufferDiskChange *ondisk;
-	Size		sz = sizeof(ReorderBufferDiskChange);
+	ReorderBufferDiskHeader *disk_hdr;
+	Size		sz = sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	ReorderBufferSerializeReserve(rb, sz);
 
-	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-	memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+	disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+	memcpy((char *)rb->outbuf + sizeof(ReorderBufferDiskHeader), change, sizeof(ReorderBufferChange));
 
 	switch (change->action)
 	{
@@ -3818,9 +3826,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				if (oldlen)
 				{
@@ -3850,10 +3858,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					sizeof(Size) + sizeof(Size);
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				/* write the prefix including the size */
 				memcpy(data, &prefix_size, sizeof(Size));
@@ -3880,10 +3888,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				sz += inval_size;
 
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 				memcpy(data, change->data.inval.invalidations, inval_size);
 				data += inval_size;
 
@@ -3902,9 +3910,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, snap, sizeof(SnapshotData));
 				data += sizeof(SnapshotData);
@@ -3936,9 +3944,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, change->data.truncate.relids, size);
 				data += size;
@@ -3953,11 +3961,14 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			break;
 	}
 
-	ondisk->size = sz;
+	/* Inplace ReorderBuffer content compression before writing it on disk */
+	ReorderBufferCompress(rb, &disk_hdr, logical_decoding_spill_compression,
+						  sz, txn->compressor_state);
 
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
-	if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+
+	if (write(fd, rb->outbuf, disk_hdr->size) != disk_hdr->size)
 	{
 		int			save_errno = errno;
 
@@ -3982,8 +3993,6 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
-
-	Assert(ondisk->change.action == change->action);
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
@@ -4252,9 +4261,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	while (restored < max_changes_in_memory && *segno <= last_segno)
 	{
-		int			readBytes;
-		ReorderBufferDiskChange *ondisk;
-
 		CHECK_FOR_INTERRUPTS();
 
 		if (*fd == -1)
@@ -4293,60 +4299,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the statically sized part of a change which has information
-		 * about the total size. If we couldn't read a record, we're at the
-		 * end of this file.
+		 * Read the full change from disk.
+		 * If ReorderBufferReadOndiskChange returns false, then we are at the
+		 * eof, so, move the next segment.
 		 */
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
-
-		/* eof */
-		if (readBytes == 0)
+		if (!ReorderBufferReadOndiskChange(rb, txn, file, segno))
 		{
-			FileClose(*fd);
 			*fd = -1;
-			(*segno)++;
 			continue;
 		}
-		else if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) sizeof(ReorderBufferDiskChange))));
-
-		file->curOffset += readBytes;
-
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		ReorderBufferSerializeReserve(rb,
-									  sizeof(ReorderBufferDiskChange) + ondisk->size);
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		readBytes = FileRead(file->vfd,
-							 rb->outbuf + sizeof(ReorderBufferDiskChange),
-							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
-							 WAIT_EVENT_REORDER_BUFFER_READ);
-
-		if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
-
-		file->curOffset += readBytes;
 
 		/*
 		 * ok, read a full change from disk, now restore it into proper
@@ -4359,6 +4320,83 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	return restored;
 }
 
+/*
+ * Read a change spilled to disk and decompress it if compressed.
+ */
+static bool
+ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+							  TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferDiskHeader *disk_hdr;
+	char	   *header;			/* disk header buffer*/
+	char	   *data;			/* data buffer */
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size and compression method. If we couldn't read a record,
+	 * we're at the end of this file.
+	 */
+	header = (char *) palloc0(sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd, header,
+						 sizeof(ReorderBufferDiskHeader),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+
+		FileClose(file->vfd);
+		(*segno)++;
+		pfree(header);
+
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferDiskHeader))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferDiskHeader))));
+
+	file->curOffset += readBytes;
+
+	disk_hdr = (ReorderBufferDiskHeader *) header;
+
+	/* Read ondisk data */
+	data = (char *) palloc0(disk_hdr->size - sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd,
+						 data,
+						 disk_hdr->size - sizeof(ReorderBufferDiskHeader),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))));
+
+	/* Decompress data */
+	ReorderBufferDecompress(rb, data, disk_hdr, txn->compressor_state);
+
+	pfree(data);
+	pfree(header);
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+
 /*
  * Convert change from its on-disk format to in-memory format and queue it onto
  * the TXN's ->changes list.
@@ -4371,17 +4409,14 @@ static void
 ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						   char *data)
 {
-	ReorderBufferDiskChange *ondisk;
 	ReorderBufferChange *change;
 
-	ondisk = (ReorderBufferDiskChange *) data;
-
 	change = ReorderBufferGetChange(rb);
 
 	/* copy static part */
-	memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+	memcpy(change, data + sizeof(ReorderBufferDiskHeader), sizeof(ReorderBufferChange));
 
-	data += sizeof(ReorderBufferDiskChange);
+	data += sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	/* restore individual stuff */
 	switch (change->action)
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
new file mode 100644
index 0000000000..77f5c76929
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -0,0 +1,502 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.c
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_compression.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+#include "replication/reorderbuffer_compression.h"
+
+#define NO_LZ4_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method lz4 not supported"), \
+			 errdetail("This functionality requires the server to be built with lz4 support.")))
+
+/*
+ * Allocate a new LZ4StreamingCompressorState.
+ */
+static void *
+lz4_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	LZ4StreamingCompressorState *cstate;
+
+	cstate = (LZ4StreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(LZ4StreamingCompressorState));
+
+	/*
+	 * We do not allocate LZ4 ring buffers and streaming handlers at this
+	 * point because we have no guarantee that we will need them later. Let's
+	 * allocate only when we are about to use them.
+	 */
+	cstate->lz4_in_buf = NULL;
+	cstate->lz4_out_buf = NULL;
+	cstate->lz4_in_buf_offset = 0;
+	cstate->lz4_out_buf_offset = 0;
+	cstate->lz4_stream = NULL;
+	cstate->lz4_stream_decode = NULL;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free LZ4 memory resources and the compressor state.
+ */
+static void
+lz4_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	if (cstate->lz4_in_buf != NULL)
+	{
+		pfree(cstate->lz4_in_buf);
+		LZ4_freeStream(cstate->lz4_stream);
+	}
+	if (cstate->lz4_out_buf != NULL)
+	{
+		pfree(cstate->lz4_out_buf);
+		LZ4_freeStreamDecode(cstate->lz4_stream_decode);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 input ring buffer and create the streaming compression handler.
+ */
+static void
+lz4_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_in_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream = LZ4_createStream();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 output ring buffer and create the streaming decompression
+ * handler.
+ */
+static void
+lz4_CreateStreamDecodeCompressorState(MemoryContext context,
+									  void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_out_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream_decode = LZ4_createStreamDecode();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using LZ4 streaming API.
+ * Caller must ensure that the source data can fit in LZ4 input ring buffer,
+ * this checking must be done by lz4_CanDoStreamingCompression().
+ */
+static void
+lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						  char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+	char	   *lz4_in_bufPtr;		/* input ring buffer pointer */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 input ring buffer and streaming compression handler */
+	if (cstate->lz4_in_buf == NULL)
+		lz4_CreateStreamCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_in_buf_offset + src_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_in_buf_offset = 0;
+
+	/* Get the pointer of the next entry in the ring buffer */
+	lz4_in_bufPtr = cstate->lz4_in_buf + cstate->lz4_in_buf_offset;
+
+	/* Copy data that should be compressed into LZ4 input ring buffer */
+	memcpy(lz4_in_bufPtr, src, src_size);
+
+	/* Allocate space for compressed data */
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 streaming compression API */
+	lz4_cmp_size = LZ4_compress_fast_continue(cstate->lz4_stream,
+											  lz4_in_bufPtr, buf, src_size,
+											  buf_size, 1);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	/* Move the input ring buffer offset */
+	cstate->lz4_in_buf_offset += src_size;
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data compression using LZ4 API.
+ */
+static void
+lz4_CompressData(char *src, Size src_size, char **dst,  Size *dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 regular compression API */
+	lz4_cmp_size = LZ4_compress_default(src, buf, src_size, buf_size);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 streaming API.
+ * LZ4 decompression uses the output ring buffer to store decompressed data,
+ * thus, we don't need to create a new buffer. We return the pointer to data
+ * location.
+ */
+static void
+lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	char	   *lz4_out_bufPtr;		/* output ring buffer pointer */
+	int			lz4_dec_size;		/* decompressed data size */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 output ring buffer and streaming decompression handler */
+	if (cstate->lz4_out_buf == NULL)
+		lz4_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_out_buf_offset + dst_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_out_buf_offset = 0;
+
+	/* Get current entry pointer in the ring buffer */
+	lz4_out_bufPtr = cstate->lz4_out_buf + cstate->lz4_out_buf_offset;
+
+	lz4_dec_size = LZ4_decompress_safe_continue(cstate->lz4_stream_decode,
+												src,
+												lz4_out_bufPtr,
+												src_size,
+												dst_size);
+
+	Assert(lz4_dec_size == dst_size);
+
+	if (lz4_dec_size < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_size != dst_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	/* Move the output ring buffer offset */
+	cstate->lz4_out_buf_offset += lz4_dec_size;
+
+	/* Point to the decompressed data location */
+	*dst = lz4_out_bufPtr;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 API.
+ */
+static void
+lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_dec_bytes;
+	char	   *buf;
+
+	buf = (char *) palloc0(dst_size);
+
+	lz4_dec_bytes = LZ4_decompress_safe(src, buf, src_size, dst_size);
+
+	Assert(lz4_dec_bytes == dst_size);
+
+	if (lz4_dec_bytes < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_bytes != dst_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	*dst = buf;
+#endif
+}
+
+/*
+ * Allocate a new Compressor State, depending on the compression method.
+ */
+void *
+ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_NewCompressorState(context);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			return NULL;
+			break;
+	}
+}
+
+/*
+ * Free memory allocated to a Compressor State, depending on the compression
+ * method.
+ */
+void
+ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
+								 void *compressor_state)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_FreeCompressorState(context, compressor_state);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			break;
+	}
+}
+
+/*
+ * Ensure the IO buffer is >= sz.
+ */
+static void
+ReorderBufferReserve(ReorderBuffer *rb, Size sz)
+{
+	if (rb->outbufsize < sz)
+	{
+		rb->outbuf = repalloc(rb->outbuf, sz);
+		rb->outbufsize = sz;
+	}
+}
+
+/*
+ * Compress ReorderBuffer content. This function is called in order to compress
+ * data before spilling on disk.
+ */
+void
+ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
+					  int compression_method, Size data_size,
+					  void *compressor_state)
+{
+	ReorderBufferDiskHeader *hdr = *header;
+
+	switch (compression_method)
+	{
+		/* No compression */
+		case REORDER_BUFFER_NO_COMPRESSION:
+		{
+			hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+			hdr->size = data_size;
+			hdr->raw_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			break;
+		}
+		/* LZ4 Compression */
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			ReorderBufferCompressionStrategy strat;
+
+			if (lz4_CanDoStreamingCompression(src_size))
+			{
+				/* Use LZ4 streaming compression if possible */
+				lz4_StreamingCompressData(rb->context, src, src_size, &dst,
+										  &dst_size, compressor_state);
+				strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+			}
+			else
+			{
+				/* Fallback to LZ4 regular compression */
+				lz4_CompressData(src, src_size, &dst, &dst_size);
+				strat = REORDER_BUFFER_STRAT_LZ4_REGULAR;
+			}
+
+			/*
+			 * Make sure the ReorderBuffer has enough space to store compressed
+			 * data. Compressed data must be smaller than raw data, so, the
+			 * ReorderBuffer should already have room for compressed data, but
+			 * we do this to avoid buffer overflow risks.
+			 */
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = strat;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			/*
+			 * Update header: hdr pointer has potentially changed due to
+			 * ReorderBufferReserve()
+			 */
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
+			break;
+		}
+	}
+}
+
+/*
+ * Decompress data read from disk and copy it into the ReorderBuffer.
+ */
+void
+ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+						ReorderBufferDiskHeader *header, void *compressor_state)
+{
+	Size		raw_outbufsize = header->raw_size + sizeof(ReorderBufferDiskHeader);
+	/*
+	 * Make sure the output reorder buffer has enough space to store
+	 * decompressed/raw data.
+	 */
+	if (rb->outbufsize < raw_outbufsize)
+	{
+		rb->outbuf = repalloc(rb->outbuf, raw_outbufsize);
+		rb->outbufsize = raw_outbufsize;
+	}
+
+	/* Make a copy of the header read on disk into the ReorderBuffer */
+	memcpy(rb->outbuf, (char *) header, sizeof(ReorderBufferDiskHeader));
+
+	switch (header->comp_strat)
+	{
+		/* No decompression */
+		case REORDER_BUFFER_STRAT_UNCOMPRESSED:
+			{
+				/*
+				 * Make a copy of what was read on disk into the reorder
+				 * buffer.
+				 */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   data, header->raw_size);
+				break;
+			}
+		/* LZ4 regular decompression */
+		case REORDER_BUFFER_STRAT_LZ4_REGULAR:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_DecompressData(data, src_size, &buf, buf_size);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+
+				pfree(buf);
+				break;
+			}
+		/* LZ4 streaming decompression */
+		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_StreamingDecompressData(rb->context, data, src_size, &buf,
+										   buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				/*
+				 * Not necessary to free buf in this case: it points to the
+				 * decompressed data stored in LZ4 output ring buffer.
+				 */
+				break;
+			}
+		default:
+			/* Other compression methods not yet supported */
+			break;
+	}
+}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 851a001c8b..bf979e0b14 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -9,6 +9,10 @@
 #ifndef REORDERBUFFER_H
 #define REORDERBUFFER_H
 
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
 #include "access/htup_details.h"
 #include "lib/ilist.h"
 #include "lib/pairingheap.h"
@@ -422,6 +426,8 @@ typedef struct ReorderBufferTXN
 	 * Private data pointer of the output plugin.
 	 */
 	void	   *output_plugin_private;
+
+	void	   *compressor_state;
 } ReorderBufferTXN;
 
 /* so we can define the callbacks used inside struct ReorderBuffer itself */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
new file mode 100644
index 0000000000..9aa8aea56f
--- /dev/null
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.h
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ * src/include/access/reorderbuffer_compression.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REORDERBUFFER_COMPRESSION_H
+#define REORDERBUFFER_COMPRESSION_H
+
+#include "replication/reorderbuffer.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+/* ReorderBuffer on disk compression algorithms */
+typedef enum ReorderBufferCompressionMethod
+{
+	REORDER_BUFFER_NO_COMPRESSION,
+	REORDER_BUFFER_LZ4_COMPRESSION,
+} ReorderBufferCompressionMethod;
+
+/*
+ * Compression strategy applied to ReorderBuffer records spilled on disk
+ */
+typedef enum ReorderBufferCompressionStrategy
+{
+	REORDER_BUFFER_STRAT_UNCOMPRESSED,
+	REORDER_BUFFER_STRAT_LZ4_STREAMING,
+	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+} ReorderBufferCompressionStrategy;
+
+/* Disk serialization support datastructures */
+typedef struct ReorderBufferDiskHeader
+{
+	ReorderBufferCompressionStrategy comp_strat; /* Compression strategy */
+	Size		size;					/* Ondisk size */
+	Size		raw_size;				/* Raw/uncompressed data size */
+	/* ReorderBufferChange + data follows */
+} ReorderBufferDiskHeader;
+
+#ifdef USE_LZ4
+/*
+ * We use a fairly small LZ4 ring buffer size (64kB). Using a larger buffer
+ * size provide better compression ratio, but as long as we have to allocate
+ * two LZ4 ring buffers per ReorderBufferTXN, we should keep it small.
+ */
+#define LZ4_RING_BUFFER_SIZE (64 * 1024)
+
+/*
+ * Use LZ4 streaming compression iff we can keep at least 2 uncompressed
+ * records into the LZ4 input ring buffer. If raw data size is too large, let's
+ * use regular LZ4 compression.
+ */
+#define lz4_CanDoStreamingCompression(s) (s < (LZ4_RING_BUFFER_SIZE / 2))
+
+/*
+ * LZ4 streaming compression/decompression handlers and ring
+ * buffers.
+ */
+typedef struct LZ4StreamingCompressorState {
+	/* Streaming compression handler */
+	LZ4_stream_t *lz4_stream;
+	/* Streaming decompression handler */
+	LZ4_streamDecode_t *lz4_stream_decode;
+	/* LZ4 in/out ring buffers used for streaming compression */
+	char	   *lz4_in_buf;
+	int			lz4_in_buf_offset;
+	char	   *lz4_out_buf;
+	int			lz4_out_buf_offset;
+} LZ4StreamingCompressorState;
+#else
+#define lz4_CanDoStreamingCompression(s) (false)
+#endif
+
+extern void *ReorderBufferNewCompressorState(MemoryContext context,
+											 int compression_method);
+extern void ReorderBufferFreeCompressorState(MemoryContext context,
+											 int compression_method,
+											 void *compressor_state);
+extern void ReorderBufferCompress(ReorderBuffer *rb,
+								  ReorderBufferDiskHeader **header,
+								  int compression_method, Size data_size,
+								  void *compressor_state);
+extern void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+									ReorderBufferDiskHeader *header,
+									void *compressor_state);
+
+#endif							/* REORDERBUFFER_COMPRESSION_H */
-- 
2.43.0

v2-0002-Add-GUC-logical_decoding_spill_compression.patchapplication/octet-stream; name=v2-0002-Add-GUC-logical_decoding_spill_compression.patchDownload
From 557c21a0255144af3deff45caf61b8aac7cb2c51 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sun, 23 Jun 2024 08:22:10 -0700
Subject: [PATCH 2/7] Add GUC logical_decoding_spill_compression

This new GUC defines the compression method used to compress
decoded changes spilled on disk during logical decoding.

Default value is 'off', meaning no compression involved.
---
 doc/src/sgml/config.sgml                      | 25 +++++++++++++++++++
 .../replication/logical/reorderbuffer.c       |  4 ---
 src/backend/utils/misc/guc_tables.c           | 22 ++++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 .../replication/reorderbuffer_compression.h   |  3 +++
 5 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b14c5d81a1..955f4f4a8b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2002,6 +2002,31 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-logical-decoding-spill-files-compression" xreflabel="logical_decoding_spill_compression">
+      <term><varname>logical_decoding_spill_compression</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>logical_decoding_spill_compression</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        This parameter enables compression of decoded changes written to local
+        disk by logical decoding when the transaction size does not fit in
+        <xref linkend="guc-logical-decoding-work-mem"/>.
+        When the subscribtion is created with the option <varname>streaming</varname>
+        set to <varname>on</varname> or <varname>parallel</varname>, then
+        the transaction are not fully decoded on the publisher, then, this
+        parameter has not effect if there is no data to spill on disk.
+        The supported methods are <literal>lz4</literal> (if
+        <productname>PostgreSQL</productname> was compiled with
+        <option>--with-lz4</option>) and <literal>off></literal>.
+        The default value is <literal>off</literal>.
+        Only superusers and users with the appropriate <literal>SET</literal>
+        privilege can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-commit-timestamp-buffers" xreflabel="commit_timestamp_buffers">
       <term><varname>commit_timestamp_buffers</varname> (<type>integer</type>)
       <indexterm>
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 4e08167d03..ef2cd8bdf3 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -114,11 +114,7 @@
 #include "utils/relfilenumbermap.h"
 
 /* GUC */
-#ifdef USE_LZ4
-int			logical_decoding_spill_compression = REORDER_BUFFER_LZ4_COMPRESSION;
-#else
 int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
-#endif
 
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 630ed0f162..27ce376fd4 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -67,6 +67,7 @@
 #include "postmaster/walsummarizer.h"
 #include "postmaster/walwriter.h"
 #include "replication/logicallauncher.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/slotsync.h"
 #include "replication/syncrep.h"
@@ -484,6 +485,17 @@ static const struct config_enum_entry wal_compression_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry logical_decoding_spill_compression_options[] = {
+#ifdef  USE_LZ4
+	{"lz4", REORDER_BUFFER_LZ4_COMPRESSION, false},
+#endif
+	{"off", REORDER_BUFFER_NO_COMPRESSION, false},
+	{"false", REORDER_BUFFER_NO_COMPRESSION, true},
+	{"no", REORDER_BUFFER_NO_COMPRESSION, true},
+	{"0", REORDER_BUFFER_NO_COMPRESSION, true},
+	{NULL, 0, false}
+};
+
 /*
  * Options for enum values stored in other modules
  */
@@ -5120,6 +5132,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"logical_decoding_spill_compression", PGC_SUSET, RESOURCES_DISK,
+			gettext_noop("Compresses logical decoding spill files."),
+			NULL
+		},
+		&logical_decoding_spill_compression,
+		REORDER_BUFFER_NO_COMPRESSION, logical_decoding_spill_compression_options,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, 0, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 9ec9f97e92..a3a35c4ad8 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -335,6 +335,7 @@
 #wal_sender_timeout = 60s	# in milliseconds; 0 disables
 #track_commit_timestamp = off	# collect timestamp of transaction commit
 				# (change requires restart)
+#logical_decoding_spill_compression = off	# Compress decoded changes spilled on disk
 
 # - Primary Server -
 
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 9aa8aea56f..d59e9543a8 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -19,6 +19,9 @@
 #include <lz4.h>
 #endif
 
+/* GUC support */
+extern PGDLLIMPORT int logical_decoding_spill_compression;
+
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
-- 
2.43.0

v2-0005-Compress-ReorderBuffer-spill-files-using-ZSTD.patchapplication/octet-stream; name=v2-0005-Compress-ReorderBuffer-spill-files-using-ZSTD.patchDownload
From 79753deeef3ebf971df70dbaa974c9ec8a1f74b6 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Fri, 5 Jul 2024 05:25:48 -0700
Subject: [PATCH 5/7] Compress ReorderBuffer spill files using ZSTD

---
 doc/src/sgml/config.sgml                      |   4 +-
 .../logical/reorderbuffer_compression.c       | 364 ++++++++++++++++++
 src/backend/utils/misc/guc_tables.c           |   3 +
 .../replication/reorderbuffer_compression.h   |  39 ++
 4 files changed, 409 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2697ebc435..7629e10f45 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2019,7 +2019,9 @@ include_dir 'conf.d'
         parameter has not effect if there is no data to spill on disk.
         The supported methods are <literal>pglz</literal>, <literal>lz4</literal> (if
         <productname>PostgreSQL</productname> was compiled with
-        <option>--with-lz4</option>) and <literal>off></literal>.
+        <option>--with-lz4</option>), <literal>zstd</literal> (if
+        <productname>PostgreSQL</productname> was compiled with
+        <option>--with-zstd</option>) and <literal>off</literal>.
         The default value is <literal>off</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index a05393cc61..9bda286cb8 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -19,6 +19,10 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 #include "replication/reorderbuffer_compression.h"
 
 #define NO_LZ4_SUPPORT() \
@@ -27,6 +31,12 @@
 			 errmsg("compression method lz4 not supported"), \
 			 errdetail("This functionality requires the server to be built with lz4 support.")))
 
+#define NO_ZSTD_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method zstd not supported"), \
+			 errdetail("This functionality requires the server to be built with zstd support.")))
+
 /*
  * Allocate a new LZ4StreamingCompressorState.
  */
@@ -303,6 +313,309 @@ lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
 #endif
 }
 
+/*
+ * Allocate a new ZSTDStreamingCompressorState.
+ */
+static void *
+zstd_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	cstate = (ZSTDStreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(ZSTDStreamingCompressorState));
+
+	/*
+	 * We do not allocate ZSTD buffers and contexts at this point because we
+	 * have no guarantee that we will need them later. Let's allocate only when
+	 * we are about to use them.
+	 */
+	cstate->zstd_c_ctx = NULL;
+	cstate->zstd_c_in_buf = NULL;
+	cstate->zstd_c_in_buf_size = 0;
+	cstate->zstd_c_out_buf = NULL;
+	cstate->zstd_c_out_buf_size = 0;
+	cstate->zstd_frame_size = 0;
+	cstate->zstd_d_ctx = NULL;
+	cstate->zstd_d_in_buf = NULL;
+	cstate->zstd_d_in_buf_size = 0;
+	cstate->zstd_d_out_buf = NULL;
+	cstate->zstd_d_out_buf_size = 0;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free ZSTD memory resources and the compressor state.
+ */
+static void
+zstd_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+
+	if (cstate->zstd_c_ctx != NULL)
+	{
+		/* Compressor state was used for compression */
+		pfree(cstate->zstd_c_in_buf);
+		pfree(cstate->zstd_c_out_buf);
+		ZSTD_freeCCtx(cstate->zstd_c_ctx);
+	}
+	if (cstate->zstd_d_ctx != NULL)
+	{
+		/* Compressor state was used for decompression */
+		pfree(cstate->zstd_d_in_buf);
+		pfree(cstate->zstd_d_out_buf);
+		ZSTD_freeDCtx(cstate->zstd_d_ctx);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD compression buffers and create the ZSTD compression context.
+ */
+static void
+zstd_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_c_in_buf_size = ZSTD_CStreamInSize();
+	cstate->zstd_c_in_buf = (char *) palloc0(cstate->zstd_c_in_buf_size);
+	cstate->zstd_c_out_buf_size = ZSTD_CStreamOutSize();
+	cstate->zstd_c_out_buf = (char *) palloc0(cstate->zstd_c_out_buf_size);
+	cstate->zstd_c_ctx = ZSTD_createCCtx();
+
+	if (cstate->zstd_c_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD compression context")));
+
+	/* Set compression level */
+	ZSTD_CCtx_setParameter(cstate->zstd_c_ctx, ZSTD_c_compressionLevel,
+						   ZSTD_COMPRESSION_LEVEL);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD decompression buffers and create the ZSTD decompression
+ * context.
+ */
+static void
+zstd_CreateStreamDecodeCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_d_in_buf_size = ZSTD_DStreamInSize();
+	cstate->zstd_d_in_buf = (char *) palloc0(cstate->zstd_d_in_buf_size);
+	cstate->zstd_d_out_buf_size = ZSTD_DStreamOutSize();
+	cstate->zstd_d_out_buf = (char *) palloc0(cstate->zstd_d_out_buf_size);
+	cstate->zstd_d_ctx = ZSTD_createDCtx();
+
+	if (cstate->zstd_d_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD decompression context")));
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						   char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_c_ctx == NULL)
+		zstd_CreateStreamCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store compressed data */
+	*dst = (char *) palloc0(ZSTD_compressBound(src_size));
+
+	dst_data = *dst;
+	*dst_size = 0;
+
+	/*
+	 * ZSTD streaming compression works with chunks: the source data needs to
+	 * be splitted out in chunks, each of them is then copied into ZSTD input
+	 * buffer.
+	 * For each chunk, we proceed with compression. Streaming compression is
+	 * not intended to compress the whole input chunk, so we have the call
+	 * ZSTD_compressStream2() multiple times until the entire chunk is
+	 * consumed.
+	 */
+	while (toCpy > 0)
+	{
+		/* Are we on the last chunk? */
+		bool		last_chunk = (toCpy < cstate->zstd_c_in_buf_size);
+		/* Size of the data copied into ZSTD input buffer */
+		Size		cpySize = last_chunk ? toCpy : cstate->zstd_c_in_buf_size;
+		bool		finished = false;
+		ZSTD_inBuffer input;
+		ZSTD_EndDirective mode = last_chunk ? ZSTD_e_flush : ZSTD_e_continue;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_c_in_buf, src, cpySize);
+
+		/*
+		 * Close the frame when we are on the last chunk and we've reached max
+		 * frame size.
+		 */
+		if (last_chunk && (cstate->zstd_frame_size > ZSTD_MAX_FRAME_SIZE))
+		{
+			mode = ZSTD_e_end;
+			cstate->zstd_frame_size = 0;
+		}
+
+		cstate->zstd_frame_size += cpySize;
+
+		input.src = cstate->zstd_c_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		do
+		{
+			Size		remaining;
+			ZSTD_outBuffer output;
+
+			output.dst = cstate->zstd_c_out_buf;
+			output.size = cstate->zstd_c_out_buf_size;
+			output.pos = 0;
+
+			remaining = ZSTD_compressStream2(cstate->zstd_c_ctx, &output,
+											 &input, mode);
+
+			if (ZSTD_isError(remaining))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD compression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_c_out_buf, output.pos);
+
+			dst_data += output.pos;
+			*dst_size += output.pos;
+
+			/*
+			 * Compression is done when we are working on the last chunk and
+			 * there is nothing left to compress, or, when we reach the end of
+			 * the chunk.
+			 */
+			finished = last_chunk ? (remaining == 0) : (input.pos == input.size);
+		} while (!finished);
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+#endif
+}
+
+/*
+ * Data decompression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+	Size		decBytes = 0;	/* Size of decompressed data */
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_d_ctx == NULL)
+		zstd_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store decompressed data */
+	*dst = (char *) palloc0(dst_size);
+
+	dst_data = *dst;
+
+	while (toCpy > 0)
+	{
+		ZSTD_inBuffer input;
+		Size		cpySize = (toCpy > cstate->zstd_d_in_buf_size) ? cstate->zstd_d_in_buf_size : toCpy;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_d_in_buf, src, cpySize);
+
+		input.src = cstate->zstd_d_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		while (input.pos < input.size)
+		{
+			ZSTD_outBuffer output;
+			Size		ret;
+
+			output.dst = cstate->zstd_d_out_buf;
+			output.size = cstate->zstd_d_out_buf_size;
+			output.pos = 0;
+
+			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output , &input);
+
+			if (ZSTD_isError(ret))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD decompression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_d_out_buf, output.pos);
+
+			dst_data += output.pos;
+			decBytes += output.pos;
+		}
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+
+	Assert(dst_size == decBytes);
+#endif
+}
+
 /*
  * Allocate a new Compressor State, depending on the compression method.
  */
@@ -314,6 +627,9 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_NewCompressorState(context);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_NewCompressorState(context);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -335,6 +651,9 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_FreeCompressorState(context, compressor_state);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -459,6 +778,35 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* ZSTD Compression */
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			/* Use ZSTD streaming compression */
+			zstd_StreamingCompressData(rb->context, src, src_size, &dst,
+									   &dst_size, compressor_state);
+
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -553,6 +901,22 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 							 errmsg_internal("compressed PGLZ data is corrupted")));
 				break;
 			}
+		/* ZSTD streaming decompression */
+		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				zstd_StreamingDecompressData(rb->context, data, src_size, &buf,
+											 buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				pfree(buf);
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 0209a3a517..16023fb686 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -488,6 +488,9 @@ static const struct config_enum_entry wal_compression_options[] = {
 static const struct config_enum_entry logical_decoding_spill_compression_options[] = {
 #ifdef  USE_LZ4
 	{"lz4", REORDER_BUFFER_LZ4_COMPRESSION, false},
+#endif
+#ifdef USE_ZSTD
+	{"zstd", REORDER_BUFFER_ZSTD_COMPRESSION, false},
 #endif
 	{"pglz", REORDER_BUFFER_PGLZ_COMPRESSION, false},
 	{"off", REORDER_BUFFER_NO_COMPRESSION, false},
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index ea77ed1358..d3a847b770 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -19,6 +19,10 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 /* GUC support */
 extern PGDLLIMPORT int logical_decoding_spill_compression;
 
@@ -28,6 +32,7 @@ typedef enum ReorderBufferCompressionMethod
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
+	REORDER_BUFFER_ZSTD_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -39,6 +44,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
 	REORDER_BUFFER_STRAT_PGLZ,
+	REORDER_BUFFER_STRAT_ZSTD_STREAMING,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
@@ -84,6 +90,39 @@ typedef struct LZ4StreamingCompressorState {
 #define lz4_CanDoStreamingCompression(s) (false)
 #endif
 
+#ifdef USE_ZSTD
+/*
+ * Low compression level provides high compression speed and decent compression
+ * rate. Minimum level is 1, maximum is 22.
+ */
+#define ZSTD_COMPRESSION_LEVEL 1
+
+/*
+ * Maximum volume of data encoded in the current ZSTD frame. When this
+ * threshold is reached then we close the current frame and start a new one.
+ */
+#define ZSTD_MAX_FRAME_SIZE (64 * 1024)
+
+/*
+ * ZSTD streaming compression/decompression handlers and buffers.
+ */
+typedef struct ZSTDStreamingCompressorState {
+	/* Compression */
+	ZSTD_CCtx  *zstd_c_ctx;
+	Size		zstd_c_in_buf_size;
+	char	   *zstd_c_in_buf;
+	Size		zstd_c_out_buf_size;
+	char	   *zstd_c_out_buf;
+	Size		zstd_frame_size;
+	/* Decompression */
+	ZSTD_DCtx  *zstd_d_ctx;
+	Size		zstd_d_in_buf_size;
+	char	   *zstd_d_in_buf;
+	Size		zstd_d_out_buf_size;
+	char	   *zstd_d_out_buf;
+} ZSTDStreamingCompressorState;
+#endif
+
 extern void *ReorderBufferNewCompressorState(MemoryContext context,
 											 int compression_method);
 extern void ReorderBufferFreeCompressorState(MemoryContext context,
-- 
2.43.0

v2-0004-Compress-ReorderBuffer-spill-files-using-PGLZ.patchapplication/octet-stream; name=v2-0004-Compress-ReorderBuffer-spill-files-using-PGLZ.patchDownload
From d7daa87489b1cc2583d643db3396872f902a2d15 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Tue, 25 Jun 2024 05:34:43 -0700
Subject: [PATCH 4/7] Compress ReorderBuffer spill files using PGLZ

---
 doc/src/sgml/config.sgml                      |  2 +-
 .../logical/reorderbuffer_compression.c       | 58 +++++++++++++++++++
 src/backend/utils/misc/guc_tables.c           |  5 ++
 .../replication/reorderbuffer_compression.h   |  2 +
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 955f4f4a8b..2697ebc435 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2017,7 +2017,7 @@ include_dir 'conf.d'
         set to <varname>on</varname> or <varname>parallel</varname>, then
         the transaction are not fully decoded on the publisher, then, this
         parameter has not effect if there is no data to spill on disk.
-        The supported methods are <literal>lz4</literal> (if
+        The supported methods are <literal>pglz</literal>, <literal>lz4</literal> (if
         <productname>PostgreSQL</productname> was compiled with
         <option>--with-lz4</option>) and <literal>off></literal>.
         The default value is <literal>off</literal>.
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 77f5c76929..a05393cc61 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "common/pg_lzcompress.h"
+
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
@@ -313,6 +315,7 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 			return lz4_NewCompressorState(context);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			return NULL;
 			break;
@@ -333,6 +336,7 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			break;
 	}
@@ -421,6 +425,40 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* PGLZ compression */
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+		{
+			int32		dst_size = 0;
+			char	   *dst = NULL;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			int32		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			int32		max_size = PGLZ_MAX_OUTPUT(src_size);
+
+			dst = (char *) palloc0(max_size);
+			dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
+
+			if (dst_size < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("PGLZ compression failed")));
+
+			ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+			hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = (Size) src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -495,6 +533,26 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 				 */
 				break;
 			}
+		/* PGLZ decompression */
+		case REORDER_BUFFER_STRAT_PGLZ:
+			{
+				char	   *buf;
+				int32		src_size = (int32) header->size - sizeof(ReorderBufferDiskHeader);
+				int32		buf_size = (int32) header->raw_size;
+				int32		decBytes;
+
+				/* Decompress data directly into the ReorderBuffer */
+				buf = (char *) rb->outbuf;
+				buf += sizeof(ReorderBufferDiskHeader);
+
+				decBytes = pglz_decompress(data, src_size, buf, buf_size, false);
+
+				if (decBytes < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("compressed PGLZ data is corrupted")));
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 27ce376fd4..0209a3a517 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -489,10 +489,15 @@ static const struct config_enum_entry logical_decoding_spill_compression_options
 #ifdef  USE_LZ4
 	{"lz4", REORDER_BUFFER_LZ4_COMPRESSION, false},
 #endif
+	{"pglz", REORDER_BUFFER_PGLZ_COMPRESSION, false},
 	{"off", REORDER_BUFFER_NO_COMPRESSION, false},
+	{"on", REORDER_BUFFER_PGLZ_COMPRESSION, false},
 	{"false", REORDER_BUFFER_NO_COMPRESSION, true},
+	{"true", REORDER_BUFFER_PGLZ_COMPRESSION, true},
 	{"no", REORDER_BUFFER_NO_COMPRESSION, true},
+	{"yes", REORDER_BUFFER_PGLZ_COMPRESSION, true},
 	{"0", REORDER_BUFFER_NO_COMPRESSION, true},
+	{"1", REORDER_BUFFER_PGLZ_COMPRESSION, true},
 	{NULL, 0, false}
 };
 
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index d59e9543a8..ea77ed1358 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -27,6 +27,7 @@ typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
+	REORDER_BUFFER_PGLZ_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -37,6 +38,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_UNCOMPRESSED,
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+	REORDER_BUFFER_STRAT_PGLZ,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
-- 
2.43.0

v2-0003-Fix-spill_bytes-counter.patchapplication/octet-stream; name=v2-0003-Fix-spill_bytes-counter.patchDownload
From f80182d643b23ff2706dac87e2d0a03e31789c2d Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sun, 23 Jun 2024 14:42:04 -0700
Subject: [PATCH 3/7] Fix spill_bytes counter

The spill_bytes counter considers now the fact that decoded changes
are spilled on disk compressed.
---
 src/backend/replication/logical/reorderbuffer.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index ef2cd8bdf3..b4a42c615f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -247,7 +247,7 @@ static void ReorderBufferExecuteInvalidations(uint32 nmsgs, SharedInvalidationMe
  */
 static void ReorderBufferCheckMemoryLimit(ReorderBuffer *rb);
 static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+static Size ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
@@ -3690,6 +3690,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	Size		spillBytes = 0;
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3741,7 +3742,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 
-		ReorderBufferSerializeChange(rb, txn, fd, change);
+		spillBytes += ReorderBufferSerializeChange(rb, txn, fd, change);
 		dlist_delete(&change->node);
 		ReorderBufferReturnChange(rb, change, false);
 
@@ -3755,7 +3756,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	if (spilled)
 	{
 		rb->spillCount += 1;
-		rb->spillBytes += size;
+		rb->spillBytes += spillBytes;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
@@ -3776,7 +3777,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 /*
  * Serialize individual change to disk.
  */
-static void
+static Size
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
@@ -3989,6 +3990,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
+
+	/* Return data size written to disk */
+	return disk_hdr->size;
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
-- 
2.43.0

v2-0007-WIP-Add-the-subscription-option-spill_compression.patchapplication/octet-stream; name=v2-0007-WIP-Add-the-subscription-option-spill_compression.patchDownload
From a77ec048092ccca0ce574f3ad6c8649d48faad01 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Mon, 15 Jul 2024 03:07:38 -0700
Subject: [PATCH 7/7] WIP Add the subscription option spill_compression

---
 src/backend/catalog/pg_subscription.c      |  10 ++
 src/backend/catalog/system_views.sql       |   3 +-
 src/backend/commands/subscriptioncmds.c    |  37 ++++-
 src/bin/pg_dump/pg_dump.c                  |  21 ++-
 src/bin/pg_dump/pg_dump.h                  |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl           |   8 +-
 src/bin/psql/describe.c                    |   7 +-
 src/include/catalog/pg_subscription.h      |   4 +
 src/test/regress/expected/subscription.out | 157 +++++++++++----------
 src/test/regress/sql/subscription.sql      |   4 +
 10 files changed, 166 insertions(+), 86 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9efc9159f2..b0a08254db 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -110,6 +110,16 @@ GetSubscription(Oid subid, bool missing_ok)
 	/* Is the subscription owner a superuser? */
 	sub->ownersuperuser = superuser_arg(sub->owner);
 
+	/* Get splillcompression */
+	datum = SysCacheGetAttr(SUBSCRIPTIONOID,
+							tup,
+							Anum_pg_subscription_subspillcompression,
+							&isnull);
+	if (!isnull)
+		sub->spillcompression = TextDatumGetCString(datum);
+	else
+		sub->spillcompression = NULL;
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 19cabc9a47..c84c283b9f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1356,7 +1356,8 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
               subbinary, substream, subtwophasestate, subdisableonerr,
 			  subpasswordrequired, subrunasowner, subfailover,
-              subslotname, subsynccommit, subpublications, suborigin)
+              subslotname, subsynccommit, subpublications, suborigin,
+              subspillcompression)
     ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_stats AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 16d83b3253..a19b256d96 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -72,6 +72,7 @@
 #define SUBOPT_FAILOVER				0x00002000
 #define SUBOPT_LSN					0x00004000
 #define SUBOPT_ORIGIN				0x00008000
+#define SUBOPT_SPILL_COMPRESSION	0x00010000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -99,6 +100,7 @@ typedef struct SubOpts
 	bool		failover;
 	char	   *origin;
 	XLogRecPtr	lsn;
+	char	   *spill_compression;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
@@ -366,6 +368,24 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION) &&
+				 strcmp(defel->defname, "spill_compression") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_SPILL_COMPRESSION))
+				errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_SPILL_COMPRESSION;
+			opts->spill_compression = defGetString(defel);
+
+			/*
+			 * Test if the given value is valid for
+			 * logical_decoding_spill_compression GUC.
+			 */
+			(void) set_config_option("logical_decoding_spill_compression",
+									 opts->spill_compression, PGC_BACKEND,
+									 PGC_S_TEST, GUC_ACTION_SET, false, 0,
+									 false);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -603,7 +623,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 					  SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
 					  SUBOPT_STREAMING | SUBOPT_TWOPHASE_COMMIT |
 					  SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED |
-					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN);
+					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+					  SUBOPT_SPILL_COMPRESSION);
 	parse_subscription_options(pstate, stmt->options, supported_opts, &opts);
 
 	/*
@@ -723,6 +744,11 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		publicationListToArray(publications);
 	values[Anum_pg_subscription_suborigin - 1] =
 		CStringGetTextDatum(opts.origin);
+	if (opts.spill_compression)
+		values[Anum_pg_subscription_subspillcompression - 1] =
+			CStringGetTextDatum(opts.spill_compression);
+	else
+		nulls[Anum_pg_subscription_subspillcompression - 1] = true;
 
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
@@ -1148,7 +1174,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								  SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
 								  SUBOPT_PASSWORD_REQUIRED |
 								  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-								  SUBOPT_ORIGIN);
+								  SUBOPT_ORIGIN | SUBOPT_SPILL_COMPRESSION);
 
 				parse_subscription_options(pstate, stmt->options,
 										   supported_opts, &opts);
@@ -1265,6 +1291,13 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 					replaces[Anum_pg_subscription_suborigin - 1] = true;
 				}
 
+				if (opts.spill_compression)
+				{
+					values[Anum_pg_subscription_subspillcompression - 1] =
+						CStringGetTextDatum(opts.spill_compression);
+					replaces[Anum_pg_subscription_subspillcompression - 1] = true;
+				}
+
 				update_tuple = true;
 				break;
 			}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index b8b1888bd3..aed50f1674 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4760,6 +4760,7 @@ getSubscriptions(Archive *fout)
 	int			i_suboriginremotelsn;
 	int			i_subenabled;
 	int			i_subfailover;
+	int			i_subspillcompression;
 	int			i,
 				ntups;
 
@@ -4832,10 +4833,17 @@ getSubscriptions(Archive *fout)
 
 	if (fout->remoteVersion >= 170000)
 		appendPQExpBufferStr(query,
-							 " s.subfailover\n");
+							 " s.subfailover,\n");
 	else
 		appendPQExpBuffer(query,
-						  " false AS subfailover\n");
+						  " false AS subfailover,\n");
+
+	if (fout->remoteVersion >= 180000)
+		appendPQExpBufferStr(query,
+							 " s.subspillcompression\n");
+	else
+		appendPQExpBuffer(query,
+						  " 'off' AS subspillcompression\n");
 
 	appendPQExpBufferStr(query,
 						 "FROM pg_subscription s\n");
@@ -4875,6 +4883,7 @@ getSubscriptions(Archive *fout)
 	i_suboriginremotelsn = PQfnumber(res, "suboriginremotelsn");
 	i_subenabled = PQfnumber(res, "subenabled");
 	i_subfailover = PQfnumber(res, "subfailover");
+	i_subspillcompression = PQfnumber(res, "subspillcompression");
 
 	subinfo = pg_malloc(ntups * sizeof(SubscriptionInfo));
 
@@ -4921,6 +4930,11 @@ getSubscriptions(Archive *fout)
 			pg_strdup(PQgetvalue(res, i, i_subenabled));
 		subinfo[i].subfailover =
 			pg_strdup(PQgetvalue(res, i, i_subfailover));
+		if (PQgetisnull(res, i, i_subspillcompression))
+			subinfo[i].subspillcompression = NULL;
+		else
+			subinfo[i].subspillcompression =
+				pg_strdup(PQgetvalue(res, i, i_subspillcompression));
 
 		/* Decide whether we want to dump it */
 		selectDumpableObject(&(subinfo[i].dobj), fout);
@@ -5167,6 +5181,9 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (pg_strcasecmp(subinfo->suborigin, LOGICALREP_ORIGIN_ANY) != 0)
 		appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
 
+	if (subinfo->subspillcompression)
+		appendPQExpBuffer(query, ", spill_compression = %s", fmtId(subinfo->subspillcompression));
+
 	appendPQExpBufferStr(query, ");\n");
 
 	/*
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 4b2e5870a9..12588070f4 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -671,6 +671,7 @@ typedef struct _SubscriptionInfo
 	char	   *suborigin;
 	char	   *suboriginremotelsn;
 	char	   *subfailover;
+	char	   *subspillcompression;
 } SubscriptionInfo;
 
 /*
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index d3dd8784d6..abf1d76d09 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -2965,9 +2965,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub2
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = none);',
+						 WITH (connect = false, origin = none, spill_compression = off);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none);\E
+			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none, spill_compression = off);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
@@ -2976,9 +2976,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub3
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = any);',
+						 WITH (connect = false, origin = any, spill_compression = pglz);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3');\E
+			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3', spill_compression = pglz);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 7c9a1f234c..495a065849 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6539,7 +6539,7 @@ describeSubscriptions(const char *pattern, bool verbose)
 	printQueryOpt myopt = pset.popt;
 	static const bool translate_columns[] = {false, false, false, false,
 		false, false, false, false, false, false, false, false, false, false,
-	false};
+	false, false};
 
 	if (pset.sversion < 100000)
 	{
@@ -6619,6 +6619,11 @@ describeSubscriptions(const char *pattern, bool verbose)
 			appendPQExpBuffer(&buf,
 							  ", subskiplsn AS \"%s\"\n",
 							  gettext_noop("Skip LSN"));
+
+		if (pset.sversion >= 180000)
+			appendPQExpBuffer(&buf,
+							  ", subspillcompression AS \"%s\"\n",
+							  gettext_noop("Spill files compression"));
 	}
 
 	/* Only display subscriptions in current database. */
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 0aa14ec4a2..63bc527083 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -113,6 +113,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 
 	/* Only publish data originating from the specified origin */
 	text		suborigin BKI_DEFAULT(LOGICALREP_ORIGIN_ANY);
+
+	/* Spill files compression algorithm */
+	text		subspillcompression BKI_FORCE_NULL;
 #endif
 } FormData_pg_subscription;
 
@@ -157,6 +160,7 @@ typedef struct Subscription
 	List	   *publications;	/* List of publication names to subscribe to */
 	char	   *origin;			/* Only publish data originating from the
 								 * specified origin */
+	char	   *spillcompression;	/* Spill files compression algorithm */
 } Subscription;
 
 /* Disallow streaming in-progress transactions. */
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 5c2f1ee517..5e9b55baf9 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -116,18 +116,18 @@ CREATE SUBSCRIPTION regress_testsub4 CONNECTION 'dbname=regress_doesnotexist' PU
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub4 SET (origin = any);
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub3;
@@ -145,10 +145,10 @@ ALTER SUBSCRIPTION regress_testsub CONNECTION 'foobar';
 ERROR:  invalid connection string syntax: missing "=" after "foobar" in connection info string
 
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET PUBLICATION testpub2, testpub3 WITH (refresh = false);
@@ -157,10 +157,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = 'newname');
 ALTER SUBSCRIPTION regress_testsub SET (password_required = false);
 ALTER SUBSCRIPTION regress_testsub SET (run_as_owner = true);
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (password_required = true);
@@ -176,10 +176,10 @@ ERROR:  unrecognized subscription parameter: "create_slot"
 -- ok
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/12345');
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345  | 
 (1 row)
 
 -- ok - with lsn = NONE
@@ -188,10 +188,10 @@ ALTER SUBSCRIPTION regress_testsub SKIP (lsn = NONE);
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/0');
 ERROR:  invalid WAL location (LSN): 0/0
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | 
 (1 row)
 
 BEGIN;
@@ -222,11 +222,16 @@ ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 ERROR:  invalid value for parameter "synchronous_commit": "foobar"
 HINT:  Available values: local, remote_write, remote_apply, on, off.
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+ERROR:  invalid value for parameter "logical_decoding_spill_compression": "foobar"
+HINT:  Available values: lz4, zstd, pglz, off, on.
 \dRs+
-                                                                                                                       List of subscriptions
-        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
----------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                    List of subscriptions
+        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+---------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 -- rename back to keep the rest simple
@@ -255,19 +260,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (binary = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -279,27 +284,27 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 -- fail - publication already exists
@@ -314,10 +319,10 @@ ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refr
 ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refresh = false);
 ERROR:  publication "testpub1" is already in subscription "regress_testsub"
 \dRs+
-                                                                                                                        List of subscriptions
-      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                                     List of subscriptions
+      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 -- fail - publication used more than once
@@ -332,10 +337,10 @@ ERROR:  publication "testpub3" is not in subscription "regress_testsub"
 -- ok - delete publications
 ALTER SUBSCRIPTION regress_testsub DROP PUBLICATION testpub1, testpub2 WITH (refresh = false);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -371,10 +376,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 --fail - alter of two_phase option not supported.
@@ -383,10 +388,10 @@ ERROR:  unrecognized subscription parameter: "two_phase"
 -- but can alter streaming when two_phase enabled
 ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -396,10 +401,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -412,18 +417,18 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | 
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c..2d891b2c06 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -140,6 +140,10 @@ ALTER SUBSCRIPTION regress_testsub RENAME TO regress_testsub_foo;
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+
 \dRs+
 
 -- rename back to keep the rest simple
-- 
2.43.0

v2-0006-Add-ReorderBuffer-ondisk-compression-tests.patchapplication/octet-stream; name=v2-0006-Add-ReorderBuffer-ondisk-compression-tests.patchDownload
From 78564cd0731586c99ce579b0c6f1b7eba76da06c Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sat, 6 Jul 2024 07:51:54 -0700
Subject: [PATCH 6/7] Add ReorderBuffer ondisk compression tests

---
 src/test/subscription/Makefile                |   2 +
 src/test/subscription/meson.build             |   7 +-
 .../t/034_reorderbuffer_compression.pl        | 100 ++++++++++++++++++
 3 files changed, 108 insertions(+), 1 deletion(-)
 create mode 100644 src/test/subscription/t/034_reorderbuffer_compression.pl

diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
index ce1ca43009..9341f1493c 100644
--- a/src/test/subscription/Makefile
+++ b/src/test/subscription/Makefile
@@ -16,6 +16,8 @@ include $(top_builddir)/src/Makefile.global
 EXTRA_INSTALL = contrib/hstore
 
 export with_icu
+export with_lz4
+export with_zstd
 
 check:
 	$(prove_check)
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index c591cd7d61..772eeb817f 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -5,7 +5,11 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
-    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
+    'env': {
+      'with_icu': icu.found() ? 'yes' : 'no',
+      'with_lz4': lz4.found() ? 'yes' : 'no',
+      'with_zstd': zstd.found() ? 'yes' : 'no',
+    },
     'tests': [
       't/001_rep_changes.pl',
       't/002_types.pl',
@@ -40,6 +44,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_reorderbuffer_compression.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_reorderbuffer_compression.pl b/src/test/subscription/t/034_reorderbuffer_compression.pl
new file mode 100644
index 0000000000..4d18b4e661
--- /dev/null
+++ b/src/test/subscription/t/034_reorderbuffer_compression.pl
@@ -0,0 +1,100 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test ReorderBuffer compression
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub test_reorderbuffer_compression
+{
+	my ($node_publisher, $node_subscriber, $appname, $compression) = @_;
+
+	# Set logical_decoding_spill_compression
+	$node_publisher->safe_psql('postgres',
+		"ALTER SYSTEM SET logical_decoding_spill_compression TO $compression");
+	$node_publisher->reload;
+
+	# Make sure the table is empty
+	$node_publisher->safe_psql('postgres', 'TRUNCATE test_tab');
+
+	# Reset replication slot stats
+	$node_publisher->safe_psql('postgres',
+		"SELECT pg_stat_reset_replication_slot('tap_sub')");
+
+	# Insert 1 million rows in the table
+	$node_publisher->safe_psql('postgres',
+		"INSERT INTO test_tab SELECT i, 'Message number #'||i::TEXT FROM generate_series(1, 1000000) as i"
+	);
+
+	$node_publisher->wait_for_catchup($appname);
+
+	# Check if table content is replicated
+	my $result =
+	  $node_subscriber->safe_psql('postgres',
+		"SELECT count(*) FROM test_tab");
+	is($result, qq(1000000), 'check data was copied to subscriber');
+
+	# Check if some changes were spilled on disk
+	my $res_stats =
+	  $node_publisher->safe_psql('postgres',
+		"SELECT spill_txns FROM pg_catalog.pg_stat_get_replication_slot('tap_sub');");
+	is($res_stats, qq(1), 'check if the transaction was spilled on disk');
+}
+
+# Create publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	'logical_decoding_work_mem = 64');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$node_subscriber->init;
+$node_subscriber->start;
+
+# Setup structure on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'off');
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'pglz');
+
+SKIP:
+{
+	skip "LZ4 not supported by this build", 2 if ($ENV{with_lz4} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'lz4');
+}
+
+SKIP:
+{
+	skip "ZSTD not supported by this build", 2 if ($ENV{with_zstd} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'zstd');
+}
+
+$node_subscriber->stop;
+$node_publisher->stop;
+
+done_testing();
-- 
2.43.0

#18Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Julien Tachoires (#17)
Re: Compress ReorderBuffer spill files using LZ4

On 7/15/24 20:50, Julien Tachoires wrote:

Hi,

Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit :

Le ven. 7 juin 2024 à 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thought ?

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#18)
Re: Compress ReorderBuffer spill files using LZ4

Le lun. 15 juil. 2024 à 12:28, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 7/15/24 20:50, Julien Tachoires wrote:

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

You are right, it's not supported, the following error is raised in this case:
ERROR: replication slot "sub1" is active for PID 51735

I was distracted by the fact that nothing prevents the configuration
of multiple subscriptions sharing the same replication slot.

Thanks,

JT

#20Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#18)
Re: Compress ReorderBuffer spill files using LZ4

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/15/24 20:50, Julien Tachoires wrote:

Hi,

Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit :

Le ven. 7 juin 2024 à 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thought ?

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

Your point sounds reasonable to me. OTOH, if we want to support
compression for spill case then shouldn't there be a question how
frequent such an option would be required? Users currently have an
option to stream large transactions for parallel apply or otherwise in
which case no spilling is required. I feel sooner or later we will
make such behavior (streaming=parallel) as default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

--
With Regards,
Amit Kapila.

#21Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Amit Kapila (#20)
Re: Compress ReorderBuffer spill files using LZ4

On 7/16/24 14:52, Amit Kapila wrote:

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/15/24 20:50, Julien Tachoires wrote:

Hi,

Le ven. 7 juin 2024 à 06:18, Julien Tachoires <julmon@gmail.com> a écrit :

Le ven. 7 juin 2024 à 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> a écrit :

On 6/6/24 12:58, Julien Tachoires wrote:

...

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?

That's right, reworking this patch in that sense.

Please find a new version of this patch adding support for LZ4, pglz
and ZSTD. It introduces the new GUC logical_decoding_spill_compression
which is used to set the compression method. In order to stay aligned
with the other server side GUCs related to compression methods
(wal_compression, default_toast_compression), the compression level is
not exposed to users.

Sounds reasonable. I wonder if it might be useful to allow specifying
the compression level in those places, but that's clearly not something
this patch needs to do.

The last patch of this set is still in WIP, it adds the machinery
required for setting the compression methods as a subscription option:
CREATE SUBSCRIPTION ... WITH (spill_compression = ...);
I think there is a major problem with this approach: the logical
decoding context is tied to one replication slot, but multiple
subscriptions can use the same replication slot. How should this work
if 2 subscriptions want to use the same replication slot but different
compression methods?

Do we really support multiple subscriptions sharing the same slot? I
don't think we do, but maybe I'm missing something.

At this point, compression is only available for the changes spilled
on disk. It is still not clear to me if the compression of data
transiting through the streaming protocol should be addressed by this
patch set or by another one. Thought ?

I'd stick to only compressing the data spilled to disk. It might be
useful to compress the streamed data too, but why shouldn't we compress
the regular (non-streamed) transactions too? Yeah, it's more efficient
to compress larger chunks, but we can fit quite large transactions into
logical_decoding_work_mem without spilling.

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

Your point sounds reasonable to me. OTOH, if we want to support
compression for spill case then shouldn't there be a question how
frequent such an option would be required? Users currently have an
option to stream large transactions for parallel apply or otherwise in
which case no spilling is required. I feel sooner or later we will
make such behavior (streaming=parallel) as default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

I don't know, but streaming is 'off' by default, and I'm not aware of
any proposals to change this, so when you suggest "sooner or later"
we'll change this, I'd probably bet on "later or never".

I haven't been following the discussions about parallel apply very
closely, but my impression from dealing with similar stuff in other
tools is that it's rather easy to run into issues with some workloads,
which just makes me more skeptical about "streamin=parallel" by default.
But as I said, I'm out of the loop so I may be wrong ...

As for whether the GUC is needed, I don't know. I guess we might do the
same thing we do for streaming - we don't have a GUC to enable this, but
we default to 'off' and the client has to request that when opening the
replication connection. So it'd be specified at the subscription level,
more or less.

But then how would we specify compression for cases that invoke decoding
directly by pg_logical_slot_get_changes()? Through options?

BTW if we specify this at subscription level, will it be possible to
change the compression method?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Tomas Vondra (#21)
Re: Compress ReorderBuffer spill files using LZ4

On Tue, Jul 16, 2024 at 7:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/16/24 14:52, Amit Kapila wrote:

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

Your point sounds reasonable to me. OTOH, if we want to support
compression for spill case then shouldn't there be a question how
frequent such an option would be required? Users currently have an
option to stream large transactions for parallel apply or otherwise in
which case no spilling is required. I feel sooner or later we will
make such behavior (streaming=parallel) as default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

I don't know, but streaming is 'off' by default, and I'm not aware of
any proposals to change this, so when you suggest "sooner or later"
we'll change this, I'd probably bet on "later or never".

I haven't been following the discussions about parallel apply very
closely, but my impression from dealing with similar stuff in other
tools is that it's rather easy to run into issues with some workloads,
which just makes me more skeptical about "streamin=parallel" by default.
But as I said, I'm out of the loop so I may be wrong ...

It is difficult to say whether enabling it by default will have issues
or not but till now we haven't seen many reports for the streaming =
'parallel' option. It could be due to the reason that not many people
enable it in their workloads. We can probably find out by enabling it
by default.

As for whether the GUC is needed, I don't know. I guess we might do the
same thing we do for streaming - we don't have a GUC to enable this, but
we default to 'off' and the client has to request that when opening the
replication connection. So it'd be specified at the subscription level,
more or less.

But then how would we specify compression for cases that invoke decoding
directly by pg_logical_slot_get_changes()? Through options?

If we decide to go with this then yeah that is one way, another
possibility is to make it a slot's property, so we can allow to take a
new parameter in pg_create_logical_replication_slot(). We can even
think of inventing a new API to alter the slot's properties if we
decide to go this route.

BTW if we specify this at subscription level, will it be possible to
change the compression method?

This needs analysis but offhand I can't see the problems with it.

--
With Regards,
Amit Kapila.

#23Julien Tachoires
julmon@gmail.com
In reply to: Amit Kapila (#22)
6 attachment(s)
Re: Compress ReorderBuffer spill files using LZ4

Le mer. 17 juil. 2024 à 02:12, Amit Kapila <amit.kapila16@gmail.com> a écrit :

On Tue, Jul 16, 2024 at 7:31 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

On 7/16/24 14:52, Amit Kapila wrote:

On Tue, Jul 16, 2024 at 12:58 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

FWIW I'd expect that to be handled at the libpq level - there's already
a patch for that, but I haven't checked if it would handle this. But
maybe more importantly, I think compressing streamed data might need to
handle some sort of negotiation of the compression algorithm, which
seems fairly complex.

To conclude, I'd leave this out of scope for this patch.

Your point sounds reasonable to me. OTOH, if we want to support
compression for spill case then shouldn't there be a question how
frequent such an option would be required? Users currently have an
option to stream large transactions for parallel apply or otherwise in
which case no spilling is required. I feel sooner or later we will
make such behavior (streaming=parallel) as default, and then spilling
should happen in very few cases. Is it worth adding this new option
and GUC if that is true?

I don't know, but streaming is 'off' by default, and I'm not aware of
any proposals to change this, so when you suggest "sooner or later"
we'll change this, I'd probably bet on "later or never".

I haven't been following the discussions about parallel apply very
closely, but my impression from dealing with similar stuff in other
tools is that it's rather easy to run into issues with some workloads,
which just makes me more skeptical about "streamin=parallel" by default.
But as I said, I'm out of the loop so I may be wrong ...

It is difficult to say whether enabling it by default will have issues
or not but till now we haven't seen many reports for the streaming =
'parallel' option. It could be due to the reason that not many people
enable it in their workloads. We can probably find out by enabling it
by default.

As for whether the GUC is needed, I don't know. I guess we might do the
same thing we do for streaming - we don't have a GUC to enable this, but
we default to 'off' and the client has to request that when opening the
replication connection. So it'd be specified at the subscription level,
more or less.

But then how would we specify compression for cases that invoke decoding
directly by pg_logical_slot_get_changes()? Through options?

If we decide to go with this then yeah that is one way, another
possibility is to make it a slot's property, so we can allow to take a
new parameter in pg_create_logical_replication_slot(). We can even
think of inventing a new API to alter the slot's properties if we
decide to go this route.

Please find a new version of this patch set. The compression method is
now set on subscriber level via CREATE SUBSCRIPTION or ALTER
SUBSCRIPTION and can be passed to
pg_logical_slot_get_changes()/pg_logical_slot_get_binary_changes()
through the option spill_compression.

BTW if we specify this at subscription level, will it be possible to
change the compression method?

This needs analysis but offhand I can't see the problems with it.

I didn't notice any issue, the compression method can be changed even
when a decoding is in progress, in this case, the replication worker
restart due to parameter change.

JT

Attachments:

v3-0002-Fix-spill_bytes-counter.patchapplication/octet-stream; name=v3-0002-Fix-spill_bytes-counter.patchDownload
From a09269f20c7e3d273590d0f0c9d83cccdee645dc Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sun, 23 Jun 2024 14:42:04 -0700
Subject: [PATCH 2/6] Fix spill_bytes counter

The spill_bytes counter considers now the fact that decoded changes
are spilled on disk compressed.
---
 src/backend/replication/logical/reorderbuffer.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 2519f799bc..964a861bbf 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -246,7 +246,7 @@ static void ReorderBufferExecuteInvalidations(uint32 nmsgs, SharedInvalidationMe
  */
 static void ReorderBufferCheckMemoryLimit(ReorderBuffer *rb);
 static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+static Size ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
@@ -3689,6 +3689,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	Size		spillBytes = 0;
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3740,7 +3741,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 
-		ReorderBufferSerializeChange(rb, txn, fd, change);
+		spillBytes += ReorderBufferSerializeChange(rb, txn, fd, change);
 		dlist_delete(&change->node);
 		ReorderBufferReturnChange(rb, change, false);
 
@@ -3754,7 +3755,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	if (spilled)
 	{
 		rb->spillCount += 1;
-		rb->spillBytes += size;
+		rb->spillBytes += spillBytes;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
@@ -3775,7 +3776,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 /*
  * Serialize individual change to disk.
  */
-static void
+static Size
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
@@ -3988,6 +3989,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
+
+	/* Return data size written to disk */
+	return disk_hdr->size;
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
-- 
2.43.0

v3-0004-Compress-ReorderBuffer-spill-files-using-ZSTD.patchapplication/octet-stream; name=v3-0004-Compress-ReorderBuffer-spill-files-using-ZSTD.patchDownload
From cc267b2da6ff05860144bed0b0bac0bbd856830f Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:39:25 -0700
Subject: [PATCH 4/6] Compress ReorderBuffer spill files using ZSTD

---
 .../logical/reorderbuffer_compression.c       | 364 ++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  39 ++
 2 files changed, 403 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index a05393cc61..9bda286cb8 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -19,6 +19,10 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 #include "replication/reorderbuffer_compression.h"
 
 #define NO_LZ4_SUPPORT() \
@@ -27,6 +31,12 @@
 			 errmsg("compression method lz4 not supported"), \
 			 errdetail("This functionality requires the server to be built with lz4 support.")))
 
+#define NO_ZSTD_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method zstd not supported"), \
+			 errdetail("This functionality requires the server to be built with zstd support.")))
+
 /*
  * Allocate a new LZ4StreamingCompressorState.
  */
@@ -303,6 +313,309 @@ lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
 #endif
 }
 
+/*
+ * Allocate a new ZSTDStreamingCompressorState.
+ */
+static void *
+zstd_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	cstate = (ZSTDStreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(ZSTDStreamingCompressorState));
+
+	/*
+	 * We do not allocate ZSTD buffers and contexts at this point because we
+	 * have no guarantee that we will need them later. Let's allocate only when
+	 * we are about to use them.
+	 */
+	cstate->zstd_c_ctx = NULL;
+	cstate->zstd_c_in_buf = NULL;
+	cstate->zstd_c_in_buf_size = 0;
+	cstate->zstd_c_out_buf = NULL;
+	cstate->zstd_c_out_buf_size = 0;
+	cstate->zstd_frame_size = 0;
+	cstate->zstd_d_ctx = NULL;
+	cstate->zstd_d_in_buf = NULL;
+	cstate->zstd_d_in_buf_size = 0;
+	cstate->zstd_d_out_buf = NULL;
+	cstate->zstd_d_out_buf_size = 0;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free ZSTD memory resources and the compressor state.
+ */
+static void
+zstd_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+
+	if (cstate->zstd_c_ctx != NULL)
+	{
+		/* Compressor state was used for compression */
+		pfree(cstate->zstd_c_in_buf);
+		pfree(cstate->zstd_c_out_buf);
+		ZSTD_freeCCtx(cstate->zstd_c_ctx);
+	}
+	if (cstate->zstd_d_ctx != NULL)
+	{
+		/* Compressor state was used for decompression */
+		pfree(cstate->zstd_d_in_buf);
+		pfree(cstate->zstd_d_out_buf);
+		ZSTD_freeDCtx(cstate->zstd_d_ctx);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD compression buffers and create the ZSTD compression context.
+ */
+static void
+zstd_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_c_in_buf_size = ZSTD_CStreamInSize();
+	cstate->zstd_c_in_buf = (char *) palloc0(cstate->zstd_c_in_buf_size);
+	cstate->zstd_c_out_buf_size = ZSTD_CStreamOutSize();
+	cstate->zstd_c_out_buf = (char *) palloc0(cstate->zstd_c_out_buf_size);
+	cstate->zstd_c_ctx = ZSTD_createCCtx();
+
+	if (cstate->zstd_c_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD compression context")));
+
+	/* Set compression level */
+	ZSTD_CCtx_setParameter(cstate->zstd_c_ctx, ZSTD_c_compressionLevel,
+						   ZSTD_COMPRESSION_LEVEL);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD decompression buffers and create the ZSTD decompression
+ * context.
+ */
+static void
+zstd_CreateStreamDecodeCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_d_in_buf_size = ZSTD_DStreamInSize();
+	cstate->zstd_d_in_buf = (char *) palloc0(cstate->zstd_d_in_buf_size);
+	cstate->zstd_d_out_buf_size = ZSTD_DStreamOutSize();
+	cstate->zstd_d_out_buf = (char *) palloc0(cstate->zstd_d_out_buf_size);
+	cstate->zstd_d_ctx = ZSTD_createDCtx();
+
+	if (cstate->zstd_d_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD decompression context")));
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						   char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_c_ctx == NULL)
+		zstd_CreateStreamCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store compressed data */
+	*dst = (char *) palloc0(ZSTD_compressBound(src_size));
+
+	dst_data = *dst;
+	*dst_size = 0;
+
+	/*
+	 * ZSTD streaming compression works with chunks: the source data needs to
+	 * be splitted out in chunks, each of them is then copied into ZSTD input
+	 * buffer.
+	 * For each chunk, we proceed with compression. Streaming compression is
+	 * not intended to compress the whole input chunk, so we have the call
+	 * ZSTD_compressStream2() multiple times until the entire chunk is
+	 * consumed.
+	 */
+	while (toCpy > 0)
+	{
+		/* Are we on the last chunk? */
+		bool		last_chunk = (toCpy < cstate->zstd_c_in_buf_size);
+		/* Size of the data copied into ZSTD input buffer */
+		Size		cpySize = last_chunk ? toCpy : cstate->zstd_c_in_buf_size;
+		bool		finished = false;
+		ZSTD_inBuffer input;
+		ZSTD_EndDirective mode = last_chunk ? ZSTD_e_flush : ZSTD_e_continue;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_c_in_buf, src, cpySize);
+
+		/*
+		 * Close the frame when we are on the last chunk and we've reached max
+		 * frame size.
+		 */
+		if (last_chunk && (cstate->zstd_frame_size > ZSTD_MAX_FRAME_SIZE))
+		{
+			mode = ZSTD_e_end;
+			cstate->zstd_frame_size = 0;
+		}
+
+		cstate->zstd_frame_size += cpySize;
+
+		input.src = cstate->zstd_c_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		do
+		{
+			Size		remaining;
+			ZSTD_outBuffer output;
+
+			output.dst = cstate->zstd_c_out_buf;
+			output.size = cstate->zstd_c_out_buf_size;
+			output.pos = 0;
+
+			remaining = ZSTD_compressStream2(cstate->zstd_c_ctx, &output,
+											 &input, mode);
+
+			if (ZSTD_isError(remaining))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD compression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_c_out_buf, output.pos);
+
+			dst_data += output.pos;
+			*dst_size += output.pos;
+
+			/*
+			 * Compression is done when we are working on the last chunk and
+			 * there is nothing left to compress, or, when we reach the end of
+			 * the chunk.
+			 */
+			finished = last_chunk ? (remaining == 0) : (input.pos == input.size);
+		} while (!finished);
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+#endif
+}
+
+/*
+ * Data decompression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+	Size		decBytes = 0;	/* Size of decompressed data */
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_d_ctx == NULL)
+		zstd_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store decompressed data */
+	*dst = (char *) palloc0(dst_size);
+
+	dst_data = *dst;
+
+	while (toCpy > 0)
+	{
+		ZSTD_inBuffer input;
+		Size		cpySize = (toCpy > cstate->zstd_d_in_buf_size) ? cstate->zstd_d_in_buf_size : toCpy;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_d_in_buf, src, cpySize);
+
+		input.src = cstate->zstd_d_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		while (input.pos < input.size)
+		{
+			ZSTD_outBuffer output;
+			Size		ret;
+
+			output.dst = cstate->zstd_d_out_buf;
+			output.size = cstate->zstd_d_out_buf_size;
+			output.pos = 0;
+
+			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output , &input);
+
+			if (ZSTD_isError(ret))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD decompression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_d_out_buf, output.pos);
+
+			dst_data += output.pos;
+			decBytes += output.pos;
+		}
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+
+	Assert(dst_size == decBytes);
+#endif
+}
+
 /*
  * Allocate a new Compressor State, depending on the compression method.
  */
@@ -314,6 +627,9 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_NewCompressorState(context);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_NewCompressorState(context);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -335,6 +651,9 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_FreeCompressorState(context, compressor_state);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -459,6 +778,35 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* ZSTD Compression */
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			/* Use ZSTD streaming compression */
+			zstd_StreamingCompressData(rb->context, src, src_size, &dst,
+									   &dst_size, compressor_state);
+
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -553,6 +901,22 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 							 errmsg_internal("compressed PGLZ data is corrupted")));
 				break;
 			}
+		/* ZSTD streaming decompression */
+		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				zstd_StreamingDecompressData(rb->context, data, src_size, &buf,
+											 buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				pfree(buf);
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 5abc875ac4..b668f33d5a 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -19,12 +19,17 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
+	REORDER_BUFFER_ZSTD_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -36,6 +41,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
 	REORDER_BUFFER_STRAT_PGLZ,
+	REORDER_BUFFER_STRAT_ZSTD_STREAMING,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
@@ -81,6 +87,39 @@ typedef struct LZ4StreamingCompressorState {
 #define lz4_CanDoStreamingCompression(s) (false)
 #endif
 
+#ifdef USE_ZSTD
+/*
+ * Low compression level provides high compression speed and decent compression
+ * rate. Minimum level is 1, maximum is 22.
+ */
+#define ZSTD_COMPRESSION_LEVEL 1
+
+/*
+ * Maximum volume of data encoded in the current ZSTD frame. When this
+ * threshold is reached then we close the current frame and start a new one.
+ */
+#define ZSTD_MAX_FRAME_SIZE (64 * 1024)
+
+/*
+ * ZSTD streaming compression/decompression handlers and buffers.
+ */
+typedef struct ZSTDStreamingCompressorState {
+	/* Compression */
+	ZSTD_CCtx  *zstd_c_ctx;
+	Size		zstd_c_in_buf_size;
+	char	   *zstd_c_in_buf;
+	Size		zstd_c_out_buf_size;
+	char	   *zstd_c_out_buf;
+	Size		zstd_frame_size;
+	/* Decompression */
+	ZSTD_DCtx  *zstd_d_ctx;
+	Size		zstd_d_in_buf_size;
+	char	   *zstd_d_in_buf;
+	Size		zstd_d_out_buf_size;
+	char	   *zstd_d_out_buf;
+} ZSTDStreamingCompressorState;
+#endif
+
 extern void *ReorderBufferNewCompressorState(MemoryContext context,
 											 int compression_method);
 extern void ReorderBufferFreeCompressorState(MemoryContext context,
-- 
2.43.0

v3-0003-Compress-ReorderBuffer-spill-files-using-PGLZ.patchapplication/octet-stream; name=v3-0003-Compress-ReorderBuffer-spill-files-using-PGLZ.patchDownload
From ef6ad37d547e158f11b4fc73439ebcd9c9bd253f Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:36:04 -0700
Subject: [PATCH 3/6] Compress ReorderBuffer spill files using PGLZ

---
 .../logical/reorderbuffer_compression.c       | 58 +++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  2 +
 2 files changed, 60 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 77f5c76929..a05393cc61 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -13,6 +13,8 @@
  */
 #include "postgres.h"
 
+#include "common/pg_lzcompress.h"
+
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
@@ -313,6 +315,7 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 			return lz4_NewCompressorState(context);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			return NULL;
 			break;
@@ -333,6 +336,7 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			break;
 	}
@@ -421,6 +425,40 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* PGLZ compression */
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+		{
+			int32		dst_size = 0;
+			char	   *dst = NULL;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			int32		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			int32		max_size = PGLZ_MAX_OUTPUT(src_size);
+
+			dst = (char *) palloc0(max_size);
+			dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
+
+			if (dst_size < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("PGLZ compression failed")));
+
+			ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+			hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = (Size) src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -495,6 +533,26 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 				 */
 				break;
 			}
+		/* PGLZ decompression */
+		case REORDER_BUFFER_STRAT_PGLZ:
+			{
+				char	   *buf;
+				int32		src_size = (int32) header->size - sizeof(ReorderBufferDiskHeader);
+				int32		buf_size = (int32) header->raw_size;
+				int32		decBytes;
+
+				/* Decompress data directly into the ReorderBuffer */
+				buf = (char *) rb->outbuf;
+				buf += sizeof(ReorderBufferDiskHeader);
+
+				decBytes = pglz_decompress(data, src_size, buf, buf_size, false);
+
+				if (decBytes < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("compressed PGLZ data is corrupted")));
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 9aa8aea56f..5abc875ac4 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -24,6 +24,7 @@ typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
+	REORDER_BUFFER_PGLZ_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -34,6 +35,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_UNCOMPRESSED,
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+	REORDER_BUFFER_STRAT_PGLZ,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
-- 
2.43.0

v3-0005-Add-the-subscription-option-spill_compression.patchapplication/octet-stream; name=v3-0005-Add-the-subscription-option-spill_compression.patchDownload
From bca6a7c9e2b09638376d179452863006d1ebc20f Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:45:43 -0700
Subject: [PATCH 5/6] Add the subscription option spill_compression

---
 doc/src/sgml/ref/alter_subscription.sgml      |   5 +-
 doc/src/sgml/ref/create_subscription.sgml     |  24 +++
 src/backend/catalog/pg_subscription.c         |   6 +
 src/backend/catalog/system_views.sql          |   3 +-
 src/backend/commands/subscriptioncmds.c       |  31 +++-
 .../libpqwalreceiver/libpqwalreceiver.c       |   5 +
 src/backend/replication/logical/logical.c     |   4 +
 .../replication/logical/reorderbuffer.c       |  15 +-
 .../logical/reorderbuffer_compression.c       |  57 +++++++
 src/backend/replication/logical/worker.c      |  13 +-
 src/backend/replication/pgoutput/pgoutput.c   |  28 ++++
 src/bin/pg_dump/pg_dump.c                     |  18 +-
 src/bin/pg_dump/pg_dump.h                     |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl              |   8 +-
 src/bin/psql/describe.c                       |   7 +-
 src/bin/psql/tab-complete.c                   |   5 +-
 src/include/catalog/pg_subscription.h         |   4 +
 src/include/replication/logical.h             |   2 +
 src/include/replication/pgoutput.h            |   1 +
 .../replication/reorderbuffer_compression.h   |   4 +
 src/include/replication/walreceiver.h         |   1 +
 src/test/regress/expected/subscription.out    | 156 +++++++++---------
 src/test/regress/sql/subscription.sql         |   4 +
 23 files changed, 308 insertions(+), 94 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 476f195622..44df09b854 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -228,8 +228,9 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
       <link linkend="sql-createsubscription-params-with-disable-on-error"><literal>disable_on_error</literal></link>,
       <link linkend="sql-createsubscription-params-with-password-required"><literal>password_required</literal></link>,
       <link linkend="sql-createsubscription-params-with-run-as-owner"><literal>run_as_owner</literal></link>,
-      <link linkend="sql-createsubscription-params-with-origin"><literal>origin</literal></link>, and
-      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>.
+      <link linkend="sql-createsubscription-params-with-origin"><literal>origin</literal></link>,
+      <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>, and
+      <link linkend="sql-createsubscription-params-with-spill-compression"><literal>spill_compression</literal></link>.
       Only a superuser can set <literal>password_required = false</literal>.
      </para>
 
diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 740b7d9421..56f733eaf8 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -428,6 +428,30 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry id="sql-createsubscription-params-with-spill-compression">
+        <term><literal>spill_compression</literal> (<type>enum</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the decoded changes that eventually need to be
+          temporarily written on disk by the publisher are compressed or not.
+          Default value is <literal>off</literal> meaning no data compression
+          involved. Setting <literal>spill_compression</literal> to
+          <literal>on</literal> or <literal>pglz</literal> means that the
+          decoded changes are compressed using the internal
+          <literal>PGLZ</literal> compression algorithm.
+         </para>
+
+         <para>
+          If the <productname>PostgreSQL</productname> server running the
+          publisher node supports the external compression libraries
+          <productname>LZ4</productname> or
+          <productname>Zstandard</productname>,
+          <literal>spill_compression</literal> can be set respectively to
+          <literal>lz4</literal> or <literal>zstd</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist></para>
 
     </listitem>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9efc9159f2..a3329043dc 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -110,6 +110,12 @@ GetSubscription(Oid subid, bool missing_ok)
 	/* Is the subscription owner a superuser? */
 	sub->ownersuperuser = superuser_arg(sub->owner);
 
+	/* Get spill_compression */
+	datum = SysCacheGetAttrNotNull(SUBSCRIPTIONOID,
+								   tup,
+								   Anum_pg_subscription_subspillcompression);
+	sub->spill_compression = TextDatumGetCString(datum);
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 19cabc9a47..c84c283b9f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1356,7 +1356,8 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
               subbinary, substream, subtwophasestate, subdisableonerr,
 			  subpasswordrequired, subrunasowner, subfailover,
-              subslotname, subsynccommit, subpublications, suborigin)
+              subslotname, subsynccommit, subpublications, suborigin,
+              subspillcompression)
     ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_stats AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 16d83b3253..b2d35eac53 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -39,6 +39,7 @@
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
 #include "replication/origin.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
@@ -72,6 +73,7 @@
 #define SUBOPT_FAILOVER				0x00002000
 #define SUBOPT_LSN					0x00004000
 #define SUBOPT_ORIGIN				0x00008000
+#define SUBOPT_SPILL_COMPRESSION	0x00010000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -99,6 +101,7 @@ typedef struct SubOpts
 	bool		failover;
 	char	   *origin;
 	XLogRecPtr	lsn;
+	char	   *spill_compression;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
@@ -161,6 +164,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->failover = false;
 	if (IsSet(supported_opts, SUBOPT_ORIGIN))
 		opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+	if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION))
+		opts->spill_compression = "off";
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -366,6 +371,18 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION) &&
+				 strcmp(defel->defname, "spill_compression") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_SPILL_COMPRESSION))
+				errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_SPILL_COMPRESSION;
+			opts->spill_compression = defGetString(defel);
+
+			ReorderBufferValidateCompressionMethod(opts->spill_compression,
+												   ERROR);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -603,7 +620,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 					  SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
 					  SUBOPT_STREAMING | SUBOPT_TWOPHASE_COMMIT |
 					  SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED |
-					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN);
+					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+					  SUBOPT_SPILL_COMPRESSION);
 	parse_subscription_options(pstate, stmt->options, supported_opts, &opts);
 
 	/*
@@ -723,6 +741,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		publicationListToArray(publications);
 	values[Anum_pg_subscription_suborigin - 1] =
 		CStringGetTextDatum(opts.origin);
+	values[Anum_pg_subscription_subspillcompression - 1] =
+		CStringGetTextDatum(opts.spill_compression);
 
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
@@ -1148,7 +1168,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								  SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR |
 								  SUBOPT_PASSWORD_REQUIRED |
 								  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-								  SUBOPT_ORIGIN);
+								  SUBOPT_ORIGIN | SUBOPT_SPILL_COMPRESSION);
 
 				parse_subscription_options(pstate, stmt->options,
 										   supported_opts, &opts);
@@ -1265,6 +1285,13 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 					replaces[Anum_pg_subscription_suborigin - 1] = true;
 				}
 
+				if (IsSet(opts.specified_opts, SUBOPT_SPILL_COMPRESSION))
+				{
+					values[Anum_pg_subscription_subspillcompression - 1] =
+						CStringGetTextDatum(opts.spill_compression);
+					replaces[Anum_pg_subscription_subspillcompression - 1] = true;
+				}
+
 				update_tuple = true;
 				break;
 			}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 6c42c209d2..fda751da14 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -620,6 +620,11 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
 			PQserverVersion(conn->streamConn) >= 140000)
 			appendStringInfoString(&cmd, ", binary 'true'");
 
+		if (options->proto.logical.spill_compression &&
+			PQserverVersion(conn->streamConn) >= 180000)
+			appendStringInfo(&cmd, ", spill_compression '%s'",
+							 options->proto.logical.spill_compression);
+
 		appendStringInfoChar(&cmd, ')');
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f8ef5d56d2..05eae65cca 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -36,6 +36,7 @@
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slotsync.h"
 #include "replication/snapbuild.h"
 #include "storage/proc.h"
@@ -298,6 +299,9 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->fast_forward = fast_forward;
 
+	/* No spill files compression by default */
+	ctx->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
+
 	MemoryContextSwitchTo(old_context);
 
 	return ctx;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 964a861bbf..328c26e994 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -404,6 +404,15 @@ ReorderBufferFree(ReorderBuffer *rb)
 	ReorderBufferCleanupSerializedTXNs(NameStr(MyReplicationSlot->data.name));
 }
 
+/* Returns spill files compression method */
+static inline uint8
+ReorderBufferSpillCompressionMethod(ReorderBuffer *rb)
+{
+	LogicalDecodingContext *ctx = rb->private_data;
+
+	return ctx->spill_compression_method;
+}
+
 /*
  * Get an unused, possibly preallocated, ReorderBufferTXN.
  */
@@ -425,7 +434,7 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
 	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
-															logical_decoding_spill_compression);
+															ReorderBufferSpillCompressionMethod(rb));
 
 	return txn;
 }
@@ -464,7 +473,7 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	}
 
 	ReorderBufferFreeCompressorState(rb->context,
-									 logical_decoding_spill_compression,
+									 ReorderBufferSpillCompressionMethod(rb),
 									 txn->compressor_state);
 
 	/* Reset the toast hash */
@@ -3958,7 +3967,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	}
 
 	/* Inplace ReorderBuffer content compression before writing it on disk */
-	ReorderBufferCompress(rb, &disk_hdr, logical_decoding_spill_compression,
+	ReorderBufferCompress(rb, &disk_hdr, ReorderBufferSpillCompressionMethod(rb),
 						  sz, txn->compressor_state);
 
 	errno = 0;
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 9bda286cb8..bf0d206095 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -922,3 +922,60 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 			break;
 	}
 }
+
+/*
+ * According to a given compression method (as string representation), returns
+ * the corresponding ReorderBufferCompressionMethod
+ */
+ReorderBufferCompressionMethod
+ReorderBufferParseCompressionMethod(const char *method)
+{
+	if (pg_strcasecmp(method, "on") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "pglz") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "off") == 0)
+		return REORDER_BUFFER_NO_COMPRESSION;
+#ifdef USE_LZ4
+	else if (pg_strcasecmp(method, "lz4") == 0)
+		return REORDER_BUFFER_LZ4_COMPRESSION;
+#endif
+#ifdef USE_ZSTD
+	else if (pg_strcasecmp(method, "zstd") == 0)
+		return REORDER_BUFFER_ZSTD_COMPRESSION;
+#endif
+	else
+		return REORDER_BUFFER_INVALID_COMPRESSION;
+}
+
+/*
+ * Check whether the passed compression method is valid and report errors at
+ * elevel.
+ *
+ * As this validation is intended to be executed on subscriber side, then we
+ * actually don't know if the server running the publisher supports external
+ * compression libraries. We only check if the compression method is
+ * potentially supported. The real validation is done by the publisher when
+ * the replication starts, an error is then triggered if the compression method
+ * is not supported.
+ */
+void
+ReorderBufferValidateCompressionMethod(const char *method, int elevel)
+{
+	bool		valid = false;
+	char		methods[5][5] = {"on", "off", "pglz", "lz4", "zstd"};
+
+	for (int i = 0; i < 5; i++)
+	{
+		if (pg_strcasecmp(method, methods[i]) == 0)
+		{
+			valid = true;
+			break;
+		}
+	}
+
+	if (!valid)
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("compression method \"%s\" not valid", method)));
+}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c0bda6269b..dea3be009a 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -3929,7 +3929,8 @@ maybe_reread_subscription(void)
 		newsub->passwordrequired != MySubscription->passwordrequired ||
 		strcmp(newsub->origin, MySubscription->origin) != 0 ||
 		newsub->owner != MySubscription->owner ||
-		!equal(newsub->publications, MySubscription->publications))
+		!equal(newsub->publications, MySubscription->publications) ||
+		strcmp(newsub->spill_compression, MySubscription->spill_compression) != 0)
 	{
 		if (am_parallel_apply_worker())
 			ereport(LOG,
@@ -4377,6 +4378,16 @@ set_stream_options(WalRcvStreamOptions *options,
 		MyLogicalRepWorker->parallel_apply = false;
 	}
 
+	if (server_version >= 180000 &&
+			 MySubscription->stream == LOGICALREP_STREAM_OFF &&
+			 MySubscription->spill_compression != NULL)
+	{
+		options->proto.logical.spill_compression =
+			pstrdup(MySubscription->spill_compression);
+	}
+	else
+		options->proto.logical.spill_compression = NULL;
+
 	options->proto.logical.twophase = false;
 	options->proto.logical.origin = pstrdup(MySubscription->origin);
 }
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index abef4eaf68..d9469f45f6 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -27,6 +27,7 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
+#include "replication/reorderbuffer_compression.h"
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -283,11 +284,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 	bool		streaming_given = false;
 	bool		two_phase_option_given = false;
 	bool		origin_option_given = false;
+	bool		spill_compression_option_given = false;
 
 	data->binary = false;
 	data->streaming = LOGICALREP_STREAM_OFF;
 	data->messages = false;
 	data->two_phase = false;
+	data->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
 
 	foreach(lc, options)
 	{
@@ -396,6 +399,28 @@ parse_output_parameters(List *options, PGOutputData *data)
 						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						errmsg("unrecognized origin value: \"%s\"", origin));
 		}
+		else if (strcmp(defel->defname, "spill_compression") == 0)
+		{
+			uint8		method;
+			char	   *method_str;
+
+			if (spill_compression_option_given)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			spill_compression_option_given = true;
+
+			method_str = defGetString(defel);
+			method = ReorderBufferParseCompressionMethod(method_str);
+
+			if (method == REORDER_BUFFER_INVALID_COMPRESSION)
+				ereport(ERROR,
+						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						errmsg("invalid spill files compression method: \"%s\"",
+							   method_str));
+
+			data->spill_compression_method = method;
+		}
 		else
 			elog(ERROR, "unrecognized pgoutput option: %s", defel->defname);
 	}
@@ -508,6 +533,9 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 		data->publications = NIL;
 		publications_valid = false;
 
+		/* Init spill files compression method */
+		ctx->spill_compression_method = data->spill_compression_method;
+
 		/*
 		 * Register callback for pg_publication if we didn't already do that
 		 * during some previous call in this process.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index b8b1888bd3..d8508bb684 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4760,6 +4760,7 @@ getSubscriptions(Archive *fout)
 	int			i_suboriginremotelsn;
 	int			i_subenabled;
 	int			i_subfailover;
+	int			i_subspillcompression;
 	int			i,
 				ntups;
 
@@ -4832,10 +4833,17 @@ getSubscriptions(Archive *fout)
 
 	if (fout->remoteVersion >= 170000)
 		appendPQExpBufferStr(query,
-							 " s.subfailover\n");
+							 " s.subfailover,\n");
 	else
 		appendPQExpBuffer(query,
-						  " false AS subfailover\n");
+						  " false AS subfailover,\n");
+
+	if (fout->remoteVersion >= 180000)
+		appendPQExpBufferStr(query,
+							 " s.subspillcompression\n");
+	else
+		appendPQExpBuffer(query,
+						  " 'off' AS subspillcompression\n");
 
 	appendPQExpBufferStr(query,
 						 "FROM pg_subscription s\n");
@@ -4875,6 +4883,7 @@ getSubscriptions(Archive *fout)
 	i_suboriginremotelsn = PQfnumber(res, "suboriginremotelsn");
 	i_subenabled = PQfnumber(res, "subenabled");
 	i_subfailover = PQfnumber(res, "subfailover");
+	i_subspillcompression = PQfnumber(res, "subspillcompression");
 
 	subinfo = pg_malloc(ntups * sizeof(SubscriptionInfo));
 
@@ -4921,6 +4930,8 @@ getSubscriptions(Archive *fout)
 			pg_strdup(PQgetvalue(res, i, i_subenabled));
 		subinfo[i].subfailover =
 			pg_strdup(PQgetvalue(res, i, i_subfailover));
+		subinfo[i].subspillcompression =
+			pg_strdup(PQgetvalue(res, i, i_subspillcompression));
 
 		/* Decide whether we want to dump it */
 		selectDumpableObject(&(subinfo[i].dobj), fout);
@@ -5167,6 +5178,9 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (pg_strcasecmp(subinfo->suborigin, LOGICALREP_ORIGIN_ANY) != 0)
 		appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
 
+	if (strcmp(subinfo->subspillcompression, "off") != 0)
+		appendPQExpBuffer(query, ", spill_compression = %s", subinfo->subspillcompression);
+
 	appendPQExpBufferStr(query, ");\n");
 
 	/*
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 4b2e5870a9..12588070f4 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -671,6 +671,7 @@ typedef struct _SubscriptionInfo
 	char	   *suborigin;
 	char	   *suboriginremotelsn;
 	char	   *subfailover;
+	char	   *subspillcompression;
 } SubscriptionInfo;
 
 /*
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index d3dd8784d6..8fd71f5cf6 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -2965,9 +2965,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub2
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = none);',
+						 WITH (connect = false, origin = none, spill_compression = on);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none);\E
+			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none, spill_compression = on);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
@@ -2976,9 +2976,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub3
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = any);',
+						 WITH (connect = false, origin = any, spill_compression = pglz);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3');\E
+			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3', spill_compression = pglz);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 7c9a1f234c..495a065849 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6539,7 +6539,7 @@ describeSubscriptions(const char *pattern, bool verbose)
 	printQueryOpt myopt = pset.popt;
 	static const bool translate_columns[] = {false, false, false, false,
 		false, false, false, false, false, false, false, false, false, false,
-	false};
+	false, false};
 
 	if (pset.sversion < 100000)
 	{
@@ -6619,6 +6619,11 @@ describeSubscriptions(const char *pattern, bool verbose)
 			appendPQExpBuffer(&buf,
 							  ", subskiplsn AS \"%s\"\n",
 							  gettext_noop("Skip LSN"));
+
+		if (pset.sversion >= 180000)
+			appendPQExpBuffer(&buf,
+							  ", subspillcompression AS \"%s\"\n",
+							  gettext_noop("Spill files compression"));
 	}
 
 	/* Only display subscriptions in current database. */
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index d453e224d9..011ec52550 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1948,7 +1948,7 @@ psql_completion(const char *text, int start, int end)
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
 		COMPLETE_WITH("binary", "disable_on_error", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit");
+					  "spill_compression", "streaming", "synchronous_commit");
 	/* ALTER SUBSCRIPTION <name> SKIP ( */
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SKIP", "("))
 		COMPLETE_WITH("lsn");
@@ -3365,7 +3365,8 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
 					  "disable_on_error", "enabled", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "spill_compression", "streaming", "synchronous_commit",
+					  "two_phase");
 
 /* CREATE TRIGGER --- is allowed inside CREATE SCHEMA, so use TailMatches */
 
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 0aa14ec4a2..61c284349c 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -113,6 +113,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 
 	/* Only publish data originating from the specified origin */
 	text		suborigin BKI_DEFAULT(LOGICALREP_ORIGIN_ANY);
+
+	/* Spill files compression algorithm */
+	text		subspillcompression BKI_FORCE_NOT_NULL;
 #endif
 } FormData_pg_subscription;
 
@@ -157,6 +160,7 @@ typedef struct Subscription
 	List	   *publications;	/* List of publication names to subscribe to */
 	char	   *origin;			/* Only publish data originating from the
 								 * specified origin */
+	char	   *spill_compression;	/* Spill files compression algorithm */
 } Subscription;
 
 /* Disallow streaming in-progress transactions. */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index aff38e8d04..75c17866c3 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -112,6 +112,8 @@ typedef struct LogicalDecodingContext
 
 	/* Do we need to process any change in fast_forward mode? */
 	bool		processing_required;
+	/* Compression method used to compress spill files */
+	uint8		spill_compression_method;
 } LogicalDecodingContext;
 
 
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 89f94e1147..eabcca62af 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -33,6 +33,7 @@ typedef struct PGOutputData
 	bool		messages;
 	bool		two_phase;
 	bool		publish_no_origin;
+	uint8		spill_compression_method;
 } PGOutputData;
 
 #endif							/* PGOUTPUT_H */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index b668f33d5a..2f0d5b8bd9 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -26,6 +26,7 @@
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
+	REORDER_BUFFER_INVALID_COMPRESSION,
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
@@ -132,5 +133,8 @@ extern void ReorderBufferCompress(ReorderBuffer *rb,
 extern void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 									ReorderBufferDiskHeader *header,
 									void *compressor_state);
+extern ReorderBufferCompressionMethod ReorderBufferParseCompressionMethod(const char *method);
+extern void ReorderBufferValidateCompressionMethod(const char *method,
+												   int elevel);
 
 #endif							/* REORDERBUFFER_COMPRESSION_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 12f71fa99b..b027e4ce89 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -186,6 +186,7 @@ typedef struct
 									 * prepare time */
 			char	   *origin; /* Only publish data originating from the
 								 * specified origin */
+			char	   *spill_compression;	/* Spill files compression algo */
 		}			logical;
 	}			proto;
 } WalRcvStreamOptions;
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 5c2f1ee517..b108ed3113 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -116,18 +116,18 @@ CREATE SUBSCRIPTION regress_testsub4 CONNECTION 'dbname=regress_doesnotexist' PU
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub4 SET (origin = any);
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub3;
@@ -145,10 +145,10 @@ ALTER SUBSCRIPTION regress_testsub CONNECTION 'foobar';
 ERROR:  invalid connection string syntax: missing "=" after "foobar" in connection info string
 
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET PUBLICATION testpub2, testpub3 WITH (refresh = false);
@@ -157,10 +157,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = 'newname');
 ALTER SUBSCRIPTION regress_testsub SET (password_required = false);
 ALTER SUBSCRIPTION regress_testsub SET (run_as_owner = true);
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (password_required = true);
@@ -176,10 +176,10 @@ ERROR:  unrecognized subscription parameter: "create_slot"
 -- ok
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/12345');
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345  | off
 (1 row)
 
 -- ok - with lsn = NONE
@@ -188,10 +188,10 @@ ALTER SUBSCRIPTION regress_testsub SKIP (lsn = NONE);
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/0');
 ERROR:  invalid WAL location (LSN): 0/0
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 BEGIN;
@@ -222,11 +222,15 @@ ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 ERROR:  invalid value for parameter "synchronous_commit": "foobar"
 HINT:  Available values: local, remote_write, remote_apply, on, off.
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+ERROR:  compression method "foobar" not valid
 \dRs+
-                                                                                                                       List of subscriptions
-        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
----------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                    List of subscriptions
+        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+---------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 -- rename back to keep the rest simple
@@ -255,19 +259,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (binary = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -279,27 +283,27 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication already exists
@@ -314,10 +318,10 @@ ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refr
 ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refresh = false);
 ERROR:  publication "testpub1" is already in subscription "regress_testsub"
 \dRs+
-                                                                                                                        List of subscriptions
-      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                                     List of subscriptions
+      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication used more than once
@@ -332,10 +336,10 @@ ERROR:  publication "testpub3" is not in subscription "regress_testsub"
 -- ok - delete publications
 ALTER SUBSCRIPTION regress_testsub DROP PUBLICATION testpub1, testpub2 WITH (refresh = false);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -371,10 +375,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 --fail - alter of two_phase option not supported.
@@ -383,10 +387,10 @@ ERROR:  unrecognized subscription parameter: "two_phase"
 -- but can alter streaming when two_phase enabled
 ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -396,10 +400,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -412,18 +416,18 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 3e5ba4cb8c..2d891b2c06 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -140,6 +140,10 @@ ALTER SUBSCRIPTION regress_testsub RENAME TO regress_testsub_foo;
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+
 \dRs+
 
 -- rename back to keep the rest simple
-- 
2.43.0

v3-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchapplication/octet-stream; name=v3-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchDownload
From 31875f6ae2301b3c981fefcac8d33748a051631c Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 6 Jun 2024 00:57:38 -0700
Subject: [PATCH 1/6] Compress ReorderBuffer spill files using LZ4

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.

This behavior happens only when the subscriber's option "streaming"
is set to "off", which is the default value.

In this case, large transactions decoding by multiple replication
slots can lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk is now compressed and wrapped in
a new structure named ReorderBufferDiskHeader.

3 different compression strategies are currently implemented:

1. LZ4 streaming compression is the preferred one and works
   efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for using
   LZ4 streaming API.
3. No compression.
---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       | 196 ++++---
 .../logical/reorderbuffer_compression.c       | 502 ++++++++++++++++++
 src/include/replication/reorderbuffer.h       |   6 +
 .../replication/reorderbuffer_compression.h   |  95 ++++
 6 files changed, 718 insertions(+), 83 deletions(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_compression.c
 create mode 100644 src/include/replication/reorderbuffer_compression.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index ba03eeff1c..88bf698a53 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -25,6 +25,7 @@ OBJS = \
 	proto.o \
 	relation.o \
 	reorderbuffer.o \
+	reorderbuffer_compression.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index 3dec36a6de..f0dd82bae2 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -11,6 +11,7 @@ backend_sources += files(
   'proto.c',
   'relation.c',
   'reorderbuffer.c',
+  'reorderbuffer_compression.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 00a8327e77..2519f799bc 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -102,6 +102,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/snapbuild.h"	/* just for SnapBuildSnapDecRefcount */
 #include "storage/bufmgr.h"
@@ -112,6 +113,8 @@
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
 
+int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
+
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
 {
@@ -173,14 +176,6 @@ typedef struct ReorderBufferToastEnt
 									 * main tup */
 } ReorderBufferToastEnt;
 
-/* Disk serialization support datastructures */
-typedef struct ReorderBufferDiskChange
-{
-	Size		size;
-	ReorderBufferChange change;
-	/* data follows */
-} ReorderBufferDiskChange;
-
 #define IsSpecInsert(action) \
 ( \
 	((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT) \
@@ -255,6 +250,8 @@ static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
+static bool ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+								   TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -427,6 +424,8 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	/* InvalidCommandId is not zero, so set it explicitly */
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
+	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
+															logical_decoding_spill_compression);
 
 	return txn;
 }
@@ -464,6 +463,10 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	ReorderBufferFreeCompressorState(rb->context,
+									 logical_decoding_spill_compression,
+									 txn->compressor_state);
+
 	/* Reset the toast hash */
 	ReorderBufferToastReset(rb, txn);
 
@@ -3776,13 +3779,13 @@ static void
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
-	ReorderBufferDiskChange *ondisk;
-	Size		sz = sizeof(ReorderBufferDiskChange);
+	ReorderBufferDiskHeader *disk_hdr;
+	Size		sz = sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	ReorderBufferSerializeReserve(rb, sz);
 
-	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-	memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+	disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+	memcpy((char *)rb->outbuf + sizeof(ReorderBufferDiskHeader), change, sizeof(ReorderBufferChange));
 
 	switch (change->action)
 	{
@@ -3818,9 +3821,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				if (oldlen)
 				{
@@ -3850,10 +3853,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					sizeof(Size) + sizeof(Size);
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				/* write the prefix including the size */
 				memcpy(data, &prefix_size, sizeof(Size));
@@ -3880,10 +3883,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				sz += inval_size;
 
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 				memcpy(data, change->data.inval.invalidations, inval_size);
 				data += inval_size;
 
@@ -3902,9 +3905,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, snap, sizeof(SnapshotData));
 				data += sizeof(SnapshotData);
@@ -3936,9 +3939,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, change->data.truncate.relids, size);
 				data += size;
@@ -3953,11 +3956,14 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			break;
 	}
 
-	ondisk->size = sz;
+	/* Inplace ReorderBuffer content compression before writing it on disk */
+	ReorderBufferCompress(rb, &disk_hdr, logical_decoding_spill_compression,
+						  sz, txn->compressor_state);
 
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
-	if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+
+	if (write(fd, rb->outbuf, disk_hdr->size) != disk_hdr->size)
 	{
 		int			save_errno = errno;
 
@@ -3982,8 +3988,6 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
-
-	Assert(ondisk->change.action == change->action);
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
@@ -4252,9 +4256,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	while (restored < max_changes_in_memory && *segno <= last_segno)
 	{
-		int			readBytes;
-		ReorderBufferDiskChange *ondisk;
-
 		CHECK_FOR_INTERRUPTS();
 
 		if (*fd == -1)
@@ -4293,60 +4294,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the statically sized part of a change which has information
-		 * about the total size. If we couldn't read a record, we're at the
-		 * end of this file.
+		 * Read the full change from disk.
+		 * If ReorderBufferReadOndiskChange returns false, then we are at the
+		 * eof, so, move the next segment.
 		 */
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
-
-		/* eof */
-		if (readBytes == 0)
+		if (!ReorderBufferReadOndiskChange(rb, txn, file, segno))
 		{
-			FileClose(*fd);
 			*fd = -1;
-			(*segno)++;
 			continue;
 		}
-		else if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) sizeof(ReorderBufferDiskChange))));
-
-		file->curOffset += readBytes;
-
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		ReorderBufferSerializeReserve(rb,
-									  sizeof(ReorderBufferDiskChange) + ondisk->size);
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		readBytes = FileRead(file->vfd,
-							 rb->outbuf + sizeof(ReorderBufferDiskChange),
-							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
-							 WAIT_EVENT_REORDER_BUFFER_READ);
-
-		if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
-
-		file->curOffset += readBytes;
 
 		/*
 		 * ok, read a full change from disk, now restore it into proper
@@ -4359,6 +4315,83 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	return restored;
 }
 
+/*
+ * Read a change spilled to disk and decompress it if compressed.
+ */
+static bool
+ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+							  TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferDiskHeader *disk_hdr;
+	char	   *header;			/* disk header buffer*/
+	char	   *data;			/* data buffer */
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size and compression method. If we couldn't read a record,
+	 * we're at the end of this file.
+	 */
+	header = (char *) palloc0(sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd, header,
+						 sizeof(ReorderBufferDiskHeader),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+
+		FileClose(file->vfd);
+		(*segno)++;
+		pfree(header);
+
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferDiskHeader))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferDiskHeader))));
+
+	file->curOffset += readBytes;
+
+	disk_hdr = (ReorderBufferDiskHeader *) header;
+
+	/* Read ondisk data */
+	data = (char *) palloc0(disk_hdr->size - sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd,
+						 data,
+						 disk_hdr->size - sizeof(ReorderBufferDiskHeader),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))));
+
+	/* Decompress data */
+	ReorderBufferDecompress(rb, data, disk_hdr, txn->compressor_state);
+
+	pfree(data);
+	pfree(header);
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+
 /*
  * Convert change from its on-disk format to in-memory format and queue it onto
  * the TXN's ->changes list.
@@ -4371,17 +4404,14 @@ static void
 ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						   char *data)
 {
-	ReorderBufferDiskChange *ondisk;
 	ReorderBufferChange *change;
 
-	ondisk = (ReorderBufferDiskChange *) data;
-
 	change = ReorderBufferGetChange(rb);
 
 	/* copy static part */
-	memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+	memcpy(change, data + sizeof(ReorderBufferDiskHeader), sizeof(ReorderBufferChange));
 
-	data += sizeof(ReorderBufferDiskChange);
+	data += sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	/* restore individual stuff */
 	switch (change->action)
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
new file mode 100644
index 0000000000..77f5c76929
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -0,0 +1,502 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.c
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_compression.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+#include "replication/reorderbuffer_compression.h"
+
+#define NO_LZ4_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method lz4 not supported"), \
+			 errdetail("This functionality requires the server to be built with lz4 support.")))
+
+/*
+ * Allocate a new LZ4StreamingCompressorState.
+ */
+static void *
+lz4_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	LZ4StreamingCompressorState *cstate;
+
+	cstate = (LZ4StreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(LZ4StreamingCompressorState));
+
+	/*
+	 * We do not allocate LZ4 ring buffers and streaming handlers at this
+	 * point because we have no guarantee that we will need them later. Let's
+	 * allocate only when we are about to use them.
+	 */
+	cstate->lz4_in_buf = NULL;
+	cstate->lz4_out_buf = NULL;
+	cstate->lz4_in_buf_offset = 0;
+	cstate->lz4_out_buf_offset = 0;
+	cstate->lz4_stream = NULL;
+	cstate->lz4_stream_decode = NULL;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free LZ4 memory resources and the compressor state.
+ */
+static void
+lz4_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	if (cstate->lz4_in_buf != NULL)
+	{
+		pfree(cstate->lz4_in_buf);
+		LZ4_freeStream(cstate->lz4_stream);
+	}
+	if (cstate->lz4_out_buf != NULL)
+	{
+		pfree(cstate->lz4_out_buf);
+		LZ4_freeStreamDecode(cstate->lz4_stream_decode);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 input ring buffer and create the streaming compression handler.
+ */
+static void
+lz4_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_in_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream = LZ4_createStream();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 output ring buffer and create the streaming decompression
+ * handler.
+ */
+static void
+lz4_CreateStreamDecodeCompressorState(MemoryContext context,
+									  void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_out_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream_decode = LZ4_createStreamDecode();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using LZ4 streaming API.
+ * Caller must ensure that the source data can fit in LZ4 input ring buffer,
+ * this checking must be done by lz4_CanDoStreamingCompression().
+ */
+static void
+lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						  char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+	char	   *lz4_in_bufPtr;		/* input ring buffer pointer */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 input ring buffer and streaming compression handler */
+	if (cstate->lz4_in_buf == NULL)
+		lz4_CreateStreamCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_in_buf_offset + src_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_in_buf_offset = 0;
+
+	/* Get the pointer of the next entry in the ring buffer */
+	lz4_in_bufPtr = cstate->lz4_in_buf + cstate->lz4_in_buf_offset;
+
+	/* Copy data that should be compressed into LZ4 input ring buffer */
+	memcpy(lz4_in_bufPtr, src, src_size);
+
+	/* Allocate space for compressed data */
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 streaming compression API */
+	lz4_cmp_size = LZ4_compress_fast_continue(cstate->lz4_stream,
+											  lz4_in_bufPtr, buf, src_size,
+											  buf_size, 1);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	/* Move the input ring buffer offset */
+	cstate->lz4_in_buf_offset += src_size;
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data compression using LZ4 API.
+ */
+static void
+lz4_CompressData(char *src, Size src_size, char **dst,  Size *dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 regular compression API */
+	lz4_cmp_size = LZ4_compress_default(src, buf, src_size, buf_size);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 streaming API.
+ * LZ4 decompression uses the output ring buffer to store decompressed data,
+ * thus, we don't need to create a new buffer. We return the pointer to data
+ * location.
+ */
+static void
+lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	char	   *lz4_out_bufPtr;		/* output ring buffer pointer */
+	int			lz4_dec_size;		/* decompressed data size */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 output ring buffer and streaming decompression handler */
+	if (cstate->lz4_out_buf == NULL)
+		lz4_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_out_buf_offset + dst_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_out_buf_offset = 0;
+
+	/* Get current entry pointer in the ring buffer */
+	lz4_out_bufPtr = cstate->lz4_out_buf + cstate->lz4_out_buf_offset;
+
+	lz4_dec_size = LZ4_decompress_safe_continue(cstate->lz4_stream_decode,
+												src,
+												lz4_out_bufPtr,
+												src_size,
+												dst_size);
+
+	Assert(lz4_dec_size == dst_size);
+
+	if (lz4_dec_size < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_size != dst_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	/* Move the output ring buffer offset */
+	cstate->lz4_out_buf_offset += lz4_dec_size;
+
+	/* Point to the decompressed data location */
+	*dst = lz4_out_bufPtr;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 API.
+ */
+static void
+lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_dec_bytes;
+	char	   *buf;
+
+	buf = (char *) palloc0(dst_size);
+
+	lz4_dec_bytes = LZ4_decompress_safe(src, buf, src_size, dst_size);
+
+	Assert(lz4_dec_bytes == dst_size);
+
+	if (lz4_dec_bytes < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_bytes != dst_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	*dst = buf;
+#endif
+}
+
+/*
+ * Allocate a new Compressor State, depending on the compression method.
+ */
+void *
+ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_NewCompressorState(context);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			return NULL;
+			break;
+	}
+}
+
+/*
+ * Free memory allocated to a Compressor State, depending on the compression
+ * method.
+ */
+void
+ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
+								 void *compressor_state)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_FreeCompressorState(context, compressor_state);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			break;
+	}
+}
+
+/*
+ * Ensure the IO buffer is >= sz.
+ */
+static void
+ReorderBufferReserve(ReorderBuffer *rb, Size sz)
+{
+	if (rb->outbufsize < sz)
+	{
+		rb->outbuf = repalloc(rb->outbuf, sz);
+		rb->outbufsize = sz;
+	}
+}
+
+/*
+ * Compress ReorderBuffer content. This function is called in order to compress
+ * data before spilling on disk.
+ */
+void
+ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
+					  int compression_method, Size data_size,
+					  void *compressor_state)
+{
+	ReorderBufferDiskHeader *hdr = *header;
+
+	switch (compression_method)
+	{
+		/* No compression */
+		case REORDER_BUFFER_NO_COMPRESSION:
+		{
+			hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+			hdr->size = data_size;
+			hdr->raw_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			break;
+		}
+		/* LZ4 Compression */
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			ReorderBufferCompressionStrategy strat;
+
+			if (lz4_CanDoStreamingCompression(src_size))
+			{
+				/* Use LZ4 streaming compression if possible */
+				lz4_StreamingCompressData(rb->context, src, src_size, &dst,
+										  &dst_size, compressor_state);
+				strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+			}
+			else
+			{
+				/* Fallback to LZ4 regular compression */
+				lz4_CompressData(src, src_size, &dst, &dst_size);
+				strat = REORDER_BUFFER_STRAT_LZ4_REGULAR;
+			}
+
+			/*
+			 * Make sure the ReorderBuffer has enough space to store compressed
+			 * data. Compressed data must be smaller than raw data, so, the
+			 * ReorderBuffer should already have room for compressed data, but
+			 * we do this to avoid buffer overflow risks.
+			 */
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = strat;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			/*
+			 * Update header: hdr pointer has potentially changed due to
+			 * ReorderBufferReserve()
+			 */
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
+			break;
+		}
+	}
+}
+
+/*
+ * Decompress data read from disk and copy it into the ReorderBuffer.
+ */
+void
+ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+						ReorderBufferDiskHeader *header, void *compressor_state)
+{
+	Size		raw_outbufsize = header->raw_size + sizeof(ReorderBufferDiskHeader);
+	/*
+	 * Make sure the output reorder buffer has enough space to store
+	 * decompressed/raw data.
+	 */
+	if (rb->outbufsize < raw_outbufsize)
+	{
+		rb->outbuf = repalloc(rb->outbuf, raw_outbufsize);
+		rb->outbufsize = raw_outbufsize;
+	}
+
+	/* Make a copy of the header read on disk into the ReorderBuffer */
+	memcpy(rb->outbuf, (char *) header, sizeof(ReorderBufferDiskHeader));
+
+	switch (header->comp_strat)
+	{
+		/* No decompression */
+		case REORDER_BUFFER_STRAT_UNCOMPRESSED:
+			{
+				/*
+				 * Make a copy of what was read on disk into the reorder
+				 * buffer.
+				 */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   data, header->raw_size);
+				break;
+			}
+		/* LZ4 regular decompression */
+		case REORDER_BUFFER_STRAT_LZ4_REGULAR:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_DecompressData(data, src_size, &buf, buf_size);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+
+				pfree(buf);
+				break;
+			}
+		/* LZ4 streaming decompression */
+		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_StreamingDecompressData(rb->context, data, src_size, &buf,
+										   buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				/*
+				 * Not necessary to free buf in this case: it points to the
+				 * decompressed data stored in LZ4 output ring buffer.
+				 */
+				break;
+			}
+		default:
+			/* Other compression methods not yet supported */
+			break;
+	}
+}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 851a001c8b..bf979e0b14 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -9,6 +9,10 @@
 #ifndef REORDERBUFFER_H
 #define REORDERBUFFER_H
 
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
 #include "access/htup_details.h"
 #include "lib/ilist.h"
 #include "lib/pairingheap.h"
@@ -422,6 +426,8 @@ typedef struct ReorderBufferTXN
 	 * Private data pointer of the output plugin.
 	 */
 	void	   *output_plugin_private;
+
+	void	   *compressor_state;
 } ReorderBufferTXN;
 
 /* so we can define the callbacks used inside struct ReorderBuffer itself */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
new file mode 100644
index 0000000000..9aa8aea56f
--- /dev/null
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.h
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ * src/include/access/reorderbuffer_compression.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REORDERBUFFER_COMPRESSION_H
+#define REORDERBUFFER_COMPRESSION_H
+
+#include "replication/reorderbuffer.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+/* ReorderBuffer on disk compression algorithms */
+typedef enum ReorderBufferCompressionMethod
+{
+	REORDER_BUFFER_NO_COMPRESSION,
+	REORDER_BUFFER_LZ4_COMPRESSION,
+} ReorderBufferCompressionMethod;
+
+/*
+ * Compression strategy applied to ReorderBuffer records spilled on disk
+ */
+typedef enum ReorderBufferCompressionStrategy
+{
+	REORDER_BUFFER_STRAT_UNCOMPRESSED,
+	REORDER_BUFFER_STRAT_LZ4_STREAMING,
+	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+} ReorderBufferCompressionStrategy;
+
+/* Disk serialization support datastructures */
+typedef struct ReorderBufferDiskHeader
+{
+	ReorderBufferCompressionStrategy comp_strat; /* Compression strategy */
+	Size		size;					/* Ondisk size */
+	Size		raw_size;				/* Raw/uncompressed data size */
+	/* ReorderBufferChange + data follows */
+} ReorderBufferDiskHeader;
+
+#ifdef USE_LZ4
+/*
+ * We use a fairly small LZ4 ring buffer size (64kB). Using a larger buffer
+ * size provide better compression ratio, but as long as we have to allocate
+ * two LZ4 ring buffers per ReorderBufferTXN, we should keep it small.
+ */
+#define LZ4_RING_BUFFER_SIZE (64 * 1024)
+
+/*
+ * Use LZ4 streaming compression iff we can keep at least 2 uncompressed
+ * records into the LZ4 input ring buffer. If raw data size is too large, let's
+ * use regular LZ4 compression.
+ */
+#define lz4_CanDoStreamingCompression(s) (s < (LZ4_RING_BUFFER_SIZE / 2))
+
+/*
+ * LZ4 streaming compression/decompression handlers and ring
+ * buffers.
+ */
+typedef struct LZ4StreamingCompressorState {
+	/* Streaming compression handler */
+	LZ4_stream_t *lz4_stream;
+	/* Streaming decompression handler */
+	LZ4_streamDecode_t *lz4_stream_decode;
+	/* LZ4 in/out ring buffers used for streaming compression */
+	char	   *lz4_in_buf;
+	int			lz4_in_buf_offset;
+	char	   *lz4_out_buf;
+	int			lz4_out_buf_offset;
+} LZ4StreamingCompressorState;
+#else
+#define lz4_CanDoStreamingCompression(s) (false)
+#endif
+
+extern void *ReorderBufferNewCompressorState(MemoryContext context,
+											 int compression_method);
+extern void ReorderBufferFreeCompressorState(MemoryContext context,
+											 int compression_method,
+											 void *compressor_state);
+extern void ReorderBufferCompress(ReorderBuffer *rb,
+								  ReorderBufferDiskHeader **header,
+								  int compression_method, Size data_size,
+								  void *compressor_state);
+extern void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+									ReorderBufferDiskHeader *header,
+									void *compressor_state);
+
+#endif							/* REORDERBUFFER_COMPRESSION_H */
-- 
2.43.0

v3-0006-Add-ReorderBuffer-ondisk-compression-tests.patchapplication/octet-stream; name=v3-0006-Add-ReorderBuffer-ondisk-compression-tests.patchDownload
From a12ff0b1debf76823c4c68cbc9a5570ceb9c28dd Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:51:29 -0700
Subject: [PATCH 6/6] Add ReorderBuffer ondisk compression tests

---
 src/test/subscription/Makefile                |  2 +
 src/test/subscription/meson.build             |  7 +-
 .../t/034_reorderbuffer_compression.pl        | 99 +++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)
 create mode 100644 src/test/subscription/t/034_reorderbuffer_compression.pl

diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
index ce1ca43009..9341f1493c 100644
--- a/src/test/subscription/Makefile
+++ b/src/test/subscription/Makefile
@@ -16,6 +16,8 @@ include $(top_builddir)/src/Makefile.global
 EXTRA_INSTALL = contrib/hstore
 
 export with_icu
+export with_lz4
+export with_zstd
 
 check:
 	$(prove_check)
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index c591cd7d61..772eeb817f 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -5,7 +5,11 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
-    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
+    'env': {
+      'with_icu': icu.found() ? 'yes' : 'no',
+      'with_lz4': lz4.found() ? 'yes' : 'no',
+      'with_zstd': zstd.found() ? 'yes' : 'no',
+    },
     'tests': [
       't/001_rep_changes.pl',
       't/002_types.pl',
@@ -40,6 +44,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_reorderbuffer_compression.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_reorderbuffer_compression.pl b/src/test/subscription/t/034_reorderbuffer_compression.pl
new file mode 100644
index 0000000000..65c9be14a2
--- /dev/null
+++ b/src/test/subscription/t/034_reorderbuffer_compression.pl
@@ -0,0 +1,99 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test ReorderBuffer compression
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub test_reorderbuffer_compression
+{
+	my ($node_publisher, $node_subscriber, $appname, $compression) = @_;
+
+	# Set subscriber's spill_compression option
+	$node_subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION tap_sub SET (spill_compression = $compression)");
+
+	# Make sure the table is empty
+	$node_publisher->safe_psql('postgres', 'TRUNCATE test_tab');
+
+	# Reset replication slot stats
+	$node_publisher->safe_psql('postgres',
+		"SELECT pg_stat_reset_replication_slot('tap_sub')");
+
+	# Insert 1 million rows in the table
+	$node_publisher->safe_psql('postgres',
+		"INSERT INTO test_tab SELECT i, 'Message number #'||i::TEXT FROM generate_series(1, 1000000) as i"
+	);
+
+	$node_publisher->wait_for_catchup($appname);
+
+	# Check if table content is replicated
+	my $result =
+	  $node_subscriber->safe_psql('postgres',
+		"SELECT count(*) FROM test_tab");
+	is($result, qq(1000000), 'check data was copied to subscriber');
+
+	# Check if the transaction was spilled on disk
+	my $res_stats =
+	  $node_publisher->safe_psql('postgres',
+		"SELECT spill_txns FROM pg_catalog.pg_stat_get_replication_slot('tap_sub');");
+	is($res_stats, qq(1), 'check if the transaction was spilled on disk');
+}
+
+# Create publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	'logical_decoding_work_mem = 64');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$node_subscriber->init;
+$node_subscriber->start;
+
+# Setup structure on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'off');
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'pglz');
+
+SKIP:
+{
+	skip "LZ4 not supported by this build", 2 if ($ENV{with_lz4} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'lz4');
+}
+
+SKIP:
+{
+	skip "ZSTD not supported by this build", 2 if ($ENV{with_zstd} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'zstd');
+}
+
+$node_subscriber->stop;
+$node_publisher->stop;
+
+done_testing();
-- 
2.43.0

#24Tomas Vondra
tomas@vondra.me
In reply to: Julien Tachoires (#23)
11 attachment(s)
Re: Compress ReorderBuffer spill files using LZ4

Hi Julien,

Thanks for the last patch version and sorry for the delay. Here's a
quick review, I plan to do a bit of additional testing, hopefully
sometime this week.

Attached is v4, which is your v3 rebased to current master, with a
couple "review" commits, adding comments to relevant places, which I
find easier to follow. There's also a "pgindent" commit at the end,
showing some formatting issues, adding structs to typedefs, etc. This
needs to be applied to the earlier patches, I didn't want to have too
many pgindent commits.

v4-0002-review.patch
--------------------

1) I don't think we need to rename ReorderBufferDiskChange to
ReorderBufferDiskHeader. It seems we could easily just add a field to
ReorderBufferDiskChange, and keep using that, no? It's still just a
wrapper for ReorderBufferChange, even if it's compressed.

2) Likewise, I don't see the point in moving ReorderBufferDiskChange to
a different file. It seems like something that should remain private to
reorderbuffer.c. IMHO the correct thing would be to move the various
ReorderBuffer* funct from reorderbuffer_compress.c to reorderbuffer.c.
The compression is something the ReorderBuffer is responsible for, so
reorderbuffer.c is the right place for that.

3) That means the reorderbuffer_compress.c would have only the actual
compression code. But wouldn't it be better to have one file for each
compression algorithm, similar to src/fe_utils/astreamer_{pglz,lz4,...}?
Just a suggestion, though. Not sure.

4) logical_decoding_spill_compression moved a bit, next to the variables
for other logical_decoding GUCs. It's also missing the definition in a
header file, so I get a compiler warning, so add it to reorderbuffer.h.

5) Won't the code in ReorderBufferCompress() do palloc/pfree for each
piece of data we compress? That seems it might be pretty expensive, at
least for records that happen to be large (>8kB), because oversized
chunks are not cached in the context. IMHO this should use a single
buffer, similarly to what astreamer_lz4_compressor_content does.

6) I'm really confused by having "streaming" and "regular" compression.
I know lz4 supports different modes, but I'd expect only one of them
being really suitable for this, so why support both? But even more
importantly, ReorderBufferCompress() sets the strategy using

#define lz4_CanDoStreamingCompression(s) \
(s < (LZ4_RING_BUFFER_SIZE / 2))

But it does that with "s" being the "change size", and that can vary
wildly. Doesn't that means the strategy will change withing a single
file? Does that even make sense?

v4-0004-review.patch
--------------------

7) I'm not sure the "Fix spill_bytes counter" is actually a fix. Even
now (i.e. without compression) it tracks the size of transactions, not
the bytes written to disk exactly, because it doesn't include the bytes
for the size field, so maybe we should not change that ...

v4-0006-review.patch
--------------------

8) It seems a bit strange that we add lz4 first, and only later pglz.
IMHO it should be the other way around, as pglz is the default algorithm
we can rely to have, while lz4 etc. are optional.

One question I have is whether it might be better to compress stuff at a
lower layer - not in reorderbuffer.c, but for the whole file. But maybe
there are reasons why that would be difficult, I haven't tried that.

regards

--
Tomas Vondra

Attachments:

v4-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchtext/x-patch; charset=UTF-8; name=v4-0001-Compress-ReorderBuffer-spill-files-using-LZ4.patchDownload
From c677ca6271b3b433a65f089a19c371b568bf7a0b Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 6 Jun 2024 00:57:38 -0700
Subject: [PATCH v4 01/11] Compress ReorderBuffer spill files using LZ4

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.

This behavior happens only when the subscriber's option "streaming"
is set to "off", which is the default value.

In this case, large transactions decoding by multiple replication
slots can lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written on disk is now compressed and wrapped in
a new structure named ReorderBufferDiskHeader.

3 different compression strategies are currently implemented:

1. LZ4 streaming compression is the preferred one and works
   efficiently for small individual changes.
2. LZ4 regular compression when the changes are too large for using
   LZ4 streaming API.
3. No compression.
---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       | 196 ++++---
 .../logical/reorderbuffer_compression.c       | 502 ++++++++++++++++++
 src/include/replication/reorderbuffer.h       |   6 +
 .../replication/reorderbuffer_compression.h   |  95 ++++
 6 files changed, 718 insertions(+), 83 deletions(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_compression.c
 create mode 100644 src/include/replication/reorderbuffer_compression.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 1e08bbbd4eb..6f9b0c6f8a1 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -26,6 +26,7 @@ OBJS = \
 	proto.o \
 	relation.o \
 	reorderbuffer.o \
+	reorderbuffer_compression.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index 3d36249d8ad..3f8ff4d798f 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -12,6 +12,7 @@ backend_sources += files(
   'proto.c',
   'relation.c',
   'reorderbuffer.c',
+  'reorderbuffer_compression.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 22bcf171ff0..c36179d44b5 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -102,6 +102,7 @@
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/snapbuild.h"	/* just for SnapBuildSnapDecRefcount */
 #include "storage/bufmgr.h"
@@ -112,6 +113,8 @@
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
 
+int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
+
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
 {
@@ -173,14 +176,6 @@ typedef struct ReorderBufferToastEnt
 									 * main tup */
 } ReorderBufferToastEnt;
 
-/* Disk serialization support datastructures */
-typedef struct ReorderBufferDiskChange
-{
-	Size		size;
-	ReorderBufferChange change;
-	/* data follows */
-} ReorderBufferDiskChange;
-
 #define IsSpecInsert(action) \
 ( \
 	((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT) \
@@ -255,6 +250,8 @@ static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
+static bool ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+								   TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -427,6 +424,8 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	/* InvalidCommandId is not zero, so set it explicitly */
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
+	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
+															logical_decoding_spill_compression);
 
 	return txn;
 }
@@ -464,6 +463,10 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	ReorderBufferFreeCompressorState(rb->context,
+									 logical_decoding_spill_compression,
+									 txn->compressor_state);
+
 	/* Reset the toast hash */
 	ReorderBufferToastReset(rb, txn);
 
@@ -3800,13 +3803,13 @@ static void
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
-	ReorderBufferDiskChange *ondisk;
-	Size		sz = sizeof(ReorderBufferDiskChange);
+	ReorderBufferDiskHeader *disk_hdr;
+	Size		sz = sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	ReorderBufferSerializeReserve(rb, sz);
 
-	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-	memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+	disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+	memcpy((char *)rb->outbuf + sizeof(ReorderBufferDiskHeader), change, sizeof(ReorderBufferChange));
 
 	switch (change->action)
 	{
@@ -3842,9 +3845,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				if (oldlen)
 				{
@@ -3874,10 +3877,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					sizeof(Size) + sizeof(Size);
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				/* write the prefix including the size */
 				memcpy(data, &prefix_size, sizeof(Size));
@@ -3904,10 +3907,10 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				sz += inval_size;
 
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 				memcpy(data, change->data.inval.invalidations, inval_size);
 				data += inval_size;
 
@@ -3926,9 +3929,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, snap, sizeof(SnapshotData));
 				data += sizeof(SnapshotData);
@@ -3960,9 +3963,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
-				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
+				disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
 
 				memcpy(data, change->data.truncate.relids, size);
 				data += size;
@@ -3977,11 +3980,14 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			break;
 	}
 
-	ondisk->size = sz;
+	/* Inplace ReorderBuffer content compression before writing it on disk */
+	ReorderBufferCompress(rb, &disk_hdr, logical_decoding_spill_compression,
+						  sz, txn->compressor_state);
 
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
-	if (write(fd, rb->outbuf, ondisk->size) != ondisk->size)
+
+	if (write(fd, rb->outbuf, disk_hdr->size) != disk_hdr->size)
 	{
 		int			save_errno = errno;
 
@@ -4006,8 +4012,6 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
-
-	Assert(ondisk->change.action == change->action);
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
@@ -4276,9 +4280,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	while (restored < max_changes_in_memory && *segno <= last_segno)
 	{
-		int			readBytes;
-		ReorderBufferDiskChange *ondisk;
-
 		CHECK_FOR_INTERRUPTS();
 
 		if (*fd == -1)
@@ -4317,60 +4318,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the statically sized part of a change which has information
-		 * about the total size. If we couldn't read a record, we're at the
-		 * end of this file.
+		 * Read the full change from disk.
+		 * If ReorderBufferReadOndiskChange returns false, then we are at the
+		 * eof, so, move the next segment.
 		 */
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
-
-		/* eof */
-		if (readBytes == 0)
+		if (!ReorderBufferReadOndiskChange(rb, txn, file, segno))
 		{
-			FileClose(*fd);
 			*fd = -1;
-			(*segno)++;
 			continue;
 		}
-		else if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) sizeof(ReorderBufferDiskChange))));
-
-		file->curOffset += readBytes;
-
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		ReorderBufferSerializeReserve(rb,
-									  sizeof(ReorderBufferDiskChange) + ondisk->size);
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		readBytes = FileRead(file->vfd,
-							 rb->outbuf + sizeof(ReorderBufferDiskChange),
-							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
-							 WAIT_EVENT_REORDER_BUFFER_READ);
-
-		if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
-
-		file->curOffset += readBytes;
 
 		/*
 		 * ok, read a full change from disk, now restore it into proper
@@ -4383,6 +4339,83 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	return restored;
 }
 
+/*
+ * Read a change spilled to disk and decompress it if compressed.
+ */
+static bool
+ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+							  TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferDiskHeader *disk_hdr;
+	char	   *header;			/* disk header buffer*/
+	char	   *data;			/* data buffer */
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size and compression method. If we couldn't read a record,
+	 * we're at the end of this file.
+	 */
+	header = (char *) palloc0(sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd, header,
+						 sizeof(ReorderBufferDiskHeader),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+
+		FileClose(file->vfd);
+		(*segno)++;
+		pfree(header);
+
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferDiskHeader))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferDiskHeader))));
+
+	file->curOffset += readBytes;
+
+	disk_hdr = (ReorderBufferDiskHeader *) header;
+
+	/* Read ondisk data */
+	data = (char *) palloc0(disk_hdr->size - sizeof(ReorderBufferDiskHeader));
+	readBytes = FileRead(file->vfd,
+						 data,
+						 disk_hdr->size - sizeof(ReorderBufferDiskHeader),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (disk_hdr->size - sizeof(ReorderBufferDiskHeader)))));
+
+	/* Decompress data */
+	ReorderBufferDecompress(rb, data, disk_hdr, txn->compressor_state);
+
+	pfree(data);
+	pfree(header);
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+
 /*
  * Convert change from its on-disk format to in-memory format and queue it onto
  * the TXN's ->changes list.
@@ -4395,17 +4428,14 @@ static void
 ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						   char *data)
 {
-	ReorderBufferDiskChange *ondisk;
 	ReorderBufferChange *change;
 
-	ondisk = (ReorderBufferDiskChange *) data;
-
 	change = ReorderBufferGetChange(rb);
 
 	/* copy static part */
-	memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+	memcpy(change, data + sizeof(ReorderBufferDiskHeader), sizeof(ReorderBufferChange));
 
-	data += sizeof(ReorderBufferDiskChange);
+	data += sizeof(ReorderBufferDiskHeader) + sizeof(ReorderBufferChange);
 
 	/* restore individual stuff */
 	switch (change->action)
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
new file mode 100644
index 00000000000..77f5c76929b
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -0,0 +1,502 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.c
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_compression.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+#include "replication/reorderbuffer_compression.h"
+
+#define NO_LZ4_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method lz4 not supported"), \
+			 errdetail("This functionality requires the server to be built with lz4 support.")))
+
+/*
+ * Allocate a new LZ4StreamingCompressorState.
+ */
+static void *
+lz4_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	LZ4StreamingCompressorState *cstate;
+
+	cstate = (LZ4StreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(LZ4StreamingCompressorState));
+
+	/*
+	 * We do not allocate LZ4 ring buffers and streaming handlers at this
+	 * point because we have no guarantee that we will need them later. Let's
+	 * allocate only when we are about to use them.
+	 */
+	cstate->lz4_in_buf = NULL;
+	cstate->lz4_out_buf = NULL;
+	cstate->lz4_in_buf_offset = 0;
+	cstate->lz4_out_buf_offset = 0;
+	cstate->lz4_stream = NULL;
+	cstate->lz4_stream_decode = NULL;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free LZ4 memory resources and the compressor state.
+ */
+static void
+lz4_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	if (cstate->lz4_in_buf != NULL)
+	{
+		pfree(cstate->lz4_in_buf);
+		LZ4_freeStream(cstate->lz4_stream);
+	}
+	if (cstate->lz4_out_buf != NULL)
+	{
+		pfree(cstate->lz4_out_buf);
+		LZ4_freeStreamDecode(cstate->lz4_stream_decode);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 input ring buffer and create the streaming compression handler.
+ */
+static void
+lz4_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_in_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream = LZ4_createStream();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 output ring buffer and create the streaming decompression
+ * handler.
+ */
+static void
+lz4_CreateStreamDecodeCompressorState(MemoryContext context,
+									  void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_out_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream_decode = LZ4_createStreamDecode();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using LZ4 streaming API.
+ * Caller must ensure that the source data can fit in LZ4 input ring buffer,
+ * this checking must be done by lz4_CanDoStreamingCompression().
+ */
+static void
+lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						  char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+	char	   *lz4_in_bufPtr;		/* input ring buffer pointer */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 input ring buffer and streaming compression handler */
+	if (cstate->lz4_in_buf == NULL)
+		lz4_CreateStreamCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_in_buf_offset + src_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_in_buf_offset = 0;
+
+	/* Get the pointer of the next entry in the ring buffer */
+	lz4_in_bufPtr = cstate->lz4_in_buf + cstate->lz4_in_buf_offset;
+
+	/* Copy data that should be compressed into LZ4 input ring buffer */
+	memcpy(lz4_in_bufPtr, src, src_size);
+
+	/* Allocate space for compressed data */
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 streaming compression API */
+	lz4_cmp_size = LZ4_compress_fast_continue(cstate->lz4_stream,
+											  lz4_in_bufPtr, buf, src_size,
+											  buf_size, 1);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	/* Move the input ring buffer offset */
+	cstate->lz4_in_buf_offset += src_size;
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data compression using LZ4 API.
+ */
+static void
+lz4_CompressData(char *src, Size src_size, char **dst,  Size *dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *buf;				/* buffer used for compression */
+	Size		buf_size;			/* buffer size */
+
+	buf_size = LZ4_COMPRESSBOUND(src_size);
+	buf = (char *) palloc0(buf_size);
+
+	/* Use LZ4 regular compression API */
+	lz4_cmp_size = LZ4_compress_default(src, buf, src_size, buf_size);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	*dst_size = lz4_cmp_size;
+	*dst = buf;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 streaming API.
+ * LZ4 decompression uses the output ring buffer to store decompressed data,
+ * thus, we don't need to create a new buffer. We return the pointer to data
+ * location.
+ */
+static void
+lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	char	   *lz4_out_bufPtr;		/* output ring buffer pointer */
+	int			lz4_dec_size;		/* decompressed data size */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 output ring buffer and streaming decompression handler */
+	if (cstate->lz4_out_buf == NULL)
+		lz4_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_out_buf_offset + dst_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_out_buf_offset = 0;
+
+	/* Get current entry pointer in the ring buffer */
+	lz4_out_bufPtr = cstate->lz4_out_buf + cstate->lz4_out_buf_offset;
+
+	lz4_dec_size = LZ4_decompress_safe_continue(cstate->lz4_stream_decode,
+												src,
+												lz4_out_bufPtr,
+												src_size,
+												dst_size);
+
+	Assert(lz4_dec_size == dst_size);
+
+	if (lz4_dec_size < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_size != dst_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_DATA_CORRUPTED),
+			 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	/* Move the output ring buffer offset */
+	cstate->lz4_out_buf_offset += lz4_dec_size;
+
+	/* Point to the decompressed data location */
+	*dst = lz4_out_bufPtr;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 API.
+ */
+static void
+lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	int			lz4_dec_bytes;
+	char	   *buf;
+
+	buf = (char *) palloc0(dst_size);
+
+	lz4_dec_bytes = LZ4_decompress_safe(src, buf, src_size, dst_size);
+
+	Assert(lz4_dec_bytes == dst_size);
+
+	if (lz4_dec_bytes < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_bytes != dst_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	*dst = buf;
+#endif
+}
+
+/*
+ * Allocate a new Compressor State, depending on the compression method.
+ */
+void *
+ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_NewCompressorState(context);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			return NULL;
+			break;
+	}
+}
+
+/*
+ * Free memory allocated to a Compressor State, depending on the compression
+ * method.
+ */
+void
+ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
+								 void *compressor_state)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_FreeCompressorState(context, compressor_state);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			break;
+	}
+}
+
+/*
+ * Ensure the IO buffer is >= sz.
+ */
+static void
+ReorderBufferReserve(ReorderBuffer *rb, Size sz)
+{
+	if (rb->outbufsize < sz)
+	{
+		rb->outbuf = repalloc(rb->outbuf, sz);
+		rb->outbufsize = sz;
+	}
+}
+
+/*
+ * Compress ReorderBuffer content. This function is called in order to compress
+ * data before spilling on disk.
+ */
+void
+ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
+					  int compression_method, Size data_size,
+					  void *compressor_state)
+{
+	ReorderBufferDiskHeader *hdr = *header;
+
+	switch (compression_method)
+	{
+		/* No compression */
+		case REORDER_BUFFER_NO_COMPRESSION:
+		{
+			hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+			hdr->size = data_size;
+			hdr->raw_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			break;
+		}
+		/* LZ4 Compression */
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			ReorderBufferCompressionStrategy strat;
+
+			if (lz4_CanDoStreamingCompression(src_size))
+			{
+				/* Use LZ4 streaming compression if possible */
+				lz4_StreamingCompressData(rb->context, src, src_size, &dst,
+										  &dst_size, compressor_state);
+				strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+			}
+			else
+			{
+				/* Fallback to LZ4 regular compression */
+				lz4_CompressData(src, src_size, &dst, &dst_size);
+				strat = REORDER_BUFFER_STRAT_LZ4_REGULAR;
+			}
+
+			/*
+			 * Make sure the ReorderBuffer has enough space to store compressed
+			 * data. Compressed data must be smaller than raw data, so, the
+			 * ReorderBuffer should already have room for compressed data, but
+			 * we do this to avoid buffer overflow risks.
+			 */
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = strat;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			/*
+			 * Update header: hdr pointer has potentially changed due to
+			 * ReorderBufferReserve()
+			 */
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
+			break;
+		}
+	}
+}
+
+/*
+ * Decompress data read from disk and copy it into the ReorderBuffer.
+ */
+void
+ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+						ReorderBufferDiskHeader *header, void *compressor_state)
+{
+	Size		raw_outbufsize = header->raw_size + sizeof(ReorderBufferDiskHeader);
+	/*
+	 * Make sure the output reorder buffer has enough space to store
+	 * decompressed/raw data.
+	 */
+	if (rb->outbufsize < raw_outbufsize)
+	{
+		rb->outbuf = repalloc(rb->outbuf, raw_outbufsize);
+		rb->outbufsize = raw_outbufsize;
+	}
+
+	/* Make a copy of the header read on disk into the ReorderBuffer */
+	memcpy(rb->outbuf, (char *) header, sizeof(ReorderBufferDiskHeader));
+
+	switch (header->comp_strat)
+	{
+		/* No decompression */
+		case REORDER_BUFFER_STRAT_UNCOMPRESSED:
+			{
+				/*
+				 * Make a copy of what was read on disk into the reorder
+				 * buffer.
+				 */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   data, header->raw_size);
+				break;
+			}
+		/* LZ4 regular decompression */
+		case REORDER_BUFFER_STRAT_LZ4_REGULAR:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_DecompressData(data, src_size, &buf, buf_size);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+
+				pfree(buf);
+				break;
+			}
+		/* LZ4 streaming decompression */
+		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				lz4_StreamingDecompressData(rb->context, data, src_size, &buf,
+										   buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				/*
+				 * Not necessary to free buf in this case: it points to the
+				 * decompressed data stored in LZ4 output ring buffer.
+				 */
+				break;
+			}
+		default:
+			/* Other compression methods not yet supported */
+			break;
+	}
+}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index e332635f70b..f1562a77719 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -9,6 +9,10 @@
 #ifndef REORDERBUFFER_H
 #define REORDERBUFFER_H
 
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
 #include "access/htup_details.h"
 #include "lib/ilist.h"
 #include "lib/pairingheap.h"
@@ -427,6 +431,8 @@ typedef struct ReorderBufferTXN
 	 * Private data pointer of the output plugin.
 	 */
 	void	   *output_plugin_private;
+
+	void	   *compressor_state;
 } ReorderBufferTXN;
 
 /* so we can define the callbacks used inside struct ReorderBuffer itself */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
new file mode 100644
index 00000000000..9aa8aea56f4
--- /dev/null
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -0,0 +1,95 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.h
+ *	  Functions for ReorderBuffer compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ * src/include/access/reorderbuffer_compression.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REORDERBUFFER_COMPRESSION_H
+#define REORDERBUFFER_COMPRESSION_H
+
+#include "replication/reorderbuffer.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+/* ReorderBuffer on disk compression algorithms */
+typedef enum ReorderBufferCompressionMethod
+{
+	REORDER_BUFFER_NO_COMPRESSION,
+	REORDER_BUFFER_LZ4_COMPRESSION,
+} ReorderBufferCompressionMethod;
+
+/*
+ * Compression strategy applied to ReorderBuffer records spilled on disk
+ */
+typedef enum ReorderBufferCompressionStrategy
+{
+	REORDER_BUFFER_STRAT_UNCOMPRESSED,
+	REORDER_BUFFER_STRAT_LZ4_STREAMING,
+	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+} ReorderBufferCompressionStrategy;
+
+/* Disk serialization support datastructures */
+typedef struct ReorderBufferDiskHeader
+{
+	ReorderBufferCompressionStrategy comp_strat; /* Compression strategy */
+	Size		size;					/* Ondisk size */
+	Size		raw_size;				/* Raw/uncompressed data size */
+	/* ReorderBufferChange + data follows */
+} ReorderBufferDiskHeader;
+
+#ifdef USE_LZ4
+/*
+ * We use a fairly small LZ4 ring buffer size (64kB). Using a larger buffer
+ * size provide better compression ratio, but as long as we have to allocate
+ * two LZ4 ring buffers per ReorderBufferTXN, we should keep it small.
+ */
+#define LZ4_RING_BUFFER_SIZE (64 * 1024)
+
+/*
+ * Use LZ4 streaming compression iff we can keep at least 2 uncompressed
+ * records into the LZ4 input ring buffer. If raw data size is too large, let's
+ * use regular LZ4 compression.
+ */
+#define lz4_CanDoStreamingCompression(s) (s < (LZ4_RING_BUFFER_SIZE / 2))
+
+/*
+ * LZ4 streaming compression/decompression handlers and ring
+ * buffers.
+ */
+typedef struct LZ4StreamingCompressorState {
+	/* Streaming compression handler */
+	LZ4_stream_t *lz4_stream;
+	/* Streaming decompression handler */
+	LZ4_streamDecode_t *lz4_stream_decode;
+	/* LZ4 in/out ring buffers used for streaming compression */
+	char	   *lz4_in_buf;
+	int			lz4_in_buf_offset;
+	char	   *lz4_out_buf;
+	int			lz4_out_buf_offset;
+} LZ4StreamingCompressorState;
+#else
+#define lz4_CanDoStreamingCompression(s) (false)
+#endif
+
+extern void *ReorderBufferNewCompressorState(MemoryContext context,
+											 int compression_method);
+extern void ReorderBufferFreeCompressorState(MemoryContext context,
+											 int compression_method,
+											 void *compressor_state);
+extern void ReorderBufferCompress(ReorderBuffer *rb,
+								  ReorderBufferDiskHeader **header,
+								  int compression_method, Size data_size,
+								  void *compressor_state);
+extern void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+									ReorderBufferDiskHeader *header,
+									void *compressor_state);
+
+#endif							/* REORDERBUFFER_COMPRESSION_H */
-- 
2.46.0

v4-0002-review.patchtext/x-patch; charset=UTF-8; name=v4-0002-review.patchDownload
From 17aed41dfdd452727643a40e5dba6c0c4705c64a Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 16:37:49 +0200
Subject: [PATCH v4 02/11] review

---
 .../replication/logical/reorderbuffer.c        | 10 ++++++++--
 .../logical/reorderbuffer_compression.c        | 18 +++++++++++++++++-
 src/include/replication/reorderbuffer.h        |  2 ++
 .../replication/reorderbuffer_compression.h    |  8 ++++++++
 4 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index c36179d44b5..f5c51316d9d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -113,8 +113,6 @@
 #include "utils/rel.h"
 #include "utils/relfilenumbermap.h"
 
-int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
-
 /* entry for a hash table we use to map from xid to our transaction state */
 typedef struct ReorderBufferTXNByIdEnt
 {
@@ -176,6 +174,11 @@ typedef struct ReorderBufferToastEnt
 									 * main tup */
 } ReorderBufferToastEnt;
 
+/*
+ * XXX seems wrong to move ReorderBufferDiskChange typedef, it should be private
+ * in this file, even with compression IMHO
+ */
+
 #define IsSpecInsert(action) \
 ( \
 	((action) == REORDER_BUFFER_CHANGE_INTERNAL_SPEC_INSERT) \
@@ -210,6 +213,9 @@ static const Size max_changes_in_memory = 4096; /* XXX for restore only */
 /* GUC variable */
 int			debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED;
 
+/* Compression strategy for spilled data. */
+int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
+
 /* ---------------------------------------
  * primary reorderbuffer support routines
  * ---------------------------------------
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 77f5c76929b..b4fea15ebd9 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -9,6 +9,8 @@
  * IDENTIFICATION
  *	  src/backend/access/common/reorderbuffer_compression.c
  *
+ *
+ * XXX I'd rename this to reorderbuffer_lz4 or something like that.
  *-------------------------------------------------------------------------
  */
 #include "postgres.h"
@@ -301,6 +303,8 @@ lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
 #endif
 }
 
+/* XXX The rest should be in reorderbuffer.c, with the rest of ReorderBuffer functions */
+
 /*
  * Allocate a new Compressor State, depending on the compression method.
  */
@@ -376,6 +380,12 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 		/* LZ4 Compression */
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 		{
+			/*
+			 * XXX Won't this cause a lot of palloc/pfree traffic? We allocate a new
+			 * buffer for every compression, only to immediately throw it away. Look
+			 * at what astreamer_lz4_compressor_content does to reuse a buffer, maybe
+			 * we should do that here too?
+			 */
 			char	   *dst = NULL;
 			Size		dst_size = 0;
 			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
@@ -428,6 +438,12 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 /*
  * Decompress data read from disk and copy it into the ReorderBuffer.
+ *
+ * XXX Does it make sense to support both "regular" and "streaming" for lz4?
+ * Isn't one of those clearly better for this use case?
+ *
+ * XXX Also, we only ever create LZ4StreamingCompressorState, so does that
+ * work for non-streaming state?
  */
 void
 ReorderBufferDecompress(ReorderBuffer *rb, char *data,
@@ -479,7 +495,7 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 		/* LZ4 streaming decompression */
 		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
 			{
-				char	   *buf;
+				char	   *buf;	/* XXX shouldn't this be set to NULL explicitly? */
 				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
 				Size		buf_size = header->raw_size;
 
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index f1562a77719..1a2ccda2eca 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -9,6 +9,7 @@
 #ifndef REORDERBUFFER_H
 #define REORDERBUFFER_H
 
+/* XXX we don't need any lz4 stuff in this header, right? */
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
@@ -30,6 +31,7 @@
 /* GUC variables */
 extern PGDLLIMPORT int logical_decoding_work_mem;
 extern PGDLLIMPORT int debug_logical_replication_streaming;
+extern PGDLLIMPORT int logical_decoding_spill_compression;
 
 /* possible values for debug_logical_replication_streaming */
 typedef enum
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 9aa8aea56f4..be739d9ebeb 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -37,6 +37,14 @@ typedef enum ReorderBufferCompressionStrategy
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
+
+/* XXX but what does ReorderBufferDiskHeader represent? what is it for?
+ * XXX Also, shouldn't it be ReorderBufferDiskChangeHeader? It's a header
+ * for a change, not for a disk.
+ * XXX I don't quite understand why we renamed this, DiskChange seems quite
+ * fine to me, even if it gets new fields for compression. Still, I'd keep
+ * it as private in reorderbuffer.c.
+ */
 typedef struct ReorderBufferDiskHeader
 {
 	ReorderBufferCompressionStrategy comp_strat; /* Compression strategy */
-- 
2.46.0

v4-0003-Fix-spill_bytes-counter.patchtext/x-patch; charset=UTF-8; name=v4-0003-Fix-spill_bytes-counter.patchDownload
From 487eea83f36dc8731bb486dac1336b4598048471 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sun, 23 Jun 2024 14:42:04 -0700
Subject: [PATCH v4 03/11] Fix spill_bytes counter

The spill_bytes counter considers now the fact that decoded changes
are spilled on disk compressed.
---
 src/backend/replication/logical/reorderbuffer.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index f5c51316d9d..19b72e90af8 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -252,7 +252,7 @@ static void ReorderBufferExecuteInvalidations(uint32 nmsgs, SharedInvalidationMe
  */
 static void ReorderBufferCheckMemoryLimit(ReorderBuffer *rb);
 static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+static Size ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
@@ -3719,6 +3719,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	Size		spillBytes = 0;
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3770,7 +3771,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 
-		ReorderBufferSerializeChange(rb, txn, fd, change);
+		spillBytes += ReorderBufferSerializeChange(rb, txn, fd, change);
 		dlist_delete(&change->node);
 		ReorderBufferReturnChange(rb, change, false);
 
@@ -3784,7 +3785,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	if (spilled)
 	{
 		rb->spillCount += 1;
-		rb->spillBytes += size;
+		rb->spillBytes += spillBytes;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
@@ -3805,7 +3806,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 /*
  * Serialize individual change to disk.
  */
-static void
+static Size
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
@@ -4018,6 +4019,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
+
+	/* Return data size written to disk */
+	return disk_hdr->size;
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
-- 
2.46.0

v4-0004-review.patchtext/x-patch; charset=UTF-8; name=v4-0004-review.patchDownload
From 2e120c01baa97e82a62fd8c6936612f23dd4ac3c Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 16:47:55 +0200
Subject: [PATCH v4 04/11] review

---
 src/backend/replication/logical/reorderbuffer.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 19b72e90af8..168956d944a 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -3805,6 +3805,14 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 
 /*
  * Serialize individual change to disk.
+ *
+ * Returns the amount of data written to disk (with compression, this is the
+ * size of the compressed representation).
+ *
+ * XXX But is this actually the right thing to do? Even without compression we
+ * don't really count the bytes written to the disk (we don't account for the
+ * DiskChange header), but rather the memory representation. So why should we
+ * do that with compression? Seems a bit strange.
  */
 static Size
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
-- 
2.46.0

v4-0005-Compress-ReorderBuffer-spill-files-using-PGLZ.patchtext/x-patch; charset=UTF-8; name=v4-0005-Compress-ReorderBuffer-spill-files-using-PGLZ.patchDownload
From fbbc3693210041e3840e362e2892cdf94fbec709 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:36:04 -0700
Subject: [PATCH v4 05/11] Compress ReorderBuffer spill files using PGLZ

---
 .../logical/reorderbuffer_compression.c       | 58 +++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  2 +
 2 files changed, 60 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index b4fea15ebd9..8d158aa51da 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -15,6 +15,8 @@
  */
 #include "postgres.h"
 
+#include "common/pg_lzcompress.h"
+
 #ifdef USE_LZ4
 #include <lz4.h>
 #endif
@@ -317,6 +319,7 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 			return lz4_NewCompressorState(context);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			return NULL;
 			break;
@@ -337,6 +340,7 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
 			break;
 	}
@@ -431,6 +435,40 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* PGLZ compression */
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+		{
+			int32		dst_size = 0;
+			char	   *dst = NULL;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			int32		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			int32		max_size = PGLZ_MAX_OUTPUT(src_size);
+
+			dst = (char *) palloc0(max_size);
+			dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
+
+			if (dst_size < 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("PGLZ compression failed")));
+
+			ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+			hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = (Size) src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -511,6 +549,26 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 				 */
 				break;
 			}
+		/* PGLZ decompression */
+		case REORDER_BUFFER_STRAT_PGLZ:
+			{
+				char	   *buf;
+				int32		src_size = (int32) header->size - sizeof(ReorderBufferDiskHeader);
+				int32		buf_size = (int32) header->raw_size;
+				int32		decBytes;
+
+				/* Decompress data directly into the ReorderBuffer */
+				buf = (char *) rb->outbuf;
+				buf += sizeof(ReorderBufferDiskHeader);
+
+				decBytes = pglz_decompress(data, src_size, buf, buf_size, false);
+
+				if (decBytes < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("compressed PGLZ data is corrupted")));
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index be739d9ebeb..133513880e6 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -24,6 +24,7 @@ typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
+	REORDER_BUFFER_PGLZ_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -34,6 +35,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_UNCOMPRESSED,
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
+	REORDER_BUFFER_STRAT_PGLZ,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
-- 
2.46.0

v4-0006-review.patchtext/x-patch; charset=UTF-8; name=v4-0006-review.patchDownload
From 8e243868ce9d6254480a239012bae730f4db77a9 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 17:05:10 +0200
Subject: [PATCH v4 06/11] review

---
 .../replication/logical/reorderbuffer_compression.c       | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 8d158aa51da..e8785d27e49 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -438,8 +438,16 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 			break;
 		}
 		/* PGLZ compression */
+		/*
+		 * XXX I'd probably start by adding pglz first, and only then add lz4 as a
+		 * separate patch, simply because pglz is the default and always supported,
+		 * while lz4 requires extra flags.
+		 */
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		{
+			/*
+			 * XXX Same comment about palloc/pfree traffic as for lz4 ...
+			 */
 			int32		dst_size = 0;
 			char	   *dst = NULL;
 			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
-- 
2.46.0

v4-0007-Compress-ReorderBuffer-spill-files-using-ZSTD.patchtext/x-patch; charset=UTF-8; name=v4-0007-Compress-ReorderBuffer-spill-files-using-ZSTD.patchDownload
From 8e55c5bf35de74c9a9bf09e0cd3c16c3599dabd5 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:39:25 -0700
Subject: [PATCH v4 07/11] Compress ReorderBuffer spill files using ZSTD

---
 .../logical/reorderbuffer_compression.c       | 364 ++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  39 ++
 2 files changed, 403 insertions(+)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index e8785d27e49..96751852ef8 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -21,6 +21,10 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 #include "replication/reorderbuffer_compression.h"
 
 #define NO_LZ4_SUPPORT() \
@@ -29,6 +33,12 @@
 			 errmsg("compression method lz4 not supported"), \
 			 errdetail("This functionality requires the server to be built with lz4 support.")))
 
+#define NO_ZSTD_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method zstd not supported"), \
+			 errdetail("This functionality requires the server to be built with zstd support.")))
+
 /*
  * Allocate a new LZ4StreamingCompressorState.
  */
@@ -307,6 +317,309 @@ lz4_DecompressData(char *src, Size src_size, char **dst, Size dst_size)
 
 /* XXX The rest should be in reorderbuffer.c, with the rest of ReorderBuffer functions */
 
+/*
+ * Allocate a new ZSTDStreamingCompressorState.
+ */
+static void *
+zstd_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	cstate = (ZSTDStreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(ZSTDStreamingCompressorState));
+
+	/*
+	 * We do not allocate ZSTD buffers and contexts at this point because we
+	 * have no guarantee that we will need them later. Let's allocate only when
+	 * we are about to use them.
+	 */
+	cstate->zstd_c_ctx = NULL;
+	cstate->zstd_c_in_buf = NULL;
+	cstate->zstd_c_in_buf_size = 0;
+	cstate->zstd_c_out_buf = NULL;
+	cstate->zstd_c_out_buf_size = 0;
+	cstate->zstd_frame_size = 0;
+	cstate->zstd_d_ctx = NULL;
+	cstate->zstd_d_in_buf = NULL;
+	cstate->zstd_d_in_buf_size = 0;
+	cstate->zstd_d_out_buf = NULL;
+	cstate->zstd_d_out_buf_size = 0;
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free ZSTD memory resources and the compressor state.
+ */
+static void
+zstd_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+
+	if (cstate->zstd_c_ctx != NULL)
+	{
+		/* Compressor state was used for compression */
+		pfree(cstate->zstd_c_in_buf);
+		pfree(cstate->zstd_c_out_buf);
+		ZSTD_freeCCtx(cstate->zstd_c_ctx);
+	}
+	if (cstate->zstd_d_ctx != NULL)
+	{
+		/* Compressor state was used for decompression */
+		pfree(cstate->zstd_d_in_buf);
+		pfree(cstate->zstd_d_out_buf);
+		ZSTD_freeDCtx(cstate->zstd_d_ctx);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD compression buffers and create the ZSTD compression context.
+ */
+static void
+zstd_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_c_in_buf_size = ZSTD_CStreamInSize();
+	cstate->zstd_c_in_buf = (char *) palloc0(cstate->zstd_c_in_buf_size);
+	cstate->zstd_c_out_buf_size = ZSTD_CStreamOutSize();
+	cstate->zstd_c_out_buf = (char *) palloc0(cstate->zstd_c_out_buf_size);
+	cstate->zstd_c_ctx = ZSTD_createCCtx();
+
+	if (cstate->zstd_c_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD compression context")));
+
+	/* Set compression level */
+	ZSTD_CCtx_setParameter(cstate->zstd_c_ctx, ZSTD_c_compressionLevel,
+						   ZSTD_COMPRESSION_LEVEL);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD decompression buffers and create the ZSTD decompression
+ * context.
+ */
+static void
+zstd_CreateStreamDecodeCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_d_in_buf_size = ZSTD_DStreamInSize();
+	cstate->zstd_d_in_buf = (char *) palloc0(cstate->zstd_d_in_buf_size);
+	cstate->zstd_d_out_buf_size = ZSTD_DStreamOutSize();
+	cstate->zstd_d_out_buf = (char *) palloc0(cstate->zstd_d_out_buf_size);
+	cstate->zstd_d_ctx = ZSTD_createDCtx();
+
+	if (cstate->zstd_d_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD decompression context")));
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						   char **dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_c_ctx == NULL)
+		zstd_CreateStreamCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store compressed data */
+	*dst = (char *) palloc0(ZSTD_compressBound(src_size));
+
+	dst_data = *dst;
+	*dst_size = 0;
+
+	/*
+	 * ZSTD streaming compression works with chunks: the source data needs to
+	 * be splitted out in chunks, each of them is then copied into ZSTD input
+	 * buffer.
+	 * For each chunk, we proceed with compression. Streaming compression is
+	 * not intended to compress the whole input chunk, so we have the call
+	 * ZSTD_compressStream2() multiple times until the entire chunk is
+	 * consumed.
+	 */
+	while (toCpy > 0)
+	{
+		/* Are we on the last chunk? */
+		bool		last_chunk = (toCpy < cstate->zstd_c_in_buf_size);
+		/* Size of the data copied into ZSTD input buffer */
+		Size		cpySize = last_chunk ? toCpy : cstate->zstd_c_in_buf_size;
+		bool		finished = false;
+		ZSTD_inBuffer input;
+		ZSTD_EndDirective mode = last_chunk ? ZSTD_e_flush : ZSTD_e_continue;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_c_in_buf, src, cpySize);
+
+		/*
+		 * Close the frame when we are on the last chunk and we've reached max
+		 * frame size.
+		 */
+		if (last_chunk && (cstate->zstd_frame_size > ZSTD_MAX_FRAME_SIZE))
+		{
+			mode = ZSTD_e_end;
+			cstate->zstd_frame_size = 0;
+		}
+
+		cstate->zstd_frame_size += cpySize;
+
+		input.src = cstate->zstd_c_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		do
+		{
+			Size		remaining;
+			ZSTD_outBuffer output;
+
+			output.dst = cstate->zstd_c_out_buf;
+			output.size = cstate->zstd_c_out_buf_size;
+			output.pos = 0;
+
+			remaining = ZSTD_compressStream2(cstate->zstd_c_ctx, &output,
+											 &input, mode);
+
+			if (ZSTD_isError(remaining))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD compression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_c_out_buf, output.pos);
+
+			dst_data += output.pos;
+			*dst_size += output.pos;
+
+			/*
+			 * Compression is done when we are working on the last chunk and
+			 * there is nothing left to compress, or, when we reach the end of
+			 * the chunk.
+			 */
+			finished = last_chunk ? (remaining == 0) : (input.pos == input.size);
+		} while (!finished);
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+#endif
+}
+
+/*
+ * Data decompression using ZSTD streaming API.
+ */
+static void
+zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+	Size		decBytes = 0;	/* Size of decompressed data */
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_d_ctx == NULL)
+		zstd_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Allocate memory that will be used to store decompressed data */
+	*dst = (char *) palloc0(dst_size);
+
+	dst_data = *dst;
+
+	while (toCpy > 0)
+	{
+		ZSTD_inBuffer input;
+		Size		cpySize = (toCpy > cstate->zstd_d_in_buf_size) ? cstate->zstd_d_in_buf_size : toCpy;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_d_in_buf, src, cpySize);
+
+		input.src = cstate->zstd_d_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		while (input.pos < input.size)
+		{
+			ZSTD_outBuffer output;
+			Size		ret;
+
+			output.dst = cstate->zstd_d_out_buf;
+			output.size = cstate->zstd_d_out_buf_size;
+			output.pos = 0;
+
+			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output , &input);
+
+			if (ZSTD_isError(ret))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD decompression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_d_out_buf, output.pos);
+
+			dst_data += output.pos;
+			decBytes += output.pos;
+		}
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+
+	Assert(dst_size == decBytes);
+#endif
+}
+
 /*
  * Allocate a new Compressor State, depending on the compression method.
  */
@@ -318,6 +631,9 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_NewCompressorState(context);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_NewCompressorState(context);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -339,6 +655,9 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_FreeCompressorState(context, compressor_state);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 		default:
@@ -477,6 +796,35 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 			pfree(dst);
 
+			break;
+		}
+		/* ZSTD Compression */
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+		{
+			char	   *dst = NULL;
+			Size		dst_size = 0;
+			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+			/* Use ZSTD streaming compression */
+			zstd_StreamingCompressData(rb->context, src, src_size, &dst,
+									   &dst_size, compressor_state);
+
+			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+
+			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+			hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
+			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+			hdr->raw_size = src_size;
+
+			*header = hdr;
+
+			/* Copy back compressed data into the ReorderBuffer */
+			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+				   dst_size);
+
+			pfree(dst);
+
 			break;
 		}
 	}
@@ -577,6 +925,22 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 							 errmsg_internal("compressed PGLZ data is corrupted")));
 				break;
 			}
+		/* ZSTD streaming decompression */
+		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
+			{
+				char	   *buf;
+				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
+				Size		buf_size = header->raw_size;
+
+				zstd_StreamingDecompressData(rb->context, data, src_size, &buf,
+											 buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
+					   buf, buf_size);
+				pfree(buf);
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 133513880e6..960064d571e 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -19,12 +19,17 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
+	REORDER_BUFFER_ZSTD_COMPRESSION,
 } ReorderBufferCompressionMethod;
 
 /*
@@ -36,6 +41,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 	REORDER_BUFFER_STRAT_LZ4_REGULAR,
 	REORDER_BUFFER_STRAT_PGLZ,
+	REORDER_BUFFER_STRAT_ZSTD_STREAMING,
 } ReorderBufferCompressionStrategy;
 
 /* Disk serialization support datastructures */
@@ -89,6 +95,39 @@ typedef struct LZ4StreamingCompressorState {
 #define lz4_CanDoStreamingCompression(s) (false)
 #endif
 
+#ifdef USE_ZSTD
+/*
+ * Low compression level provides high compression speed and decent compression
+ * rate. Minimum level is 1, maximum is 22.
+ */
+#define ZSTD_COMPRESSION_LEVEL 1
+
+/*
+ * Maximum volume of data encoded in the current ZSTD frame. When this
+ * threshold is reached then we close the current frame and start a new one.
+ */
+#define ZSTD_MAX_FRAME_SIZE (64 * 1024)
+
+/*
+ * ZSTD streaming compression/decompression handlers and buffers.
+ */
+typedef struct ZSTDStreamingCompressorState {
+	/* Compression */
+	ZSTD_CCtx  *zstd_c_ctx;
+	Size		zstd_c_in_buf_size;
+	char	   *zstd_c_in_buf;
+	Size		zstd_c_out_buf_size;
+	char	   *zstd_c_out_buf;
+	Size		zstd_frame_size;
+	/* Decompression */
+	ZSTD_DCtx  *zstd_d_ctx;
+	Size		zstd_d_in_buf_size;
+	char	   *zstd_d_in_buf;
+	Size		zstd_d_out_buf_size;
+	char	   *zstd_d_out_buf;
+} ZSTDStreamingCompressorState;
+#endif
+
 extern void *ReorderBufferNewCompressorState(MemoryContext context,
 											 int compression_method);
 extern void ReorderBufferFreeCompressorState(MemoryContext context,
-- 
2.46.0

v4-0008-review.patchtext/x-patch; charset=UTF-8; name=v4-0008-review.patchDownload
From 7cbf94f473454987410373b6ca78e44c06b5fe08 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 17:08:35 +0200
Subject: [PATCH v4 08/11] review

---
 src/backend/replication/logical/reorderbuffer_compression.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 96751852ef8..67068db9d2c 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -801,6 +801,9 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 		/* ZSTD Compression */
 		case REORDER_BUFFER_ZSTD_COMPRESSION:
 		{
+			/*
+			 * XXX Same comment about palloc/pfree traffic as for lz4 ...
+			 */
 			char	   *dst = NULL;
 			Size		dst_size = 0;
 			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
@@ -928,7 +931,7 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 		/* ZSTD streaming decompression */
 		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
 			{
-				char	   *buf;
+				char	   *buf;	/* XXX shouldn't this be set to NULL explicitly? */
 				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
 				Size		buf_size = header->raw_size;
 
-- 
2.46.0

v4-0009-Add-the-subscription-option-spill_compression.patchtext/x-patch; charset=UTF-8; name=v4-0009-Add-the-subscription-option-spill_compression.patchDownload
From 406e38977b0dd868b20ac31107c8bfbd89e5ad0d Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 15:30:53 +0200
Subject: [PATCH v4 09/11] Add the subscription option spill_compression

---
 doc/src/sgml/ref/create_subscription.sgml     |  24 +++
 src/backend/catalog/pg_subscription.c         |   6 +
 src/backend/catalog/system_views.sql          |   3 +-
 src/backend/commands/subscriptioncmds.c       |  31 +++-
 .../libpqwalreceiver/libpqwalreceiver.c       |   5 +
 src/backend/replication/logical/logical.c     |   4 +
 .../replication/logical/reorderbuffer.c       |  15 +-
 .../logical/reorderbuffer_compression.c       |  57 +++++++
 src/backend/replication/logical/worker.c      |  13 +-
 src/backend/replication/pgoutput/pgoutput.c   |  28 ++++
 src/bin/pg_dump/pg_dump.c                     |  18 +-
 src/bin/pg_dump/pg_dump.h                     |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl              |   8 +-
 src/bin/psql/describe.c                       |   7 +-
 src/bin/psql/tab-complete.c                   |   5 +-
 src/include/catalog/pg_subscription.h         |   4 +
 src/include/replication/logical.h             |   2 +
 src/include/replication/pgoutput.h            |   1 +
 .../replication/reorderbuffer_compression.h   |   4 +
 src/include/replication/walreceiver.h         |   1 +
 src/test/regress/expected/subscription.out    | 156 +++++++++---------
 src/test/regress/sql/subscription.sql         |   4 +
 22 files changed, 305 insertions(+), 92 deletions(-)

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 740b7d94210..56f733eaf8d 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -428,6 +428,30 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry id="sql-createsubscription-params-with-spill-compression">
+        <term><literal>spill_compression</literal> (<type>enum</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the decoded changes that eventually need to be
+          temporarily written on disk by the publisher are compressed or not.
+          Default value is <literal>off</literal> meaning no data compression
+          involved. Setting <literal>spill_compression</literal> to
+          <literal>on</literal> or <literal>pglz</literal> means that the
+          decoded changes are compressed using the internal
+          <literal>PGLZ</literal> compression algorithm.
+         </para>
+
+         <para>
+          If the <productname>PostgreSQL</productname> server running the
+          publisher node supports the external compression libraries
+          <productname>LZ4</productname> or
+          <productname>Zstandard</productname>,
+          <literal>spill_compression</literal> can be set respectively to
+          <literal>lz4</literal> or <literal>zstd</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist></para>
 
     </listitem>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 9efc9159f2c..a3329043dca 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -110,6 +110,12 @@ GetSubscription(Oid subid, bool missing_ok)
 	/* Is the subscription owner a superuser? */
 	sub->ownersuperuser = superuser_arg(sub->owner);
 
+	/* Get spill_compression */
+	datum = SysCacheGetAttrNotNull(SUBSCRIPTIONOID,
+								   tup,
+								   Anum_pg_subscription_subspillcompression);
+	sub->spill_compression = TextDatumGetCString(datum);
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a18..40768a9ae9d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1356,7 +1356,8 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
               subbinary, substream, subtwophasestate, subdisableonerr,
 			  subpasswordrequired, subrunasowner, subfailover,
-              subslotname, subsynccommit, subpublications, suborigin)
+              subslotname, subsynccommit, subpublications, suborigin,
+              subspillcompression)
     ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_stats AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 02ccc636b80..018f42494d0 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -40,6 +40,7 @@
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
 #include "replication/origin.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
@@ -73,6 +74,7 @@
 #define SUBOPT_FAILOVER				0x00002000
 #define SUBOPT_LSN					0x00004000
 #define SUBOPT_ORIGIN				0x00008000
+#define SUBOPT_SPILL_COMPRESSION	0x00010000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -100,6 +102,7 @@ typedef struct SubOpts
 	bool		failover;
 	char	   *origin;
 	XLogRecPtr	lsn;
+	char	   *spill_compression;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
@@ -164,6 +167,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->failover = false;
 	if (IsSet(supported_opts, SUBOPT_ORIGIN))
 		opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+	if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION))
+		opts->spill_compression = "off";
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -357,6 +362,18 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION) &&
+				 strcmp(defel->defname, "spill_compression") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_SPILL_COMPRESSION))
+				errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_SPILL_COMPRESSION;
+			opts->spill_compression = defGetString(defel);
+
+			ReorderBufferValidateCompressionMethod(opts->spill_compression,
+												   ERROR);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -594,7 +611,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 					  SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
 					  SUBOPT_STREAMING | SUBOPT_TWOPHASE_COMMIT |
 					  SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED |
-					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN);
+					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+					  SUBOPT_SPILL_COMPRESSION);
 	parse_subscription_options(pstate, stmt->options, supported_opts, &opts);
 
 	/*
@@ -714,6 +732,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		publicationListToArray(publications);
 	values[Anum_pg_subscription_suborigin - 1] =
 		CStringGetTextDatum(opts.origin);
+	values[Anum_pg_subscription_subspillcompression - 1] =
+		CStringGetTextDatum(opts.spill_compression);
 
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
@@ -1196,7 +1216,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								  SUBOPT_DISABLE_ON_ERR |
 								  SUBOPT_PASSWORD_REQUIRED |
 								  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-								  SUBOPT_ORIGIN);
+								  SUBOPT_ORIGIN | SUBOPT_SPILL_COMPRESSION);
 
 				parse_subscription_options(pstate, stmt->options,
 										   supported_opts, &opts);
@@ -1363,6 +1383,13 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 					replaces[Anum_pg_subscription_suborigin - 1] = true;
 				}
 
+				if (IsSet(opts.specified_opts, SUBOPT_SPILL_COMPRESSION))
+				{
+					values[Anum_pg_subscription_subspillcompression - 1] =
+						CStringGetTextDatum(opts.spill_compression);
+					replaces[Anum_pg_subscription_subspillcompression - 1] = true;
+				}
+
 				update_tuple = true;
 				break;
 			}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 97f957cd87b..20f5e4a83e8 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -620,6 +620,11 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
 			PQserverVersion(conn->streamConn) >= 140000)
 			appendStringInfoString(&cmd, ", binary 'true'");
 
+		if (options->proto.logical.spill_compression &&
+			PQserverVersion(conn->streamConn) >= 180000)
+			appendStringInfo(&cmd, ", spill_compression '%s'",
+							 options->proto.logical.spill_compression);
+
 		appendStringInfoChar(&cmd, ')');
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3fe1774a1e9..54fbbe6fea6 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -36,6 +36,7 @@
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slotsync.h"
 #include "replication/snapbuild.h"
 #include "storage/proc.h"
@@ -298,6 +299,9 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->fast_forward = fast_forward;
 
+	/* No spill files compression by default */
+	ctx->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
+
 	MemoryContextSwitchTo(old_context);
 
 	return ctx;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 168956d944a..6545c310f7f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -410,6 +410,15 @@ ReorderBufferFree(ReorderBuffer *rb)
 	ReorderBufferCleanupSerializedTXNs(NameStr(MyReplicationSlot->data.name));
 }
 
+/* Returns spill files compression method */
+static inline uint8
+ReorderBufferSpillCompressionMethod(ReorderBuffer *rb)
+{
+	LogicalDecodingContext *ctx = rb->private_data;
+
+	return ctx->spill_compression_method;
+}
+
 /*
  * Get an unused, possibly preallocated, ReorderBufferTXN.
  */
@@ -431,7 +440,7 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
 	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
-															logical_decoding_spill_compression);
+															ReorderBufferSpillCompressionMethod(rb));
 
 	return txn;
 }
@@ -470,7 +479,7 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	}
 
 	ReorderBufferFreeCompressorState(rb->context,
-									 logical_decoding_spill_compression,
+									 ReorderBufferSpillCompressionMethod(rb),
 									 txn->compressor_state);
 
 	/* Reset the toast hash */
@@ -3996,7 +4005,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	}
 
 	/* Inplace ReorderBuffer content compression before writing it on disk */
-	ReorderBufferCompress(rb, &disk_hdr, logical_decoding_spill_compression,
+	ReorderBufferCompress(rb, &disk_hdr, ReorderBufferSpillCompressionMethod(rb),
 						  sz, txn->compressor_state);
 
 	errno = 0;
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 67068db9d2c..707f35cff7c 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -949,3 +949,60 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 			break;
 	}
 }
+
+/*
+ * According to a given compression method (as string representation), returns
+ * the corresponding ReorderBufferCompressionMethod
+ */
+ReorderBufferCompressionMethod
+ReorderBufferParseCompressionMethod(const char *method)
+{
+	if (pg_strcasecmp(method, "on") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "pglz") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "off") == 0)
+		return REORDER_BUFFER_NO_COMPRESSION;
+#ifdef USE_LZ4
+	else if (pg_strcasecmp(method, "lz4") == 0)
+		return REORDER_BUFFER_LZ4_COMPRESSION;
+#endif
+#ifdef USE_ZSTD
+	else if (pg_strcasecmp(method, "zstd") == 0)
+		return REORDER_BUFFER_ZSTD_COMPRESSION;
+#endif
+	else
+		return REORDER_BUFFER_INVALID_COMPRESSION;
+}
+
+/*
+ * Check whether the passed compression method is valid and report errors at
+ * elevel.
+ *
+ * As this validation is intended to be executed on subscriber side, then we
+ * actually don't know if the server running the publisher supports external
+ * compression libraries. We only check if the compression method is
+ * potentially supported. The real validation is done by the publisher when
+ * the replication starts, an error is then triggered if the compression method
+ * is not supported.
+ */
+void
+ReorderBufferValidateCompressionMethod(const char *method, int elevel)
+{
+	bool		valid = false;
+	char		methods[5][5] = {"on", "off", "pglz", "lz4", "zstd"};
+
+	for (int i = 0; i < 5; i++)
+	{
+		if (pg_strcasecmp(method, methods[i]) == 0)
+		{
+			valid = true;
+			break;
+		}
+	}
+
+	if (!valid)
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("compression method \"%s\" not valid", method)));
+}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 925dff9cc44..eb42e844f99 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -4021,7 +4021,8 @@ maybe_reread_subscription(void)
 		newsub->passwordrequired != MySubscription->passwordrequired ||
 		strcmp(newsub->origin, MySubscription->origin) != 0 ||
 		newsub->owner != MySubscription->owner ||
-		!equal(newsub->publications, MySubscription->publications))
+		!equal(newsub->publications, MySubscription->publications) ||
+		strcmp(newsub->spill_compression, MySubscription->spill_compression) != 0)
 	{
 		if (am_parallel_apply_worker())
 			ereport(LOG,
@@ -4469,6 +4470,16 @@ set_stream_options(WalRcvStreamOptions *options,
 		MyLogicalRepWorker->parallel_apply = false;
 	}
 
+	if (server_version >= 180000 &&
+			 MySubscription->stream == LOGICALREP_STREAM_OFF &&
+			 MySubscription->spill_compression != NULL)
+	{
+		options->proto.logical.spill_compression =
+			pstrdup(MySubscription->spill_compression);
+	}
+	else
+		options->proto.logical.spill_compression = NULL;
+
 	options->proto.logical.twophase = false;
 	options->proto.logical.origin = pstrdup(MySubscription->origin);
 }
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 00e7024563e..521b646bb60 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -27,6 +27,7 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
+#include "replication/reorderbuffer_compression.h"
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -283,11 +284,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 	bool		streaming_given = false;
 	bool		two_phase_option_given = false;
 	bool		origin_option_given = false;
+	bool		spill_compression_option_given = false;
 
 	data->binary = false;
 	data->streaming = LOGICALREP_STREAM_OFF;
 	data->messages = false;
 	data->two_phase = false;
+	data->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
 
 	foreach(lc, options)
 	{
@@ -396,6 +399,28 @@ parse_output_parameters(List *options, PGOutputData *data)
 						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						errmsg("unrecognized origin value: \"%s\"", origin));
 		}
+		else if (strcmp(defel->defname, "spill_compression") == 0)
+		{
+			uint8		method;
+			char	   *method_str;
+
+			if (spill_compression_option_given)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			spill_compression_option_given = true;
+
+			method_str = defGetString(defel);
+			method = ReorderBufferParseCompressionMethod(method_str);
+
+			if (method == REORDER_BUFFER_INVALID_COMPRESSION)
+				ereport(ERROR,
+						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						errmsg("invalid spill files compression method: \"%s\"",
+							   method_str));
+
+			data->spill_compression_method = method;
+		}
 		else
 			elog(ERROR, "unrecognized pgoutput option: %s", defel->defname);
 	}
@@ -508,6 +533,9 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 		data->publications = NIL;
 		publications_valid = false;
 
+		/* Init spill files compression method */
+		ctx->spill_compression_method = data->spill_compression_method;
+
 		/*
 		 * Register callback for pg_publication if we didn't already do that
 		 * during some previous call in this process.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 6e07984e8d5..e660b3bd444 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4847,6 +4847,7 @@ getSubscriptions(Archive *fout)
 	int			i_suboriginremotelsn;
 	int			i_subenabled;
 	int			i_subfailover;
+	int			i_subspillcompression;
 	int			i,
 				ntups;
 
@@ -4919,10 +4920,17 @@ getSubscriptions(Archive *fout)
 
 	if (fout->remoteVersion >= 170000)
 		appendPQExpBufferStr(query,
-							 " s.subfailover\n");
+							 " s.subfailover,\n");
 	else
 		appendPQExpBuffer(query,
-						  " false AS subfailover\n");
+						  " false AS subfailover,\n");
+
+	if (fout->remoteVersion >= 180000)
+		appendPQExpBufferStr(query,
+							 " s.subspillcompression\n");
+	else
+		appendPQExpBuffer(query,
+						  " 'off' AS subspillcompression\n");
 
 	appendPQExpBufferStr(query,
 						 "FROM pg_subscription s\n");
@@ -4962,6 +4970,7 @@ getSubscriptions(Archive *fout)
 	i_suboriginremotelsn = PQfnumber(res, "suboriginremotelsn");
 	i_subenabled = PQfnumber(res, "subenabled");
 	i_subfailover = PQfnumber(res, "subfailover");
+	i_subspillcompression = PQfnumber(res, "subspillcompression");
 
 	subinfo = pg_malloc(ntups * sizeof(SubscriptionInfo));
 
@@ -5008,6 +5017,8 @@ getSubscriptions(Archive *fout)
 			pg_strdup(PQgetvalue(res, i, i_subenabled));
 		subinfo[i].subfailover =
 			pg_strdup(PQgetvalue(res, i, i_subfailover));
+		subinfo[i].subspillcompression =
+			pg_strdup(PQgetvalue(res, i, i_subspillcompression));
 
 		/* Decide whether we want to dump it */
 		selectDumpableObject(&(subinfo[i].dobj), fout);
@@ -5254,6 +5265,9 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (pg_strcasecmp(subinfo->suborigin, LOGICALREP_ORIGIN_ANY) != 0)
 		appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
 
+	if (strcmp(subinfo->subspillcompression, "off") != 0)
+		appendPQExpBuffer(query, ", spill_compression = %s", subinfo->subspillcompression);
+
 	appendPQExpBufferStr(query, ");\n");
 
 	/*
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 9f907ed5ad4..ecbf2c2e276 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -673,6 +673,7 @@ typedef struct _SubscriptionInfo
 	char	   *suborigin;
 	char	   *suboriginremotelsn;
 	char	   *subfailover;
+	char	   *subspillcompression;
 } SubscriptionInfo;
 
 /*
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index ab6c8304913..dea2fc40eac 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3001,9 +3001,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub2
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = none);',
+						 WITH (connect = false, origin = none, spill_compression = on);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none);\E
+			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', origin = none, spill_compression = on);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
@@ -3012,9 +3012,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub3
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = any);',
+						 WITH (connect = false, origin = any, spill_compression = pglz);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3');\E
+			\QCREATE SUBSCRIPTION sub3 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub3', spill_compression = pglz);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index faabecbc76f..9378a03bf23 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6547,7 +6547,7 @@ describeSubscriptions(const char *pattern, bool verbose)
 	printQueryOpt myopt = pset.popt;
 	static const bool translate_columns[] = {false, false, false, false,
 		false, false, false, false, false, false, false, false, false, false,
-	false};
+	false, false};
 
 	if (pset.sversion < 100000)
 	{
@@ -6627,6 +6627,11 @@ describeSubscriptions(const char *pattern, bool verbose)
 			appendPQExpBuffer(&buf,
 							  ", subskiplsn AS \"%s\"\n",
 							  gettext_noop("Skip LSN"));
+
+		if (pset.sversion >= 180000)
+			appendPQExpBuffer(&buf,
+							  ", subspillcompression AS \"%s\"\n",
+							  gettext_noop("Spill files compression"));
 	}
 
 	/* Only display subscriptions in current database. */
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7df..728f2bd771e 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1948,7 +1948,7 @@ psql_completion(const char *text, int start, int end)
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SET", "("))
 		COMPLETE_WITH("binary", "disable_on_error", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "spill_compression", "streaming", "synchronous_commit", "two_phase");
 	/* ALTER SUBSCRIPTION <name> SKIP ( */
 	else if (HeadMatches("ALTER", "SUBSCRIPTION", MatchAny) && TailMatches("SKIP", "("))
 		COMPLETE_WITH("lsn");
@@ -3345,7 +3345,8 @@ psql_completion(const char *text, int start, int end)
 		COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
 					  "disable_on_error", "enabled", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "spill_compression", "streaming", "synchronous_commit",
+					  "two_phase");
 
 /* CREATE TRIGGER --- is allowed inside CREATE SCHEMA, so use TailMatches */
 
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index 0aa14ec4a27..61c284349ca 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -113,6 +113,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 
 	/* Only publish data originating from the specified origin */
 	text		suborigin BKI_DEFAULT(LOGICALREP_ORIGIN_ANY);
+
+	/* Spill files compression algorithm */
+	text		subspillcompression BKI_FORCE_NOT_NULL;
 #endif
 } FormData_pg_subscription;
 
@@ -157,6 +160,7 @@ typedef struct Subscription
 	List	   *publications;	/* List of publication names to subscribe to */
 	char	   *origin;			/* Only publish data originating from the
 								 * specified origin */
+	char	   *spill_compression;	/* Spill files compression algorithm */
 } Subscription;
 
 /* Disallow streaming in-progress transactions. */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index aff38e8d049..75c17866c38 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -112,6 +112,8 @@ typedef struct LogicalDecodingContext
 
 	/* Do we need to process any change in fast_forward mode? */
 	bool		processing_required;
+	/* Compression method used to compress spill files */
+	uint8		spill_compression_method;
 } LogicalDecodingContext;
 
 
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 89f94e11472..eabcca62af9 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -33,6 +33,7 @@ typedef struct PGOutputData
 	bool		messages;
 	bool		two_phase;
 	bool		publish_no_origin;
+	uint8		spill_compression_method;
 } PGOutputData;
 
 #endif							/* PGOUTPUT_H */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 960064d571e..f72f0e0fbd7 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -26,6 +26,7 @@
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
+	REORDER_BUFFER_INVALID_COMPRESSION,
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
@@ -140,5 +141,8 @@ extern void ReorderBufferCompress(ReorderBuffer *rb,
 extern void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 									ReorderBufferDiskHeader *header,
 									void *compressor_state);
+extern ReorderBufferCompressionMethod ReorderBufferParseCompressionMethod(const char *method);
+extern void ReorderBufferValidateCompressionMethod(const char *method,
+												   int elevel);
 
 #endif							/* REORDERBUFFER_COMPRESSION_H */
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 132e789948b..b759b5807d8 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -186,6 +186,7 @@ typedef struct
 									 * prepare time */
 			char	   *origin; /* Only publish data originating from the
 								 * specified origin */
+			char	   *spill_compression;	/* Spill files compression algo */
 		}			logical;
 	}			proto;
 } WalRcvStreamOptions;
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 17d48b16857..80ea7776513 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -116,18 +116,18 @@ CREATE SUBSCRIPTION regress_testsub4 CONNECTION 'dbname=regress_doesnotexist' PU
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub4 SET (origin = any);
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub3;
@@ -145,10 +145,10 @@ ALTER SUBSCRIPTION regress_testsub CONNECTION 'foobar';
 ERROR:  invalid connection string syntax: missing "=" after "foobar" in connection info string
 
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET PUBLICATION testpub2, testpub3 WITH (refresh = false);
@@ -157,10 +157,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = 'newname');
 ALTER SUBSCRIPTION regress_testsub SET (password_required = false);
 ALTER SUBSCRIPTION regress_testsub SET (run_as_owner = true);
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (password_required = true);
@@ -176,10 +176,10 @@ ERROR:  unrecognized subscription parameter: "create_slot"
 -- ok
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/12345');
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345  | off
 (1 row)
 
 -- ok - with lsn = NONE
@@ -188,10 +188,10 @@ ALTER SUBSCRIPTION regress_testsub SKIP (lsn = NONE);
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/0');
 ERROR:  invalid WAL location (LSN): 0/0
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 BEGIN;
@@ -222,11 +222,15 @@ ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 ERROR:  invalid value for parameter "synchronous_commit": "foobar"
 HINT:  Available values: local, remote_write, remote_apply, on, off.
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+ERROR:  compression method "foobar" not valid
 \dRs+
-                                                                                                                       List of subscriptions
-        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
----------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                    List of subscriptions
+        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+---------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | off       | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 -- rename back to keep the rest simple
@@ -255,19 +259,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | t      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (binary = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -279,27 +283,27 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication already exists
@@ -314,10 +318,10 @@ ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refr
 ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refresh = false);
 ERROR:  publication "testpub1" is already in subscription "regress_testsub"
 \dRs+
-                                                                                                                        List of subscriptions
-      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                                     List of subscriptions
+      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication used more than once
@@ -332,10 +336,10 @@ ERROR:  publication "testpub3" is not in subscription "regress_testsub"
 -- ok - delete publications
 ALTER SUBSCRIPTION regress_testsub DROP PUBLICATION testpub1, testpub2 WITH (refresh = false);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -371,19 +375,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- we can alter streaming when two_phase enabled
 ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -393,10 +397,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -409,18 +413,18 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 007c9e70374..368696f430c 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -140,6 +140,10 @@ ALTER SUBSCRIPTION regress_testsub RENAME TO regress_testsub_foo;
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+
 \dRs+
 
 -- rename back to keep the rest simple
-- 
2.46.0

v4-0010-Add-ReorderBuffer-ondisk-compression-tests.patchtext/x-patch; charset=UTF-8; name=v4-0010-Add-ReorderBuffer-ondisk-compression-tests.patchDownload
From 238d391dfee9a1dc0020c17fe158f2a5d1bbf9da Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 18 Jul 2024 07:51:29 -0700
Subject: [PATCH v4 10/11] Add ReorderBuffer ondisk compression tests

---
 src/test/subscription/Makefile                |  2 +
 src/test/subscription/meson.build             |  7 +-
 .../t/034_reorderbuffer_compression.pl        | 99 +++++++++++++++++++
 3 files changed, 107 insertions(+), 1 deletion(-)
 create mode 100644 src/test/subscription/t/034_reorderbuffer_compression.pl

diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
index ce1ca430095..9341f1493c5 100644
--- a/src/test/subscription/Makefile
+++ b/src/test/subscription/Makefile
@@ -16,6 +16,8 @@ include $(top_builddir)/src/Makefile.global
 EXTRA_INSTALL = contrib/hstore
 
 export with_icu
+export with_lz4
+export with_zstd
 
 check:
 	$(prove_check)
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index c591cd7d619..772eeb817f6 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -5,7 +5,11 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
-    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
+    'env': {
+      'with_icu': icu.found() ? 'yes' : 'no',
+      'with_lz4': lz4.found() ? 'yes' : 'no',
+      'with_zstd': zstd.found() ? 'yes' : 'no',
+    },
     'tests': [
       't/001_rep_changes.pl',
       't/002_types.pl',
@@ -40,6 +44,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_reorderbuffer_compression.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_reorderbuffer_compression.pl b/src/test/subscription/t/034_reorderbuffer_compression.pl
new file mode 100644
index 00000000000..65c9be14a22
--- /dev/null
+++ b/src/test/subscription/t/034_reorderbuffer_compression.pl
@@ -0,0 +1,99 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test ReorderBuffer compression
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub test_reorderbuffer_compression
+{
+	my ($node_publisher, $node_subscriber, $appname, $compression) = @_;
+
+	# Set subscriber's spill_compression option
+	$node_subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION tap_sub SET (spill_compression = $compression)");
+
+	# Make sure the table is empty
+	$node_publisher->safe_psql('postgres', 'TRUNCATE test_tab');
+
+	# Reset replication slot stats
+	$node_publisher->safe_psql('postgres',
+		"SELECT pg_stat_reset_replication_slot('tap_sub')");
+
+	# Insert 1 million rows in the table
+	$node_publisher->safe_psql('postgres',
+		"INSERT INTO test_tab SELECT i, 'Message number #'||i::TEXT FROM generate_series(1, 1000000) as i"
+	);
+
+	$node_publisher->wait_for_catchup($appname);
+
+	# Check if table content is replicated
+	my $result =
+	  $node_subscriber->safe_psql('postgres',
+		"SELECT count(*) FROM test_tab");
+	is($result, qq(1000000), 'check data was copied to subscriber');
+
+	# Check if the transaction was spilled on disk
+	my $res_stats =
+	  $node_publisher->safe_psql('postgres',
+		"SELECT spill_txns FROM pg_catalog.pg_stat_get_replication_slot('tap_sub');");
+	is($res_stats, qq(1), 'check if the transaction was spilled on disk');
+}
+
+# Create publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	'logical_decoding_work_mem = 64');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$node_subscriber->init;
+$node_subscriber->start;
+
+# Setup structure on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'off');
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'pglz');
+
+SKIP:
+{
+	skip "LZ4 not supported by this build", 2 if ($ENV{with_lz4} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'lz4');
+}
+
+SKIP:
+{
+	skip "ZSTD not supported by this build", 2 if ($ENV{with_zstd} ne 'yes');
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'zstd');
+}
+
+$node_subscriber->stop;
+$node_publisher->stop;
+
+done_testing();
-- 
2.46.0

v4-0011-pgindent.patchtext/x-patch; charset=UTF-8; name=v4-0011-pgindent.patchDownload
From db7367f9fe2bf1072793fedbb749422e4977b5e0 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Tue, 17 Sep 2024 17:23:26 +0200
Subject: [PATCH v4 11/11] pgindent

---
 .../replication/logical/reorderbuffer.c       |  11 +-
 .../logical/reorderbuffer_compression.c       | 285 +++++++++---------
 src/backend/replication/logical/worker.c      |   4 +-
 .../replication/reorderbuffer_compression.h   |  12 +-
 src/tools/pgindent/typedefs.list              |   5 +
 5 files changed, 166 insertions(+), 151 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6545c310f7f..1b6ddecb83e 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -257,7 +257,7 @@ static Size ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
 static bool ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
-								   TXNEntryFile *file, XLogSegNo *segno);
+										  TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -3833,7 +3833,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	ReorderBufferSerializeReserve(rb, sz);
 
 	disk_hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-	memcpy((char *)rb->outbuf + sizeof(ReorderBufferDiskHeader), change, sizeof(ReorderBufferChange));
+	memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), change, sizeof(ReorderBufferChange));
 
 	switch (change->action)
 	{
@@ -4345,9 +4345,8 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the full change from disk.
-		 * If ReorderBufferReadOndiskChange returns false, then we are at the
-		 * eof, so, move the next segment.
+		 * Read the full change from disk. If ReorderBufferReadOndiskChange
+		 * returns false, then we are at the eof, so, move the next segment.
 		 */
 		if (!ReorderBufferReadOndiskChange(rb, txn, file, segno))
 		{
@@ -4375,7 +4374,7 @@ ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 {
 	int			readBytes;
 	ReorderBufferDiskHeader *disk_hdr;
-	char	   *header;			/* disk header buffer*/
+	char	   *header;			/* disk header buffer */
 	char	   *data;			/* data buffer */
 
 	/*
diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 707f35cff7c..6d19c60b601 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -158,9 +158,9 @@ lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
 #else
 	LZ4StreamingCompressorState *cstate;
 	int			lz4_cmp_size = 0;	/* compressed size */
-	char	   *buf;				/* buffer used for compression */
-	Size		buf_size;			/* buffer size */
-	char	   *lz4_in_bufPtr;		/* input ring buffer pointer */
+	char	   *buf;			/* buffer used for compression */
+	Size		buf_size;		/* buffer size */
+	char	   *lz4_in_bufPtr;	/* input ring buffer pointer */
 
 	cstate = (LZ4StreamingCompressorState *) compressor_state;
 
@@ -204,14 +204,14 @@ lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
  * Data compression using LZ4 API.
  */
 static void
-lz4_CompressData(char *src, Size src_size, char **dst,  Size *dst_size)
+lz4_CompressData(char *src, Size src_size, char **dst, Size *dst_size)
 {
 #ifndef USE_LZ4
 	NO_LZ4_SUPPORT();
 #else
 	int			lz4_cmp_size = 0;	/* compressed size */
-	char	   *buf;				/* buffer used for compression */
-	Size		buf_size;			/* buffer size */
+	char	   *buf;			/* buffer used for compression */
+	Size		buf_size;		/* buffer size */
 
 	buf_size = LZ4_COMPRESSBOUND(src_size);
 	buf = (char *) palloc0(buf_size);
@@ -243,8 +243,8 @@ lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
 	NO_LZ4_SUPPORT();
 #else
 	LZ4StreamingCompressorState *cstate;
-	char	   *lz4_out_bufPtr;		/* output ring buffer pointer */
-	int			lz4_dec_size;		/* decompressed data size */
+	char	   *lz4_out_bufPtr; /* output ring buffer pointer */
+	int			lz4_dec_size;	/* decompressed data size */
 
 	cstate = (LZ4StreamingCompressorState *) compressor_state;
 
@@ -273,8 +273,8 @@ lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
 				 errmsg_internal("compressed LZ4 data is corrupted")));
 	else if (lz4_dec_size != dst_size)
 		ereport(ERROR,
-			(errcode(ERRCODE_DATA_CORRUPTED),
-			 errmsg_internal("decompressed LZ4 data size differs from original size")));
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("decompressed LZ4 data size differs from original size")));
 
 	/* Move the output ring buffer offset */
 	cstate->lz4_out_buf_offset += lz4_dec_size;
@@ -334,8 +334,8 @@ zstd_NewCompressorState(MemoryContext context)
 
 	/*
 	 * We do not allocate ZSTD buffers and contexts at this point because we
-	 * have no guarantee that we will need them later. Let's allocate only when
-	 * we are about to use them.
+	 * have no guarantee that we will need them later. Let's allocate only
+	 * when we are about to use them.
 	 */
 	cstate->zstd_c_ctx = NULL;
 	cstate->zstd_c_in_buf = NULL;
@@ -461,6 +461,7 @@ zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
 	NO_ZSTD_SUPPORT();
 #else
 	ZSTDStreamingCompressorState *cstate;
+
 	/* Size of remaining data to be copied from src into ZSTD input buffer */
 	Size		toCpy = src_size;
 	char	   *dst_data;
@@ -479,16 +480,16 @@ zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
 	/*
 	 * ZSTD streaming compression works with chunks: the source data needs to
 	 * be splitted out in chunks, each of them is then copied into ZSTD input
-	 * buffer.
-	 * For each chunk, we proceed with compression. Streaming compression is
-	 * not intended to compress the whole input chunk, so we have the call
-	 * ZSTD_compressStream2() multiple times until the entire chunk is
-	 * consumed.
+	 * buffer. For each chunk, we proceed with compression. Streaming
+	 * compression is not intended to compress the whole input chunk, so we
+	 * have the call ZSTD_compressStream2() multiple times until the entire
+	 * chunk is consumed.
 	 */
 	while (toCpy > 0)
 	{
 		/* Are we on the last chunk? */
 		bool		last_chunk = (toCpy < cstate->zstd_c_in_buf_size);
+
 		/* Size of the data copied into ZSTD input buffer */
 		Size		cpySize = last_chunk ? toCpy : cstate->zstd_c_in_buf_size;
 		bool		finished = false;
@@ -556,12 +557,13 @@ zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
  */
 static void
 zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
-							char **dst, Size dst_size, void *compressor_state)
+							 char **dst, Size dst_size, void *compressor_state)
 {
 #ifndef USE_ZSTD
 	NO_ZSTD_SUPPORT();
 #else
 	ZSTDStreamingCompressorState *cstate;
+
 	/* Size of remaining data to be copied from src into ZSTD input buffer */
 	Size		toCpy = src_size;
 	char	   *dst_data;
@@ -598,7 +600,7 @@ zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
 			output.size = cstate->zstd_d_out_buf_size;
 			output.pos = 0;
 
-			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output , &input);
+			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output, &input);
 
 			if (ZSTD_isError(ret))
 				ereport(ERROR,
@@ -691,145 +693,148 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 
 	switch (compression_method)
 	{
-		/* No compression */
+			/* No compression */
 		case REORDER_BUFFER_NO_COMPRESSION:
-		{
-			hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
-			hdr->size = data_size;
-			hdr->raw_size = data_size - sizeof(ReorderBufferDiskHeader);
-
-			break;
-		}
-		/* LZ4 Compression */
-		case REORDER_BUFFER_LZ4_COMPRESSION:
-		{
-			/*
-			 * XXX Won't this cause a lot of palloc/pfree traffic? We allocate a new
-			 * buffer for every compression, only to immediately throw it away. Look
-			 * at what astreamer_lz4_compressor_content does to reuse a buffer, maybe
-			 * we should do that here too?
-			 */
-			char	   *dst = NULL;
-			Size		dst_size = 0;
-			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
-			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
-			ReorderBufferCompressionStrategy strat;
-
-			if (lz4_CanDoStreamingCompression(src_size))
 			{
-				/* Use LZ4 streaming compression if possible */
-				lz4_StreamingCompressData(rb->context, src, src_size, &dst,
-										  &dst_size, compressor_state);
-				strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+				hdr->size = data_size;
+				hdr->raw_size = data_size - sizeof(ReorderBufferDiskHeader);
+
+				break;
 			}
-			else
+			/* LZ4 Compression */
+		case REORDER_BUFFER_LZ4_COMPRESSION:
 			{
-				/* Fallback to LZ4 regular compression */
-				lz4_CompressData(src, src_size, &dst, &dst_size);
-				strat = REORDER_BUFFER_STRAT_LZ4_REGULAR;
-			}
+				/*
+				 * XXX Won't this cause a lot of palloc/pfree traffic? We
+				 * allocate a new buffer for every compression, only to
+				 * immediately throw it away. Look at what
+				 * astreamer_lz4_compressor_content does to reuse a buffer,
+				 * maybe we should do that here too?
+				 */
+				char	   *dst = NULL;
+				Size		dst_size = 0;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+				Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+				ReorderBufferCompressionStrategy strat;
+
+				if (lz4_CanDoStreamingCompression(src_size))
+				{
+					/* Use LZ4 streaming compression if possible */
+					lz4_StreamingCompressData(rb->context, src, src_size, &dst,
+											  &dst_size, compressor_state);
+					strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+				}
+				else
+				{
+					/* Fallback to LZ4 regular compression */
+					lz4_CompressData(src, src_size, &dst, &dst_size);
+					strat = REORDER_BUFFER_STRAT_LZ4_REGULAR;
+				}
 
-			/*
-			 * Make sure the ReorderBuffer has enough space to store compressed
-			 * data. Compressed data must be smaller than raw data, so, the
-			 * ReorderBuffer should already have room for compressed data, but
-			 * we do this to avoid buffer overflow risks.
-			 */
-			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+				/*
+				 * Make sure the ReorderBuffer has enough space to store
+				 * compressed data. Compressed data must be smaller than raw
+				 * data, so, the ReorderBuffer should already have room for
+				 * compressed data, but we do this to avoid buffer overflow
+				 * risks.
+				 */
+				ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
 
-			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-			hdr->comp_strat = strat;
-			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
-			hdr->raw_size = src_size;
+				hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+				hdr->comp_strat = strat;
+				hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+				hdr->raw_size = src_size;
 
-			/*
-			 * Update header: hdr pointer has potentially changed due to
-			 * ReorderBufferReserve()
-			 */
-			*header = hdr;
+				/*
+				 * Update header: hdr pointer has potentially changed due to
+				 * ReorderBufferReserve()
+				 */
+				*header = hdr;
 
-			/* Copy back compressed data into the ReorderBuffer */
-			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
-				   dst_size);
+				/* Copy back compressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+					   dst_size);
 
-			pfree(dst);
+				pfree(dst);
+
+				break;
+			}
+			/* PGLZ compression */
 
-			break;
-		}
-		/* PGLZ compression */
-		/*
-		 * XXX I'd probably start by adding pglz first, and only then add lz4 as a
-		 * separate patch, simply because pglz is the default and always supported,
-		 * while lz4 requires extra flags.
-		 */
-		case REORDER_BUFFER_PGLZ_COMPRESSION:
-		{
 			/*
-			 * XXX Same comment about palloc/pfree traffic as for lz4 ...
+			 * XXX I'd probably start by adding pglz first, and only then add
+			 * lz4 as a separate patch, simply because pglz is the default and
+			 * always supported, while lz4 requires extra flags.
 			 */
-			int32		dst_size = 0;
-			char	   *dst = NULL;
-			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
-			int32		src_size = data_size - sizeof(ReorderBufferDiskHeader);
-			int32		max_size = PGLZ_MAX_OUTPUT(src_size);
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+			{
+				/*
+				 * XXX Same comment about palloc/pfree traffic as for lz4 ...
+				 */
+				int32		dst_size = 0;
+				char	   *dst = NULL;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+				int32		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+				int32		max_size = PGLZ_MAX_OUTPUT(src_size);
 
-			dst = (char *) palloc0(max_size);
-			dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
+				dst = (char *) palloc0(max_size);
+				dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
 
-			if (dst_size < 0)
-				ereport(ERROR,
-						(errcode(ERRCODE_DATA_CORRUPTED),
-						 errmsg_internal("PGLZ compression failed")));
+				if (dst_size < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("PGLZ compression failed")));
 
-			ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+				ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
 
-			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-			hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
-			hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
-			hdr->raw_size = (Size) src_size;
+				hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+				hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+				hdr->raw_size = (Size) src_size;
 
-			*header = hdr;
+				*header = hdr;
 
-			/* Copy back compressed data into the ReorderBuffer */
-			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
-				   dst_size);
+				/* Copy back compressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+					   dst_size);
 
-			pfree(dst);
+				pfree(dst);
 
-			break;
-		}
-		/* ZSTD Compression */
+				break;
+			}
+			/* ZSTD Compression */
 		case REORDER_BUFFER_ZSTD_COMPRESSION:
-		{
-			/*
-			 * XXX Same comment about palloc/pfree traffic as for lz4 ...
-			 */
-			char	   *dst = NULL;
-			Size		dst_size = 0;
-			char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
-			Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
+			{
+				/*
+				 * XXX Same comment about palloc/pfree traffic as for lz4 ...
+				 */
+				char	   *dst = NULL;
+				Size		dst_size = 0;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskHeader);
+				Size		src_size = data_size - sizeof(ReorderBufferDiskHeader);
 
-			/* Use ZSTD streaming compression */
-			zstd_StreamingCompressData(rb->context, src, src_size, &dst,
-									   &dst_size, compressor_state);
+				/* Use ZSTD streaming compression */
+				zstd_StreamingCompressData(rb->context, src, src_size, &dst,
+										   &dst_size, compressor_state);
 
-			ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
+				ReorderBufferReserve(rb, (dst_size + sizeof(ReorderBufferDiskHeader)));
 
-			hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-			hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
-			hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
-			hdr->raw_size = src_size;
+				hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
+				hdr->size = dst_size + sizeof(ReorderBufferDiskHeader);
+				hdr->raw_size = src_size;
 
-			*header = hdr;
+				*header = hdr;
 
-			/* Copy back compressed data into the ReorderBuffer */
-			memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
-				   dst_size);
+				/* Copy back compressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+					   dst_size);
 
-			pfree(dst);
+				pfree(dst);
 
-			break;
-		}
+				break;
+			}
 	}
 }
 
@@ -847,6 +852,7 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 						ReorderBufferDiskHeader *header, void *compressor_state)
 {
 	Size		raw_outbufsize = header->raw_size + sizeof(ReorderBufferDiskHeader);
+
 	/*
 	 * Make sure the output reorder buffer has enough space to store
 	 * decompressed/raw data.
@@ -862,7 +868,7 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 
 	switch (header->comp_strat)
 	{
-		/* No decompression */
+			/* No decompression */
 		case REORDER_BUFFER_STRAT_UNCOMPRESSED:
 			{
 				/*
@@ -873,7 +879,7 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 					   data, header->raw_size);
 				break;
 			}
-		/* LZ4 regular decompression */
+			/* LZ4 regular decompression */
 		case REORDER_BUFFER_STRAT_LZ4_REGULAR:
 			{
 				char	   *buf;
@@ -889,26 +895,28 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 				pfree(buf);
 				break;
 			}
-		/* LZ4 streaming decompression */
+			/* LZ4 streaming decompression */
 		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
 			{
-				char	   *buf;	/* XXX shouldn't this be set to NULL explicitly? */
+				char	   *buf;	/* XXX shouldn't this be set to NULL
+									 * explicitly? */
 				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
 				Size		buf_size = header->raw_size;
 
 				lz4_StreamingDecompressData(rb->context, data, src_size, &buf,
-										   buf_size, compressor_state);
+											buf_size, compressor_state);
 
 				/* Copy decompressed data into the ReorderBuffer */
 				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader),
 					   buf, buf_size);
+
 				/*
 				 * Not necessary to free buf in this case: it points to the
 				 * decompressed data stored in LZ4 output ring buffer.
 				 */
 				break;
 			}
-		/* PGLZ decompression */
+			/* PGLZ decompression */
 		case REORDER_BUFFER_STRAT_PGLZ:
 			{
 				char	   *buf;
@@ -928,10 +936,11 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 							 errmsg_internal("compressed PGLZ data is corrupted")));
 				break;
 			}
-		/* ZSTD streaming decompression */
+			/* ZSTD streaming decompression */
 		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
 			{
-				char	   *buf;	/* XXX shouldn't this be set to NULL explicitly? */
+				char	   *buf;	/* XXX shouldn't this be set to NULL
+									 * explicitly? */
 				Size		src_size = header->size - sizeof(ReorderBufferDiskHeader);
 				Size		buf_size = header->raw_size;
 
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index eb42e844f99..32b38b94dd7 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -4471,8 +4471,8 @@ set_stream_options(WalRcvStreamOptions *options,
 	}
 
 	if (server_version >= 180000 &&
-			 MySubscription->stream == LOGICALREP_STREAM_OFF &&
-			 MySubscription->spill_compression != NULL)
+		MySubscription->stream == LOGICALREP_STREAM_OFF &&
+		MySubscription->spill_compression != NULL)
 	{
 		options->proto.logical.spill_compression =
 			pstrdup(MySubscription->spill_compression);
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index f72f0e0fbd7..3f69d2c2610 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -56,9 +56,9 @@ typedef enum ReorderBufferCompressionStrategy
  */
 typedef struct ReorderBufferDiskHeader
 {
-	ReorderBufferCompressionStrategy comp_strat; /* Compression strategy */
-	Size		size;					/* Ondisk size */
-	Size		raw_size;				/* Raw/uncompressed data size */
+	ReorderBufferCompressionStrategy comp_strat;	/* Compression strategy */
+	Size		size;			/* Ondisk size */
+	Size		raw_size;		/* Raw/uncompressed data size */
 	/* ReorderBufferChange + data follows */
 } ReorderBufferDiskHeader;
 
@@ -81,7 +81,8 @@ typedef struct ReorderBufferDiskHeader
  * LZ4 streaming compression/decompression handlers and ring
  * buffers.
  */
-typedef struct LZ4StreamingCompressorState {
+typedef struct LZ4StreamingCompressorState
+{
 	/* Streaming compression handler */
 	LZ4_stream_t *lz4_stream;
 	/* Streaming decompression handler */
@@ -112,7 +113,8 @@ typedef struct LZ4StreamingCompressorState {
 /*
  * ZSTD streaming compression/decompression handlers and buffers.
  */
-typedef struct ZSTDStreamingCompressorState {
+typedef struct ZSTDStreamingCompressorState
+{
 	/* Compression */
 	ZSTD_CCtx  *zstd_c_ctx;
 	Size		zstd_c_in_buf_size;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ace5414fa5b..ea13d621f6c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1487,6 +1487,7 @@ LZ4F_decompressionContext_t
 LZ4F_errorCode_t
 LZ4F_preferences_t
 LZ4State
+LZ4StreamingCompressorState
 LabelProvider
 LagTracker
 LargeObjectDesc
@@ -2412,7 +2413,10 @@ ReorderBufferChange
 ReorderBufferChangeType
 ReorderBufferCommitCB
 ReorderBufferCommitPreparedCB
+ReorderBufferCompressionMethod
+ReorderBufferCompressionStrategy
 ReorderBufferDiskChange
+ReorderBufferDiskHeader
 ReorderBufferIterTXNEntry
 ReorderBufferIterTXNState
 ReorderBufferMessageCB
@@ -3247,6 +3251,7 @@ XmlTableBuilderData
 YYLTYPE
 YYSTYPE
 YY_BUFFER_STATE
+ZSTDStreamingCompressorState
 ZSTD_CCtx
 ZSTD_CStream
 ZSTD_DCtx
-- 
2.46.0

#25Tomas Vondra
tomas@vondra.me
In reply to: Tomas Vondra (#24)
5 attachment(s)
Re: Compress ReorderBuffer spill files using LZ4

Hi,

I've spent a bit more time on this, mostly running tests to get a better
idea of the practical benefits.

Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
for pglz_compress() to return -1. This can happen if the data is not
compressible, and would not fit into the output buffer. The code can't
just do elog(ERROR) in this case, it needs to handle that by storing the
raw data. The attached fixup patch makes this work for me - I'm not
claiming this is the best way to handle this, but it works.

FWIW I find it strange the tests included in the patch did not trigger
this. That probably means the tests are not quite sufficient.

Now, to the testing. Attached are two scripts, testing different cases:

test-columns.sh - Table with a variable number of 'float8' columns.

test-toast.sh - Table with a single text column.

The script always sets up a publication/subscription on two instances,
generates certain amount of data (~1GB for columns, ~3.2GB for TOAST),
waits for it to be replicated to the replica, and measures how much data
was spilled to disk with the different compression methods (off, pglz
and lz4). There's a couple more metrics, but that's irrelevant here.

For the "column" test, it looks like this (this is in MB):

rows columns distribution off pglz lz4
========================================================
100000 1000 compressible 778 20 9
random 778 778 16
--------------------------------------------------------
1000000 100 compressible 916 116 62
random 916 916 67

It's very clear that for the "compressible" data (which just copies the
same value into all columns), both pglz and lz4 can significantly reduce
the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100
columns it's a bit less efficient, but still good.

For the "random" data (where every column gets a random value, but rows
are copied), it's a very different story - pglz does not help at all,
while lz4 still massively reduces the amount of spilled data.

I think the explanation is very simple - for pglz, we compress each row
on it's own, there's no concept of streaming/context. If a row is
compressible, it works fine, but when the row gets random, pglz can't
compress it at all. For lz4, this does not matter, because with the
streaming mode it still sees that rows are just repeated, and so can
compress them efficiently.

For TOAST test, the results look like this:

distribution repeats toast off pglz lz4
===============================================================
compressible 10000 lz4 14 2 1
pglz 40 4 3
1000 lz4 32 16 9
pglz 54 17 10
---------------------------------------------------------
random 10000 lz4 3305 3305 3157
pglz 3305 3305 3157
1000 lz4 3166 3162 1580
pglz 3334 3326 1745
----------------------------------------------------------
random2 10000 lz4 3305 3305 3157
pglz 3305 3305 3158
1000 lz4 3160 3156 3010
pglz 3334 3326 3172

The "repeats" value means how long the string is - it's the number of
"md5" hashes added to the string. The number of rows is calculated to
keep the total amount of data the same. The "toast" column tracks what
compression was used for TOAST, I was wondering if it matters.

This time there are three data distributions - compressible means that
each TOAST value is nicely compressible, "random" means each value is
random (not compressible), but the rows are just copy of the same value
(so on the whole there's a lot of redundancy). And "random2" means each
row is random and unique (so not compressible at all).

The table shows that with compressible TOAST values, compressing the
spill file is rather useless. The reason is that ReorderBufferCompress
is handling raw TOAST data, which is already compressed. Yes, it may
further reduce the amount of data, but it's negligible when compared to
the original amount of data.

For the random cases, the spill compression is rather pointless. Yes,
lz4 can reduce it to 1/2 for the shorter strings, but other than that
it's not very useful.

For a while I was thinking this approach is flawed, because it only sees
and compressed changes one by one, and that seeing a batch of changes
would improve this (e.g. we'd see the copied rows). But I realized lz4
already does that (in the streaming mode at least), and yet it does not
help very much. Presumably that depends on how large the context is. If
the random string is long enough, it won't help.

So maybe this approach is fine, and doing the compression at a lower
layer (for the whole file), would not really improve this. Even then
we'd only see a limited amount of data.

Maybe the right answer to this is that compression does not help cases
where most of the replicated data is TOAST, and that it can help cases
with wide (and redundant) rows, or repeated rows. And that lz4 is a
clearly superior choice. (This also raises the question if we want to
support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this,
but doesn't that behave more like pglz, i.e. no context?)

FWIW when doing these tests, it made me realize how useful would it be
to track both the "raw" and "spilled" amounts. That is before/after
compression. It'd make calculating compression ratio much easier.

regards

--
Tomas Vondra

Attachments:

test-toast.shapplication/x-shellscript; name=test-toast.shDownload
test-columns.shapplication/x-shellscript; name=test-columns.shDownload
results-columns-1727083454.csvtext/csv; charset=UTF-8; name=results-columns-1727083454.csvDownload
results-toast-1727088557.csvtext/csv; charset=UTF-8; name=results-toast-1727088557.csvDownload
0001-compression-fixup.patchtext/x-patch; charset=UTF-8; name=0001-compression-fixup.patchDownload
From c0633fa03e7eefdf4bc5ab6f6608fe51368272a6 Mon Sep 17 00:00:00 2001
From: tomas <tomas>
Date: Wed, 18 Sep 2024 19:39:52 +0200
Subject: [PATCH] compression fixup

---
 .../logical/reorderbuffer_compression.c       | 40 +++++++++++++------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer_compression.c b/src/backend/replication/logical/reorderbuffer_compression.c
index 6d19c60b60..6301ccc932 100644
--- a/src/backend/replication/logical/reorderbuffer_compression.c
+++ b/src/backend/replication/logical/reorderbuffer_compression.c
@@ -781,24 +781,38 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskHeader **header,
 				dst = (char *) palloc0(max_size);
 				dst_size = pglz_compress(src, src_size, dst, PGLZ_strategy_always);
 
-				if (dst_size < 0)
-					ereport(ERROR,
-							(errcode(ERRCODE_DATA_CORRUPTED),
-							 errmsg_internal("PGLZ compression failed")));
+				/*
+				 * If compression succeeded, build the proper compression header. If
+				 * compression fails, it means the data is not compressible. In that
+				 * case just build a no-compress item.
+				 */
+				if (dst_size > 0)		/* compressible */
+				{
+					ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
 
-				ReorderBufferReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskHeader)));
+					hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+					hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+					hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
+					hdr->raw_size = (Size) src_size;
 
-				hdr = (ReorderBufferDiskHeader *) rb->outbuf;
-				hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
-				hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskHeader);
-				hdr->raw_size = (Size) src_size;
+					/* Copy back compressed data into the ReorderBuffer */
+					memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
+						   dst_size);
+				}
+				else					/* not compressible */
+				{
+					hdr = (ReorderBufferDiskHeader *) rb->outbuf;
+					hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+					hdr->size = (Size) src_size + sizeof(ReorderBufferDiskHeader);
+					hdr->raw_size = (Size) src_size;
+
+					/* Copy back compressed data into the ReorderBuffer */
+					memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), src,
+						   src_size);
+				}
 
 				*header = hdr;
 
-				/* Copy back compressed data into the ReorderBuffer */
-				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskHeader), dst,
-					   dst_size);
-
 				pfree(dst);
 
 				break;
-- 
2.39.2

#26Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#25)
Re: Compress ReorderBuffer spill files using LZ4

Hi Tomas,

Le lun. 23 sept. 2024 à 18:13, Tomas Vondra <tomas@vondra.me> a écrit :

Hi,

I've spent a bit more time on this, mostly running tests to get a better
idea of the practical benefits.

Thank you for your code review and testing!

Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
for pglz_compress() to return -1. This can happen if the data is not
compressible, and would not fit into the output buffer. The code can't
just do elog(ERROR) in this case, it needs to handle that by storing the
raw data. The attached fixup patch makes this work for me - I'm not
claiming this is the best way to handle this, but it works.

FWIW I find it strange the tests included in the patch did not trigger
this. That probably means the tests are not quite sufficient.

Now, to the testing. Attached are two scripts, testing different cases:

test-columns.sh - Table with a variable number of 'float8' columns.

test-toast.sh - Table with a single text column.

The script always sets up a publication/subscription on two instances,
generates certain amount of data (~1GB for columns, ~3.2GB for TOAST),
waits for it to be replicated to the replica, and measures how much data
was spilled to disk with the different compression methods (off, pglz
and lz4). There's a couple more metrics, but that's irrelevant here.

It would be interesting to run the same tests with zstd: in my early
testing I found that zstd was able to provide a better compression
ratio than lz4, but seemed to use more CPU resources/is slower.

For the "column" test, it looks like this (this is in MB):

rows columns distribution off pglz lz4
========================================================
100000 1000 compressible 778 20 9
random 778 778 16
--------------------------------------------------------
1000000 100 compressible 916 116 62
random 916 916 67

It's very clear that for the "compressible" data (which just copies the
same value into all columns), both pglz and lz4 can significantly reduce
the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100
columns it's a bit less efficient, but still good.

For the "random" data (where every column gets a random value, but rows
are copied), it's a very different story - pglz does not help at all,
while lz4 still massively reduces the amount of spilled data.

I think the explanation is very simple - for pglz, we compress each row
on it's own, there's no concept of streaming/context. If a row is
compressible, it works fine, but when the row gets random, pglz can't
compress it at all. For lz4, this does not matter, because with the
streaming mode it still sees that rows are just repeated, and so can
compress them efficiently.

That's correct.

For TOAST test, the results look like this:

distribution repeats toast off pglz lz4
===============================================================
compressible 10000 lz4 14 2 1
pglz 40 4 3
1000 lz4 32 16 9
pglz 54 17 10
---------------------------------------------------------
random 10000 lz4 3305 3305 3157
pglz 3305 3305 3157
1000 lz4 3166 3162 1580
pglz 3334 3326 1745
----------------------------------------------------------
random2 10000 lz4 3305 3305 3157
pglz 3305 3305 3158
1000 lz4 3160 3156 3010
pglz 3334 3326 3172

The "repeats" value means how long the string is - it's the number of
"md5" hashes added to the string. The number of rows is calculated to
keep the total amount of data the same. The "toast" column tracks what
compression was used for TOAST, I was wondering if it matters.

This time there are three data distributions - compressible means that
each TOAST value is nicely compressible, "random" means each value is
random (not compressible), but the rows are just copy of the same value
(so on the whole there's a lot of redundancy). And "random2" means each
row is random and unique (so not compressible at all).

The table shows that with compressible TOAST values, compressing the
spill file is rather useless. The reason is that ReorderBufferCompress
is handling raw TOAST data, which is already compressed. Yes, it may
further reduce the amount of data, but it's negligible when compared to
the original amount of data.

For the random cases, the spill compression is rather pointless. Yes,
lz4 can reduce it to 1/2 for the shorter strings, but other than that
it's not very useful.

It's still interesting to confirm that data already compressed or
random data cannot be significantly compressed.

For a while I was thinking this approach is flawed, because it only sees
and compressed changes one by one, and that seeing a batch of changes
would improve this (e.g. we'd see the copied rows). But I realized lz4
already does that (in the streaming mode at least), and yet it does not
help very much. Presumably that depends on how large the context is. If
the random string is long enough, it won't help.

So maybe this approach is fine, and doing the compression at a lower
layer (for the whole file), would not really improve this. Even then
we'd only see a limited amount of data.

Maybe the right answer to this is that compression does not help cases
where most of the replicated data is TOAST, and that it can help cases
with wide (and redundant) rows, or repeated rows. And that lz4 is a
clearly superior choice. (This also raises the question if we want to
support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this,
but doesn't that behave more like pglz, i.e. no context?)

I'm working on a new version of this patch set that will include the
changes you suggested in your review. About using LZ4 regular API, the
goal was to use it when we cannot use the streaming API due to raw
data larger than LZ4 ring buffer. But this is something I'm going to
delete in the new version because I'm planning to use a similar
approach as we do in astreamer_lz4.c: using frames, not blocks. LZ4
frame API looks very similar to ZSTD's streaming API.

FWIW when doing these tests, it made me realize how useful would it be
to track both the "raw" and "spilled" amounts. That is before/after
compression. It'd make calculating compression ratio much easier.

Yes, that's why I tried to "fix" the spill_bytes counter.

Regards,

JT

#27Tomas Vondra
tomas@vondra.me
In reply to: Julien Tachoires (#26)
Re: Compress ReorderBuffer spill files using LZ4

On 9/23/24 21:58, Julien Tachoires wrote:

Hi Tomas,

Le lun. 23 sept. 2024 à 18:13, Tomas Vondra <tomas@vondra.me> a écrit :

Hi,

I've spent a bit more time on this, mostly running tests to get a better
idea of the practical benefits.

Thank you for your code review and testing!

Firstly, I think there's a bug in ReorderBufferCompress() - it's legal
for pglz_compress() to return -1. This can happen if the data is not
compressible, and would not fit into the output buffer. The code can't
just do elog(ERROR) in this case, it needs to handle that by storing the
raw data. The attached fixup patch makes this work for me - I'm not
claiming this is the best way to handle this, but it works.

FWIW I find it strange the tests included in the patch did not trigger
this. That probably means the tests are not quite sufficient.

Now, to the testing. Attached are two scripts, testing different cases:

test-columns.sh - Table with a variable number of 'float8' columns.

test-toast.sh - Table with a single text column.

The script always sets up a publication/subscription on two instances,
generates certain amount of data (~1GB for columns, ~3.2GB for TOAST),
waits for it to be replicated to the replica, and measures how much data
was spilled to disk with the different compression methods (off, pglz
and lz4). There's a couple more metrics, but that's irrelevant here.

It would be interesting to run the same tests with zstd: in my early
testing I found that zstd was able to provide a better compression
ratio than lz4, but seemed to use more CPU resources/is slower.

Oh, I completely forgot about zstd. I don't think it'd substantially
change the conclusions, though. It might compress better/worse for some
cases, but the overall behavior would remain the same.

I can't test this right now, the testmachine is busy with some other
stuff. But it should not be difficult to update the test scripts I
attached and get results yourself. There's a couple hard-coded paths
that need to be updated, ofc.

For the "column" test, it looks like this (this is in MB):

rows columns distribution off pglz lz4
========================================================
100000 1000 compressible 778 20 9
random 778 778 16
--------------------------------------------------------
1000000 100 compressible 916 116 62
random 916 916 67

It's very clear that for the "compressible" data (which just copies the
same value into all columns), both pglz and lz4 can significantly reduce
the amount of data. For 1000 columns it's 780MB -> 20MB/9MB, for 100
columns it's a bit less efficient, but still good.

For the "random" data (where every column gets a random value, but rows
are copied), it's a very different story - pglz does not help at all,
while lz4 still massively reduces the amount of spilled data.

I think the explanation is very simple - for pglz, we compress each row
on it's own, there's no concept of streaming/context. If a row is
compressible, it works fine, but when the row gets random, pglz can't
compress it at all. For lz4, this does not matter, because with the
streaming mode it still sees that rows are just repeated, and so can
compress them efficiently.

That's correct.

For TOAST test, the results look like this:

distribution repeats toast off pglz lz4
===============================================================
compressible 10000 lz4 14 2 1
pglz 40 4 3
1000 lz4 32 16 9
pglz 54 17 10
---------------------------------------------------------
random 10000 lz4 3305 3305 3157
pglz 3305 3305 3157
1000 lz4 3166 3162 1580
pglz 3334 3326 1745
----------------------------------------------------------
random2 10000 lz4 3305 3305 3157
pglz 3305 3305 3158
1000 lz4 3160 3156 3010
pglz 3334 3326 3172

The "repeats" value means how long the string is - it's the number of
"md5" hashes added to the string. The number of rows is calculated to
keep the total amount of data the same. The "toast" column tracks what
compression was used for TOAST, I was wondering if it matters.

This time there are three data distributions - compressible means that
each TOAST value is nicely compressible, "random" means each value is
random (not compressible), but the rows are just copy of the same value
(so on the whole there's a lot of redundancy). And "random2" means each
row is random and unique (so not compressible at all).

The table shows that with compressible TOAST values, compressing the
spill file is rather useless. The reason is that ReorderBufferCompress
is handling raw TOAST data, which is already compressed. Yes, it may
further reduce the amount of data, but it's negligible when compared to
the original amount of data.

For the random cases, the spill compression is rather pointless. Yes,
lz4 can reduce it to 1/2 for the shorter strings, but other than that
it's not very useful.

It's still interesting to confirm that data already compressed or
random data cannot be significantly compressed.

For a while I was thinking this approach is flawed, because it only sees
and compressed changes one by one, and that seeing a batch of changes
would improve this (e.g. we'd see the copied rows). But I realized lz4
already does that (in the streaming mode at least), and yet it does not
help very much. Presumably that depends on how large the context is. If
the random string is long enough, it won't help.

So maybe this approach is fine, and doing the compression at a lower
layer (for the whole file), would not really improve this. Even then
we'd only see a limited amount of data.

Maybe the right answer to this is that compression does not help cases
where most of the replicated data is TOAST, and that it can help cases
with wide (and redundant) rows, or repeated rows. And that lz4 is a
clearly superior choice. (This also raises the question if we want to
support REORDER_BUFFER_STRAT_LZ4_REGULAR. I haven't looked into this,
but doesn't that behave more like pglz, i.e. no context?)

I'm working on a new version of this patch set that will include the
changes you suggested in your review. About using LZ4 regular API, the
goal was to use it when we cannot use the streaming API due to raw
data larger than LZ4 ring buffer. But this is something I'm going to
delete in the new version because I'm planning to use a similar
approach as we do in astreamer_lz4.c: using frames, not blocks. LZ4
frame API looks very similar to ZSTD's streaming API.

FWIW when doing these tests, it made me realize how useful would it be
to track both the "raw" and "spilled" amounts. That is before/after
compression. It'd make calculating compression ratio much easier.

Yes, that's why I tried to "fix" the spill_bytes counter.

But I think the 'fixed' counter only tracks the data after the new
compression, right? I'm suggesting to have two counters - one for "raw"
data (before compression) and "compressed" (after compression).

regards

--
Tomas Vondra

#28Julien Tachoires
julmon@gmail.com
In reply to: Tomas Vondra (#27)
6 attachment(s)
Re: Compress ReorderBuffer spill files using LZ4

Hi Tomas,

Please find a new version of this patch set. I think I have addressed
all the feedbacks you made. But, since
1bf1140be87230c71d0e7b29939f7e2b3d073aa1 the streaming option is now
set to "parallel" by default, making this patch set almost useless.

Regards,

JT

Attachments:

v5-0001-Compress-ReorderBuffer-spill-files-using-PGLZ.patchapplication/octet-stream; name=v5-0001-Compress-ReorderBuffer-spill-files-using-PGLZ.patchDownload
From 1d427b7b61f69e0402d093b6dd35096638cfd033 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Sun, 22 Sep 2024 16:59:10 +0200
Subject: [PATCH 1/6] Compress ReorderBuffer spill files using PGLZ

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, then, all the changes are written
on disk in temporary files located in pg_replslot/<slot_name>.

This behavior happens only when the subscriber's option "streaming"
is set to "off", which is the default value. In this case, decoding
of large transactions by multiple replication slots can lead to disk
space saturation and high I/O utilization.

This patch enables data compression/decompression of these temporary
files. Each transaction change that must be written on disk is now
compressed using the in-house compression method PGLZ.
---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       | 356 ++++++++++++++----
 .../replication/logical/reorderbuffer_pglz.c  |  56 +++
 src/include/replication/reorderbuffer.h       |   6 +
 .../replication/reorderbuffer_compression.h   |  42 +++
 6 files changed, 392 insertions(+), 70 deletions(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_pglz.c
 create mode 100644 src/include/replication/reorderbuffer_compression.h

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 1e08bbbd4e..96c733d009 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -26,6 +26,7 @@ OBJS = \
 	proto.o \
 	relation.o \
 	reorderbuffer.o \
+	reorderbuffer_pglz.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index 3d36249d8a..48df907b35 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -12,6 +12,7 @@ backend_sources += files(
   'proto.c',
   'relation.c',
   'reorderbuffer.c',
+  'reorderbuffer_pglz.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index e3a5c7b660..649b196d5f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -97,11 +97,13 @@
 #include "access/xlog_internal.h"
 #include "catalog/catalog.h"
 #include "common/int.h"
+#include "common/pg_lzcompress.h"
 #include "lib/binaryheap.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/snapbuild.h"	/* just for SnapBuildSnapDecRefcount */
 #include "storage/bufmgr.h"
@@ -176,9 +178,10 @@ typedef struct ReorderBufferToastEnt
 /* Disk serialization support datastructures */
 typedef struct ReorderBufferDiskChange
 {
-	Size		size;
-	ReorderBufferChange change;
-	/* data follows */
+	ReorderBufferCompressionStrategy comp_strat;	/* Compression strategy */
+	Size		size;			/* Ondisk size */
+	Size		raw_size;		/* Raw/uncompressed data size */
+	/* ReorderBufferChange + data follow */
 } ReorderBufferDiskChange;
 
 #define IsSpecInsert(action) \
@@ -215,6 +218,9 @@ static const Size max_changes_in_memory = 4096; /* XXX for restore only */
 /* GUC variable */
 int			debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED;
 
+/* Compression strategy for spilled data. */
+int			logical_decoding_spill_compression = REORDER_BUFFER_PGLZ_COMPRESSION;
+
 /* ---------------------------------------
  * primary reorderbuffer support routines
  * ---------------------------------------
@@ -255,6 +261,8 @@ static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *tx
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
+static bool ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+										  TXNEntryFile *file, XLogSegNo *segno);
 static void ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 									   char *data);
 static void ReorderBufferRestoreCleanup(ReorderBuffer *rb, ReorderBufferTXN *txn);
@@ -301,6 +309,24 @@ static void ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
 											ReorderBufferTXN *txn,
 											bool addition, Size sz);
 
+/*
+ * ---------------------------------------
+ * spill files compression
+ * ---------------------------------------
+ */
+static void *ReorderBufferNewCompressorState(MemoryContext context,
+											 int compression_method);
+static void ReorderBufferFreeCompressorState(MemoryContext context,
+											 int compression_method,
+											 void *compressor_state);
+static void ReorderBufferCompress(ReorderBuffer *rb,
+								  ReorderBufferDiskChange **ondisk,
+								  int compression_method, Size data_size,
+								  void *compressor_state);
+static void ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+									ReorderBufferDiskChange *ondisk,
+									void *compressor_state);
+
 /*
  * Allocate a new ReorderBuffer and clean out any old serialized state from
  * prior ReorderBuffer instances for the same slot.
@@ -432,6 +458,8 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	/* InvalidCommandId is not zero, so set it explicitly */
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
+	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
+															logical_decoding_spill_compression);
 
 	return txn;
 }
@@ -469,6 +497,10 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 		txn->invalidations = NULL;
 	}
 
+	ReorderBufferFreeCompressorState(rb->context,
+									 logical_decoding_spill_compression,
+									 txn->compressor_state);
+
 	/* Reset the toast hash */
 	ReorderBufferToastReset(rb, txn);
 
@@ -3806,12 +3838,12 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
 	ReorderBufferDiskChange *ondisk;
-	Size		sz = sizeof(ReorderBufferDiskChange);
+	Size		sz = sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 
 	ReorderBufferSerializeReserve(rb, sz);
 
 	ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-	memcpy(&ondisk->change, change, sizeof(ReorderBufferChange));
+	memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange), change, sizeof(ReorderBufferChange));
 
 	switch (change->action)
 	{
@@ -3847,7 +3879,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
@@ -3879,7 +3911,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 					sizeof(Size) + sizeof(Size);
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
@@ -3909,7 +3941,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				sz += inval_size;
 
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
@@ -3931,7 +3963,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
@@ -3965,7 +3997,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 				/* make sure we have enough space */
 				ReorderBufferSerializeReserve(rb, sz);
 
-				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange);
+				data = ((char *) rb->outbuf) + sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 				/* might have been reallocated above */
 				ondisk = (ReorderBufferDiskChange *) rb->outbuf;
 
@@ -3982,7 +4014,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 			break;
 	}
 
-	ondisk->size = sz;
+	/* Inplace ReorderBuffer content compression before writing it on disk */
+	ReorderBufferCompress(rb, &ondisk, logical_decoding_spill_compression,
+						  sz, txn->compressor_state);
 
 	errno = 0;
 	pgstat_report_wait_start(WAIT_EVENT_REORDER_BUFFER_WRITE);
@@ -4011,8 +4045,6 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
-
-	Assert(ondisk->change.action == change->action);
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
@@ -4281,9 +4313,6 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 	while (restored < max_changes_in_memory && *segno <= last_segno)
 	{
-		int			readBytes;
-		ReorderBufferDiskChange *ondisk;
-
 		CHECK_FOR_INTERRUPTS();
 
 		if (*fd == -1)
@@ -4322,60 +4351,15 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 		}
 
 		/*
-		 * Read the statically sized part of a change which has information
-		 * about the total size. If we couldn't read a record, we're at the
-		 * end of this file.
+		 * Read the full change from disk. If ReorderBufferReadOndiskChange
+		 * returns false, then we are at the eof, so, move to the next
+		 * segment.
 		 */
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-		readBytes = FileRead(file->vfd, rb->outbuf,
-							 sizeof(ReorderBufferDiskChange),
-							 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
-
-		/* eof */
-		if (readBytes == 0)
+		if (!ReorderBufferReadOndiskChange(rb, txn, file, segno))
 		{
-			FileClose(*fd);
 			*fd = -1;
-			(*segno)++;
 			continue;
 		}
-		else if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) sizeof(ReorderBufferDiskChange))));
-
-		file->curOffset += readBytes;
-
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		ReorderBufferSerializeReserve(rb,
-									  sizeof(ReorderBufferDiskChange) + ondisk->size);
-		ondisk = (ReorderBufferDiskChange *) rb->outbuf;
-
-		readBytes = FileRead(file->vfd,
-							 rb->outbuf + sizeof(ReorderBufferDiskChange),
-							 ondisk->size - sizeof(ReorderBufferDiskChange),
-							 file->curOffset,
-							 WAIT_EVENT_REORDER_BUFFER_READ);
-
-		if (readBytes < 0)
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: %m")));
-		else if (readBytes != ondisk->size - sizeof(ReorderBufferDiskChange))
-			ereport(ERROR,
-					(errcode_for_file_access(),
-					 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
-							readBytes,
-							(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
-
-		file->curOffset += readBytes;
 
 		/*
 		 * ok, read a full change from disk, now restore it into proper
@@ -4388,6 +4372,83 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	return restored;
 }
 
+/*
+ * Read a change spilled to disk and decompress it if compressed.
+ */
+static bool
+ReorderBufferReadOndiskChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+							  TXNEntryFile *file, XLogSegNo *segno)
+{
+	int			readBytes;
+	ReorderBufferDiskChange *ondisk;
+	char	   *header;			/* header buffer */
+	char	   *data;			/* data buffer */
+
+	/*
+	 * Read the statically sized part of a change which has information about
+	 * the total size and compression method. If we couldn't read a record,
+	 * we're at the end of this file.
+	 */
+	header = (char *) palloc0(sizeof(ReorderBufferDiskChange));
+	readBytes = FileRead(file->vfd, header,
+						 sizeof(ReorderBufferDiskChange),
+						 file->curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
+
+	/* eof */
+	if (readBytes == 0)
+	{
+
+		FileClose(file->vfd);
+		(*segno)++;
+		pfree(header);
+
+		return false;
+	}
+	else if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != sizeof(ReorderBufferDiskChange))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) sizeof(ReorderBufferDiskChange))));
+
+	file->curOffset += readBytes;
+
+	ondisk = (ReorderBufferDiskChange *) header;
+
+	/* Read ondisk data */
+	data = (char *) palloc0(ondisk->size - sizeof(ReorderBufferDiskChange));
+	readBytes = FileRead(file->vfd,
+						 data,
+						 ondisk->size - sizeof(ReorderBufferDiskChange),
+						 file->curOffset,
+						 WAIT_EVENT_REORDER_BUFFER_READ);
+
+	if (readBytes < 0)
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: %m")));
+	else if (readBytes != (ondisk->size - sizeof(ReorderBufferDiskChange)))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not read from reorderbuffer spill file: read %d instead of %u bytes",
+						readBytes,
+						(uint32) (ondisk->size - sizeof(ReorderBufferDiskChange)))));
+
+	/* Decompress data */
+	ReorderBufferDecompress(rb, data, ondisk, txn->compressor_state);
+
+	pfree(data);
+	pfree(header);
+
+	file->curOffset += readBytes;
+
+	return true;
+}
+
 /*
  * Convert change from its on-disk format to in-memory format and queue it onto
  * the TXN's ->changes list.
@@ -4400,17 +4461,14 @@ static void
 ReorderBufferRestoreChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 						   char *data)
 {
-	ReorderBufferDiskChange *ondisk;
 	ReorderBufferChange *change;
 
-	ondisk = (ReorderBufferDiskChange *) data;
-
 	change = ReorderBufferGetChange(rb);
 
 	/* copy static part */
-	memcpy(change, &ondisk->change, sizeof(ReorderBufferChange));
+	memcpy(change, data + sizeof(ReorderBufferDiskChange), sizeof(ReorderBufferChange));
 
-	data += sizeof(ReorderBufferDiskChange);
+	data += sizeof(ReorderBufferDiskChange) + sizeof(ReorderBufferChange);
 
 	/* restore individual stuff */
 	switch (change->action)
@@ -5338,3 +5396,161 @@ restart:
 		*cmax = ent->cmax;
 	return true;
 }
+
+/*
+ * Allocate a new Compressor State, depending on the compression method.
+ */
+static void *
+ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+			return pglz_NewCompressorState(context);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			return NULL;
+			break;
+	}
+}
+
+/*
+ * Free memory allocated to a Compressor State, depending on the compression
+ * method.
+ */
+static void
+ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
+								 void *compressor_state)
+{
+	switch (compression_method)
+	{
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+			return pglz_FreeCompressorState(context, compressor_state);
+			break;
+		case REORDER_BUFFER_NO_COMPRESSION:
+		default:
+			break;
+	}
+}
+
+/*
+ * Compress ReorderBuffer content. This function is called in order to compress
+ * data before spilling on disk.
+ */
+static void
+ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskChange **ondisk,
+					  int compression_method, Size data_size, void *compressor_state)
+{
+	ReorderBufferDiskChange *hdr = *ondisk;
+
+	switch (compression_method)
+	{
+			/* No compression */
+		case REORDER_BUFFER_NO_COMPRESSION:
+			{
+				hdr->comp_strat = REORDER_BUFFER_STRAT_UNCOMPRESSED;
+				hdr->size = data_size;
+				hdr->raw_size = data_size - sizeof(ReorderBufferDiskChange);
+
+				break;
+			}
+			/* PGLZ compression */
+		case REORDER_BUFFER_PGLZ_COMPRESSION:
+			{
+				int32		dst_size = 0;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskChange);
+				int32		src_size = data_size - sizeof(ReorderBufferDiskChange);
+				int32		max_size = PGLZ_MAX_OUTPUT(src_size);
+				PGLZCompressorState *cstate = (PGLZCompressorState *) compressor_state;
+
+				/*
+				 * Make sure the buffer used for data compression has enough
+				 * space.
+				 */
+				enlargeStringInfo(cstate->buf, max_size);
+
+				dst_size = pglz_compress(src, src_size, cstate->buf->data, PGLZ_strategy_always);
+
+				cstate->buf->len = dst_size;
+
+				if (dst_size < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("PGLZ compression failed")));
+
+				ReorderBufferSerializeReserve(rb, (Size) (dst_size + sizeof(ReorderBufferDiskChange)));
+
+				hdr = (ReorderBufferDiskChange *) rb->outbuf;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_PGLZ;
+				hdr->size = (Size) dst_size + sizeof(ReorderBufferDiskChange);
+				hdr->raw_size = (Size) src_size;
+
+				*ondisk = hdr;
+
+				/* Copy back compressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   cstate->buf->data, cstate->buf->len);
+
+				break;
+			}
+		default:
+			/* Other compression methods not yet supported */
+			break;
+	}
+}
+
+/*
+ * Decompress data read from disk and copy it into the ReorderBuffer.
+ */
+static void
+ReorderBufferDecompress(ReorderBuffer *rb, char *data,
+						ReorderBufferDiskChange *ondisk, void *compressor_state)
+{
+	/*
+	 * Make sure the output reorder buffer has enough space to store
+	 * decompressed/raw data.
+	 */
+	ReorderBufferSerializeReserve(rb, (Size) (ondisk->raw_size + sizeof(ReorderBufferDiskChange)));
+
+	/* Make a copy of the header read on disk into the ReorderBuffer */
+	memcpy(rb->outbuf, (char *) ondisk, sizeof(ReorderBufferDiskChange));
+
+	switch (ondisk->comp_strat)
+	{
+			/* No decompression */
+		case REORDER_BUFFER_STRAT_UNCOMPRESSED:
+			{
+				/*
+				 * Make a copy of what was read on disk into the reorder
+				 * buffer.
+				 */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   data, ondisk->raw_size);
+				break;
+			}
+			/* PGLZ decompression */
+		case REORDER_BUFFER_STRAT_PGLZ:
+			{
+				char	   *buf = NULL;
+				int32		src_size = (int32) ondisk->size - sizeof(ReorderBufferDiskChange);
+				int32		buf_size = (int32) ondisk->raw_size;
+				int32		decBytes;
+
+				/* Decompress data directly into the ReorderBuffer */
+				buf = (char *) rb->outbuf;
+				buf += sizeof(ReorderBufferDiskChange);
+
+				decBytes = pglz_decompress(data, src_size, buf, buf_size, false);
+
+				if (decBytes < 0)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATA_CORRUPTED),
+							 errmsg_internal("compressed PGLZ data is corrupted")));
+				break;
+			}
+		default:
+			/* Other compression methods not yet supported */
+			break;
+	}
+}
diff --git a/src/backend/replication/logical/reorderbuffer_pglz.c b/src/backend/replication/logical/reorderbuffer_pglz.c
new file mode 100644
index 0000000000..1dea81c9fe
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_pglz.c
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_pglz.c
+ *	  Functions used for ReorderBuffer compression using PGLZ.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_pglz.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "replication/reorderbuffer_compression.h"
+
+/*
+ * Allocate a new PGLZCompressorState.
+ */
+void *
+pglz_NewCompressorState(MemoryContext context)
+{
+	PGLZCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (PGLZCompressorState *)
+		MemoryContextAlloc(context, sizeof(PGLZCompressorState));
+
+	cstate->buf = makeStringInfo();
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return (void *) cstate;
+}
+
+/*
+ * Free PGLZ memory resources and compressor state.
+ */
+void
+pglz_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+	PGLZCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (PGLZCompressorState *) compressor_state;
+
+	destroyStringInfo(cstate->buf);
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+}
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 6ad5a8cb9c..5f231b5f90 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -26,6 +26,7 @@
 /* GUC variables */
 extern PGDLLIMPORT int logical_decoding_work_mem;
 extern PGDLLIMPORT int debug_logical_replication_streaming;
+extern PGDLLIMPORT int logical_decoding_spill_compression;
 
 /* possible values for debug_logical_replication_streaming */
 typedef enum
@@ -426,6 +427,11 @@ typedef struct ReorderBufferTXN
 	 * Private data pointer of the output plugin.
 	 */
 	void	   *output_plugin_private;
+
+	/*
+	 * Streaming compression state used for spill files compression.
+	 */
+	void	   *compressor_state;
 } ReorderBufferTXN;
 
 /* so we can define the callbacks used inside struct ReorderBuffer itself */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
new file mode 100644
index 0000000000..9e9565ca7f
--- /dev/null
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -0,0 +1,42 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_compression.h
+ *	  ReorderBuffer spill files compression.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ * src/include/access/reorderbuffer_compression.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef REORDERBUFFER_COMPRESSION_H
+#define REORDERBUFFER_COMPRESSION_H
+
+/* ReorderBuffer on disk compression algorithms */
+typedef enum ReorderBufferCompressionMethod
+{
+	REORDER_BUFFER_NO_COMPRESSION,
+	REORDER_BUFFER_PGLZ_COMPRESSION,
+}			ReorderBufferCompressionMethod;
+
+/*
+ * Compression strategy applied to ReorderBuffer records spilled on disk
+ */
+typedef enum ReorderBufferCompressionStrategy
+{
+	REORDER_BUFFER_STRAT_UNCOMPRESSED,
+	REORDER_BUFFER_STRAT_PGLZ,
+}			ReorderBufferCompressionStrategy;
+
+typedef struct PGLZCompressorState
+{
+	/* Buffer used to store compressed data */
+	StringInfo	buf;
+}			PGLZCompressorState;
+
+extern void *pglz_NewCompressorState(MemoryContext context);
+extern void pglz_FreeCompressorState(MemoryContext context,
+									 void *compressor_state);
+
+#endif							/* REORDERBUFFER_COMPRESSION_H */
-- 
2.43.0

v5-0002-Compress-ReorderBuffer-spill-files-using-LZ4.patchapplication/octet-stream; name=v5-0002-Compress-ReorderBuffer-spill-files-using-LZ4.patchDownload
From 3fa3afc0f0ae660b5ab7f6c7851f749adea9e421 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Tue, 22 Oct 2024 20:40:42 +0200
Subject: [PATCH 2/6] Compress ReorderBuffer spill files using LZ4

---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       |  89 +++++-
 .../replication/logical/reorderbuffer_lz4.c   | 268 ++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  49 ++++
 5 files changed, 407 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_lz4.c

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 96c733d009..58dce86258 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -27,6 +27,7 @@ OBJS = \
 	relation.o \
 	reorderbuffer.o \
 	reorderbuffer_pglz.o \
+	reorderbuffer_lz4.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index 48df907b35..dc2f55d0fa 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -13,6 +13,7 @@ backend_sources += files(
   'relation.c',
   'reorderbuffer.c',
   'reorderbuffer_pglz.c',
+  'reorderbuffer_lz4.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 649b196d5f..5d6f4bbfca 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -219,7 +219,7 @@ static const Size max_changes_in_memory = 4096; /* XXX for restore only */
 int			debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED;
 
 /* Compression strategy for spilled data. */
-int			logical_decoding_spill_compression = REORDER_BUFFER_PGLZ_COMPRESSION;
+int			logical_decoding_spill_compression = REORDER_BUFFER_LZ4_COMPRESSION;
 
 /* ---------------------------------------
  * primary reorderbuffer support routines
@@ -5408,6 +5408,9 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 			return pglz_NewCompressorState(context);
 			break;
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_NewCompressorState(context);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		default:
 			return NULL;
@@ -5428,6 +5431,9 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 		case REORDER_BUFFER_PGLZ_COMPRESSION:
 			return pglz_FreeCompressorState(context, compressor_state);
 			break;
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			return lz4_FreeCompressorState(context, compressor_state);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		default:
 			break;
@@ -5494,6 +5500,67 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskChange **ondisk,
 
 				break;
 			}
+			/* LZ4 Compression */
+		case REORDER_BUFFER_LZ4_COMPRESSION:
+			{
+				Size		dst_size = 0;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskChange);
+				Size		src_size = data_size - sizeof(ReorderBufferDiskChange);
+				StringInfo	buf = lz4_GetStringInfoBuffer(compressor_state);
+
+				/*
+				 * LZ4 streaming compression implies keeping a copy of the
+				 * "raw" data in LZ4 input ring buffer. If the "raw" data does
+				 * not fit in this buffer, then we should not try to compress
+				 * it. Let's store it uncompressed.
+				 *
+				 * Even if individual changes larger than 64kB shouldn't
+				 * exist, we still want to be sure that this case is covered
+				 * anyway.
+				 */
+				if (unlikely(src_size > LZ4_RING_BUFFER_SIZE))
+					return ReorderBufferCompress(rb, ondisk,
+												 REORDER_BUFFER_NO_COMPRESSION,
+												 data_size, compressor_state);
+
+				/*
+				 * Make sure the buffer we'll use to store compressed data has
+				 * enough space.
+				 */
+				dst_size = lz4_CompressBound(src_size);
+				enlargeStringInfo(buf, dst_size);
+
+				/* Use LZ4 streaming compression */
+				lz4_StreamingCompressData(rb->context, src, src_size, buf->data,
+										  &dst_size, compressor_state);
+				buf->len = dst_size;
+
+				/*
+				 * Make sure the ReorderBuffer has enough space to store
+				 * compressed data. Compressed data must be smaller than raw
+				 * data, so, the ReorderBuffer should already have room for
+				 * compressed data, but we do this to avoid buffer overflow
+				 * risks.
+				 */
+				ReorderBufferSerializeReserve(rb, (dst_size + sizeof(ReorderBufferDiskChange)));
+
+				hdr = (ReorderBufferDiskChange *) rb->outbuf;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_LZ4_STREAMING;
+				hdr->size = dst_size + sizeof(ReorderBufferDiskChange);
+				hdr->raw_size = src_size;
+
+				/*
+				 * Update ondisk: hdr pointer has potentially changed due to
+				 * ReorderBufferSerializeReserve()
+				 */
+				*ondisk = hdr;
+
+				/* Copy back compressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   buf->data, buf->len);
+
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
@@ -5549,6 +5616,26 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 							 errmsg_internal("compressed PGLZ data is corrupted")));
 				break;
 			}
+			/* LZ4 streaming decompression */
+		case REORDER_BUFFER_STRAT_LZ4_STREAMING:
+			{
+				char	   *buf = NULL;
+				Size		src_size = ondisk->size - sizeof(ReorderBufferDiskChange);
+				Size		buf_size = ondisk->raw_size;
+
+				lz4_StreamingDecompressData(rb->context, data, src_size, &buf,
+											buf_size, compressor_state);
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   buf, buf_size);
+
+				/*
+				 * Not necessary to free buf in this case: it points to the
+				 * decompressed data stored in LZ4 output ring buffer.
+				 */
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/backend/replication/logical/reorderbuffer_lz4.c b/src/backend/replication/logical/reorderbuffer_lz4.c
new file mode 100644
index 0000000000..b9a7e717aa
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_lz4.c
@@ -0,0 +1,268 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_lz4.c
+ *	  Functions for ReorderBuffer compression using LZ4.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_lz4.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
+#include "replication/reorderbuffer_compression.h"
+
+#define NO_LZ4_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method lz4 not supported"), \
+			 errdetail("This functionality requires the server to be built with lz4 support.")))
+
+/*
+ * Allocate a new LZ4StreamingCompressorState.
+ */
+void *
+lz4_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(LZ4StreamingCompressorState));
+
+	cstate->buf = makeStringInfo();
+
+	/*
+	 * We do not allocate LZ4 ring buffers and streaming handlers at this
+	 * point because we have no guarantee that we will need them later. Let's
+	 * allocate only when we are about to use them.
+	 */
+	cstate->lz4_in_buf = NULL;
+	cstate->lz4_out_buf = NULL;
+	cstate->lz4_in_buf_offset = 0;
+	cstate->lz4_out_buf_offset = 0;
+	cstate->lz4_stream = NULL;
+	cstate->lz4_stream_decode = NULL;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free LZ4 memory resources and the compressor state.
+ */
+void
+lz4_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	destroyStringInfo(cstate->buf);
+
+	if (cstate->lz4_in_buf != NULL)
+	{
+		pfree(cstate->lz4_in_buf);
+		LZ4_freeStream(cstate->lz4_stream);
+	}
+	if (cstate->lz4_out_buf != NULL)
+	{
+		pfree(cstate->lz4_out_buf);
+		LZ4_freeStreamDecode(cstate->lz4_stream_decode);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 input ring buffer and create the streaming compression handler.
+ */
+static void
+lz4_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_in_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream = LZ4_createStream();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_LZ4
+/*
+ * Allocate LZ4 output ring buffer and create the streaming decompression
+ */
+static void
+lz4_CreateStreamDecodeCompressorState(MemoryContext context,
+									  void *compressor_state)
+{
+	LZ4StreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+	cstate->lz4_out_buf = (char *) palloc0(LZ4_RING_BUFFER_SIZE);
+	cstate->lz4_stream_decode = LZ4_createStreamDecode();
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using LZ4 streaming API.
+ */
+void
+lz4_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						  char *dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	int			lz4_cmp_size = 0;	/* compressed size */
+	char	   *lz4_in_bufPtr;	/* input ring buffer pointer */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 input ring buffer and streaming compression handler */
+	if (cstate->lz4_in_buf == NULL)
+		lz4_CreateStreamCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_in_buf_offset + src_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_in_buf_offset = 0;
+
+	/* Get the pointer of the next entry in the ring buffer */
+	lz4_in_bufPtr = cstate->lz4_in_buf + cstate->lz4_in_buf_offset;
+
+	/* Copy data that should be compressed into LZ4 input ring buffer */
+	memcpy(lz4_in_bufPtr, src, src_size);
+
+	/* Use LZ4 streaming compression API */
+	lz4_cmp_size = LZ4_compress_fast_continue(cstate->lz4_stream,
+											  lz4_in_bufPtr, dst, src_size,
+											  *dst_size, 1);
+
+	if (lz4_cmp_size <= 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("LZ4 compression failed")));
+
+	/* Move the input ring buffer offset */
+	cstate->lz4_in_buf_offset += src_size;
+
+	*dst_size = lz4_cmp_size;
+#endif
+}
+
+/*
+ * Data decompression using LZ4 streaming API.
+ * LZ4 decompression uses the output ring buffer to store decompressed data,
+ * thus, we don't need to create a new buffer. We return the pointer to data
+ * location.
+ */
+void
+lz4_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							char **dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+#else
+	LZ4StreamingCompressorState *cstate;
+	char	   *lz4_out_bufPtr; /* output ring buffer pointer */
+	int			lz4_dec_size;	/* decompressed data size */
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	/* Allocate LZ4 output ring buffer and streaming decompression handler */
+	if (cstate->lz4_out_buf == NULL)
+		lz4_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	/* Ring buffer offset wraparound */
+	if ((cstate->lz4_out_buf_offset + dst_size) > LZ4_RING_BUFFER_SIZE)
+		cstate->lz4_out_buf_offset = 0;
+
+	/* Get current entry pointer in the ring buffer */
+	lz4_out_bufPtr = cstate->lz4_out_buf + cstate->lz4_out_buf_offset;
+
+	lz4_dec_size = LZ4_decompress_safe_continue(cstate->lz4_stream_decode,
+												src,
+												lz4_out_bufPtr,
+												src_size,
+												dst_size);
+
+	Assert(lz4_dec_size == dst_size);
+
+	if (lz4_dec_size < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("compressed LZ4 data is corrupted")));
+	else if (lz4_dec_size != dst_size)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg_internal("decompressed LZ4 data size differs from original size")));
+
+	/* Move the output ring buffer offset */
+	cstate->lz4_out_buf_offset += lz4_dec_size;
+
+	/* Point to the decompressed data location */
+	*dst = lz4_out_bufPtr;
+#endif
+}
+
+Size
+lz4_CompressBound(Size src_size)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return -1;
+#else
+	return LZ4_COMPRESSBOUND(src_size);
+#endif
+}
+
+/*
+ * Returns the StringInfo buffer we use to store compressed/decompressed data.
+ */
+StringInfo
+lz4_GetStringInfoBuffer(void *compressor_state)
+{
+#ifndef USE_LZ4
+	NO_LZ4_SUPPORT();
+	return NULL;
+#else
+	LZ4StreamingCompressorState *cstate;
+
+	cstate = (LZ4StreamingCompressorState *) compressor_state;
+
+	return cstate->buf;
+#endif
+}
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 9e9565ca7f..3dbf47e18e 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -13,11 +13,16 @@
 #ifndef REORDERBUFFER_COMPRESSION_H
 #define REORDERBUFFER_COMPRESSION_H
 
+#ifdef USE_LZ4
+#include <lz4.h>
+#endif
+
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
+	REORDER_BUFFER_LZ4_COMPRESSION,
 }			ReorderBufferCompressionMethod;
 
 /*
@@ -27,6 +32,7 @@ typedef enum ReorderBufferCompressionStrategy
 {
 	REORDER_BUFFER_STRAT_UNCOMPRESSED,
 	REORDER_BUFFER_STRAT_PGLZ,
+	REORDER_BUFFER_STRAT_LZ4_STREAMING,
 }			ReorderBufferCompressionStrategy;
 
 typedef struct PGLZCompressorState
@@ -35,6 +41,49 @@ typedef struct PGLZCompressorState
 	StringInfo	buf;
 }			PGLZCompressorState;
 
+#ifdef USE_LZ4
+/*
+ * We use a fairly small LZ4 ring buffer size (64kB). Using a larger buffer
+ * size provide better compression ratio, but as long as we have to allocate
+ * two LZ4 ring buffers per ReorderBufferTXN, we should keep it small.
+
+ * 64kB is also twice the maximum size of a block, which is enough to cover
+ * changes like UPDATE that will contain data of the old and new version of a
+ * tuple.
+ */
+#define LZ4_RING_BUFFER_SIZE (64 * 1024)
+
+/*
+ * LZ4 streaming compression/decompression contextes and buffers.
+ */
+typedef struct LZ4StreamingCompressorState
+{
+	/* Streaming compression handler */
+	LZ4_stream_t *lz4_stream;
+	/* Streaming decompression handler */
+	LZ4_streamDecode_t *lz4_stream_decode;
+	/* LZ4 in/out ring buffers used for streaming compression */
+	char	   *lz4_in_buf;
+	int			lz4_in_buf_offset;
+	char	   *lz4_out_buf;
+	int			lz4_out_buf_offset;
+	/* Buffer used to store compressed data */
+	StringInfo	buf;
+}			LZ4StreamingCompressorState;
+#endif
+
+extern void *lz4_NewCompressorState(MemoryContext context);
+extern void lz4_FreeCompressorState(MemoryContext context,
+									void *compressor_state);
+extern void lz4_StreamingCompressData(MemoryContext context, char *src,
+									  Size src_size, char *dst, Size *dst_size,
+									  void *compressor_state);
+extern void lz4_StreamingDecompressData(MemoryContext context, char *src,
+										Size src_size, char **dst,
+										Size dst_size, void *compressor_state);
+extern Size lz4_CompressBound(Size src_size);
+extern StringInfo lz4_GetStringInfoBuffer(void *compressor_state);
+
 extern void *pglz_NewCompressorState(MemoryContext context);
 extern void pglz_FreeCompressorState(MemoryContext context,
 									 void *compressor_state);
-- 
2.43.0

v5-0003-Compress-ReorderBuffer-spill-files-using-ZSTD.patchapplication/octet-stream; name=v5-0003-Compress-ReorderBuffer-spill-files-using-ZSTD.patchDownload
From 27d33e09549d7f96b750750256ed37ec2eb9a04d Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Tue, 22 Oct 2024 22:31:13 +0200
Subject: [PATCH 3/6] Compress ReorderBuffer spill files using ZSTD

---
 src/backend/replication/logical/Makefile      |   1 +
 src/backend/replication/logical/meson.build   |   1 +
 .../replication/logical/reorderbuffer.c       |  58 ++-
 .../replication/logical/reorderbuffer_zstd.c  | 361 ++++++++++++++++++
 .../replication/reorderbuffer_compression.h   |  54 +++
 5 files changed, 474 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/replication/logical/reorderbuffer_zstd.c

diff --git a/src/backend/replication/logical/Makefile b/src/backend/replication/logical/Makefile
index 58dce86258..c71f06b729 100644
--- a/src/backend/replication/logical/Makefile
+++ b/src/backend/replication/logical/Makefile
@@ -28,6 +28,7 @@ OBJS = \
 	reorderbuffer.o \
 	reorderbuffer_pglz.o \
 	reorderbuffer_lz4.o \
+	reorderbuffer_zstd.o \
 	slotsync.o \
 	snapbuild.o \
 	tablesync.o \
diff --git a/src/backend/replication/logical/meson.build b/src/backend/replication/logical/meson.build
index dc2f55d0fa..70b8777290 100644
--- a/src/backend/replication/logical/meson.build
+++ b/src/backend/replication/logical/meson.build
@@ -14,6 +14,7 @@ backend_sources += files(
   'reorderbuffer.c',
   'reorderbuffer_pglz.c',
   'reorderbuffer_lz4.c',
+  'reorderbuffer_zstd.c',
   'slotsync.c',
   'snapbuild.c',
   'tablesync.c',
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 5d6f4bbfca..17f8208000 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -219,7 +219,7 @@ static const Size max_changes_in_memory = 4096; /* XXX for restore only */
 int			debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED;
 
 /* Compression strategy for spilled data. */
-int			logical_decoding_spill_compression = REORDER_BUFFER_LZ4_COMPRESSION;
+int			logical_decoding_spill_compression = REORDER_BUFFER_ZSTD_COMPRESSION;
 
 /* ---------------------------------------
  * primary reorderbuffer support routines
@@ -5411,6 +5411,9 @@ ReorderBufferNewCompressorState(MemoryContext context, int compression_method)
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_NewCompressorState(context);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_NewCompressorState(context);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		default:
 			return NULL;
@@ -5434,6 +5437,9 @@ ReorderBufferFreeCompressorState(MemoryContext context, int compression_method,
 		case REORDER_BUFFER_LZ4_COMPRESSION:
 			return lz4_FreeCompressorState(context, compressor_state);
 			break;
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			return zstd_FreeCompressorState(context, compressor_state);
+			break;
 		case REORDER_BUFFER_NO_COMPRESSION:
 		default:
 			break;
@@ -5561,6 +5567,37 @@ ReorderBufferCompress(ReorderBuffer *rb, ReorderBufferDiskChange **ondisk,
 
 				break;
 			}
+			/* ZSTD Compression */
+		case REORDER_BUFFER_ZSTD_COMPRESSION:
+			{
+				Size		dst_size = 0;
+				char	   *src = (char *) rb->outbuf + sizeof(ReorderBufferDiskChange);
+				Size		src_size = data_size - sizeof(ReorderBufferDiskChange);
+				StringInfo	buf = zstd_GetStringInfoBuffer(compressor_state);
+
+				dst_size = zstd_CompressBound(src_size);
+				enlargeStringInfo(buf, dst_size);
+
+				/* Use ZSTD streaming compression */
+				zstd_StreamingCompressData(rb->context, src, src_size,
+										   buf->data, &dst_size,
+										   compressor_state);
+				buf->len = dst_size;
+
+				ReorderBufferSerializeReserve(rb, (dst_size + sizeof(ReorderBufferDiskChange)));
+
+				hdr = (ReorderBufferDiskChange *) rb->outbuf;
+				hdr->comp_strat = REORDER_BUFFER_STRAT_ZSTD_STREAMING;
+				hdr->size = dst_size + sizeof(ReorderBufferDiskChange);
+				hdr->raw_size = src_size;
+
+				*ondisk = hdr;
+
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   buf->data, buf->len);
+
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
@@ -5636,6 +5673,25 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 				 */
 				break;
 			}
+			/* ZSTD streaming decompression */
+		case REORDER_BUFFER_STRAT_ZSTD_STREAMING:
+			{
+				StringInfo	buf = zstd_GetStringInfoBuffer(compressor_state);
+				Size		src_size = ondisk->size - sizeof(ReorderBufferDiskChange);
+				Size		buf_size = ondisk->raw_size;
+
+				enlargeStringInfo(buf, buf_size);
+
+				zstd_StreamingDecompressData(rb->context, data, src_size,
+											 buf->data, buf_size,
+											 compressor_state);
+				buf->len = buf_size;
+
+				/* Copy decompressed data into the ReorderBuffer */
+				memcpy((char *) rb->outbuf + sizeof(ReorderBufferDiskChange),
+					   buf->data, buf->len);
+				break;
+			}
 		default:
 			/* Other compression methods not yet supported */
 			break;
diff --git a/src/backend/replication/logical/reorderbuffer_zstd.c b/src/backend/replication/logical/reorderbuffer_zstd.c
new file mode 100644
index 0000000000..83455a9c8b
--- /dev/null
+++ b/src/backend/replication/logical/reorderbuffer_zstd.c
@@ -0,0 +1,361 @@
+/*-------------------------------------------------------------------------
+ *
+ * reorderbuffer_zstd.c
+ *	  Functions for ReorderBuffer compression using ZSTD.
+ *
+ * Copyright (c) 2024-2024, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/access/common/reorderbuffer_zstd.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
+#include "replication/reorderbuffer_compression.h"
+
+#define NO_ZSTD_SUPPORT() \
+	ereport(ERROR, \
+			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
+			 errmsg("compression method zstd not supported"), \
+			 errdetail("This functionality requires the server to be built with zstd support.")))
+
+/*
+ * Allocate a new ZSTDStreamingCompressorState.
+ */
+void *
+zstd_NewCompressorState(MemoryContext context)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return NULL;				/* keep compiler quiet */
+#else
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *)
+		MemoryContextAlloc(context, sizeof(ZSTDStreamingCompressorState));
+
+	cstate->buf = makeStringInfo();
+
+	/*
+	 * We do not allocate ZSTD buffers and contexts at this point because we
+	 * have no guarantee that we will need them later. Let's allocate only
+	 * when we are about to use them.
+	 */
+	cstate->zstd_c_ctx = NULL;
+	cstate->zstd_c_in_buf = NULL;
+	cstate->zstd_c_in_buf_size = 0;
+	cstate->zstd_c_out_buf = NULL;
+	cstate->zstd_c_out_buf_size = 0;
+	cstate->zstd_frame_size = 0;
+	cstate->zstd_d_ctx = NULL;
+	cstate->zstd_d_in_buf = NULL;
+	cstate->zstd_d_in_buf_size = 0;
+	cstate->zstd_d_out_buf = NULL;
+	cstate->zstd_d_out_buf_size = 0;
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return (void *) cstate;
+#endif
+}
+
+/*
+ * Free ZSTD memory resources and the compressor state.
+ */
+void
+zstd_FreeCompressorState(MemoryContext context, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext;
+
+	if (compressor_state == NULL)
+		return;
+
+	oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+
+	destroyStringInfo(cstate->buf);
+
+	if (cstate->zstd_c_ctx != NULL)
+	{
+		/* Compressor state was used for compression */
+		pfree(cstate->zstd_c_in_buf);
+		pfree(cstate->zstd_c_out_buf);
+		ZSTD_freeCCtx(cstate->zstd_c_ctx);
+	}
+	if (cstate->zstd_d_ctx != NULL)
+	{
+		/* Compressor state was used for decompression */
+		pfree(cstate->zstd_d_in_buf);
+		pfree(cstate->zstd_d_out_buf);
+		ZSTD_freeDCtx(cstate->zstd_d_ctx);
+	}
+
+	pfree(compressor_state);
+
+	MemoryContextSwitchTo(oldcontext);
+#endif
+}
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD compression buffers and create the ZSTD compression context.
+ */
+static void
+zstd_CreateStreamCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_c_in_buf_size = ZSTD_CStreamInSize();
+	cstate->zstd_c_in_buf = (char *) palloc0(cstate->zstd_c_in_buf_size);
+	cstate->zstd_c_out_buf_size = ZSTD_CStreamOutSize();
+	cstate->zstd_c_out_buf = (char *) palloc0(cstate->zstd_c_out_buf_size);
+	cstate->zstd_c_ctx = ZSTD_createCCtx();
+
+	if (cstate->zstd_c_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD compression context")));
+
+	/* Set compression level */
+	ZSTD_CCtx_setParameter(cstate->zstd_c_ctx, ZSTD_c_compressionLevel,
+						   ZSTD_COMPRESSION_LEVEL);
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+#ifdef USE_ZSTD
+/*
+ * Allocate ZSTD decompression buffers and create the ZSTD decompression
+ * context.
+ */
+static void
+zstd_CreateStreamDecodeCompressorState(MemoryContext context, void *compressor_state)
+{
+	ZSTDStreamingCompressorState *cstate;
+	MemoryContext oldcontext = MemoryContextSwitchTo(context);
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	cstate->zstd_d_in_buf_size = ZSTD_DStreamInSize();
+	cstate->zstd_d_in_buf = (char *) palloc0(cstate->zstd_d_in_buf_size);
+	cstate->zstd_d_out_buf_size = ZSTD_DStreamOutSize();
+	cstate->zstd_d_out_buf = (char *) palloc0(cstate->zstd_d_out_buf_size);
+	cstate->zstd_d_ctx = ZSTD_createDCtx();
+
+	if (cstate->zstd_d_ctx == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("could not create ZSTD decompression context")));
+
+	MemoryContextSwitchTo(oldcontext);
+}
+#endif
+
+/*
+ * Data compression using ZSTD streaming API.
+ */
+void
+zstd_StreamingCompressData(MemoryContext context, char *src, Size src_size,
+						   char *dst, Size *dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_c_ctx == NULL)
+		zstd_CreateStreamCompressorState(context, compressor_state);
+
+	dst_data = dst;
+	*dst_size = 0;
+
+	/*
+	 * ZSTD streaming compression works with chunks: the source data needs to
+	 * be splitted out in chunks, each of them is then copied into ZSTD input
+	 * buffer. For each chunk, we proceed with compression. Streaming
+	 * compression is not intended to compress the whole input chunk, so we
+	 * have the call ZSTD_compressStream2() multiple times until the entire
+	 * chunk is consumed.
+	 */
+	while (toCpy > 0)
+	{
+		/* Are we on the last chunk? */
+		bool		last_chunk = (toCpy < cstate->zstd_c_in_buf_size);
+
+		/* Size of the data copied into ZSTD input buffer */
+		Size		cpySize = last_chunk ? toCpy : cstate->zstd_c_in_buf_size;
+		bool		finished = false;
+		ZSTD_inBuffer input;
+		ZSTD_EndDirective mode = last_chunk ? ZSTD_e_flush : ZSTD_e_continue;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_c_in_buf, src, cpySize);
+
+		/*
+		 * Close the frame when we are on the last chunk and we've reached max
+		 * frame size.
+		 */
+		if (last_chunk && (cstate->zstd_frame_size > ZSTD_MAX_FRAME_SIZE))
+		{
+			mode = ZSTD_e_end;
+			cstate->zstd_frame_size = 0;
+		}
+
+		cstate->zstd_frame_size += cpySize;
+
+		input.src = cstate->zstd_c_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		do
+		{
+			Size		remaining;
+			ZSTD_outBuffer output;
+
+			output.dst = cstate->zstd_c_out_buf;
+			output.size = cstate->zstd_c_out_buf_size;
+			output.pos = 0;
+
+			remaining = ZSTD_compressStream2(cstate->zstd_c_ctx, &output,
+											 &input, mode);
+
+			if (ZSTD_isError(remaining))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD compression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_c_out_buf, output.pos);
+
+			dst_data += output.pos;
+			*dst_size += output.pos;
+
+			/*
+			 * Compression is done when we are working on the last chunk and
+			 * there is nothing left to compress, or, when we reach the end of
+			 * the chunk.
+			 */
+			finished = last_chunk ? (remaining == 0) : (input.pos == input.size);
+		} while (!finished);
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+#endif
+}
+
+/*
+ * Data decompression using ZSTD streaming API.
+ */
+void
+zstd_StreamingDecompressData(MemoryContext context, char *src, Size src_size,
+							 char *dst, Size dst_size, void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	/* Size of remaining data to be copied from src into ZSTD input buffer */
+	Size		toCpy = src_size;
+	char	   *dst_data;
+	Size		decBytes = 0;	/* Size of decompressed data */
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+	/* Allocate ZSTD buffers and context */
+	if (cstate->zstd_d_ctx == NULL)
+		zstd_CreateStreamDecodeCompressorState(context, compressor_state);
+
+	dst_data = dst;
+
+	while (toCpy > 0)
+	{
+		ZSTD_inBuffer input;
+		Size		cpySize = (toCpy > cstate->zstd_d_in_buf_size) ? cstate->zstd_d_in_buf_size : toCpy;
+
+		/* Copy data from src into ZSTD input buffer */
+		memcpy(cstate->zstd_d_in_buf, src, cpySize);
+
+		input.src = cstate->zstd_d_in_buf;
+		input.size = cpySize;
+		input.pos = 0;
+
+		while (input.pos < input.size)
+		{
+			ZSTD_outBuffer output;
+			Size		ret;
+
+			output.dst = cstate->zstd_d_out_buf;
+			output.size = cstate->zstd_d_out_buf_size;
+			output.pos = 0;
+
+			ret = ZSTD_decompressStream(cstate->zstd_d_ctx, &output, &input);
+
+			if (ZSTD_isError(ret))
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg_internal("ZSTD decompression failed")));
+
+			/* Copy back compressed data from ZSTD output buffer */
+			memcpy(dst_data, (char *) cstate->zstd_d_out_buf, output.pos);
+
+			dst_data += output.pos;
+			decBytes += output.pos;
+		}
+
+		src += cpySize;
+		toCpy -= cpySize;
+	}
+
+	Assert(dst_size == decBytes);
+#endif
+}
+
+Size
+zstd_CompressBound(Size src_size)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return -1;
+#else
+	return ZSTD_compressBound(src_size);
+#endif
+}
+
+/*
+ * Returns the StringInfo buffer we use to store compressed/decompressed data.
+ */
+StringInfo
+zstd_GetStringInfoBuffer(void *compressor_state)
+{
+#ifndef USE_ZSTD
+	NO_ZSTD_SUPPORT();
+	return NULL;
+#else
+	ZSTDStreamingCompressorState *cstate;
+
+	cstate = (ZSTDStreamingCompressorState *) compressor_state;
+
+	return cstate->buf;
+#endif
+}
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 3dbf47e18e..240c188f00 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -17,12 +17,17 @@
 #include <lz4.h>
 #endif
 
+#ifdef USE_ZSTD
+#include <zstd.h>
+#endif
+
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
+	REORDER_BUFFER_ZSTD_COMPRESSION,
 }			ReorderBufferCompressionMethod;
 
 /*
@@ -33,6 +38,7 @@ typedef enum ReorderBufferCompressionStrategy
 	REORDER_BUFFER_STRAT_UNCOMPRESSED,
 	REORDER_BUFFER_STRAT_PGLZ,
 	REORDER_BUFFER_STRAT_LZ4_STREAMING,
+	REORDER_BUFFER_STRAT_ZSTD_STREAMING,
 }			ReorderBufferCompressionStrategy;
 
 typedef struct PGLZCompressorState
@@ -72,6 +78,42 @@ typedef struct LZ4StreamingCompressorState
 }			LZ4StreamingCompressorState;
 #endif
 
+#ifdef USE_ZSTD
+/*
+ * Low compression level provides high compression speed and decent compression
+ * rate. Minimum level is 1, maximum is 22.
+ */
+#define ZSTD_COMPRESSION_LEVEL 1
+
+/*
+ * Maximum volume of data encoded in the current ZSTD frame. When this
+ * threshold is reached then we close the current frame and start a new one.
+ */
+#define ZSTD_MAX_FRAME_SIZE (64 * 1024)
+
+/*
+ * ZSTD streaming compression/decompression handlers and buffers.
+ */
+typedef struct ZSTDStreamingCompressorState
+{
+	/* Compression */
+	ZSTD_CCtx  *zstd_c_ctx;
+	Size		zstd_c_in_buf_size;
+	char	   *zstd_c_in_buf;
+	Size		zstd_c_out_buf_size;
+	char	   *zstd_c_out_buf;
+	Size		zstd_frame_size;
+	/* Decompression */
+	ZSTD_DCtx  *zstd_d_ctx;
+	Size		zstd_d_in_buf_size;
+	char	   *zstd_d_in_buf;
+	Size		zstd_d_out_buf_size;
+	char	   *zstd_d_out_buf;
+	/* Buffer used to store compressed/decompressed data */
+	StringInfo	buf;
+}			ZSTDStreamingCompressorState;
+#endif
+
 extern void *lz4_NewCompressorState(MemoryContext context);
 extern void lz4_FreeCompressorState(MemoryContext context,
 									void *compressor_state);
@@ -88,4 +130,16 @@ extern void *pglz_NewCompressorState(MemoryContext context);
 extern void pglz_FreeCompressorState(MemoryContext context,
 									 void *compressor_state);
 
+extern void *zstd_NewCompressorState(MemoryContext context);
+extern void zstd_FreeCompressorState(MemoryContext context,
+									 void *compressor_state);
+extern void zstd_StreamingCompressData(MemoryContext context, char *src,
+									   Size src_size, char *dst, Size *dst_size,
+									   void *compressor_state);
+extern void zstd_StreamingDecompressData(MemoryContext context, char *src,
+										 Size src_size, char *dst,
+										 Size dst_size, void *compressor_state);
+extern Size zstd_CompressBound(Size src_size);
+extern StringInfo zstd_GetStringInfoBuffer(void *compressor_state);
+
 #endif							/* REORDERBUFFER_COMPRESSION_H */
-- 
2.43.0

v5-0005-Track-data-size-written-to-disk-during-logical-decod.patchapplication/octet-stream; name=v5-0005-Track-data-size-written-to-disk-during-logical-decod.patchDownload
From e6892c7ef32cd79780808517a4bd21e392d905b8 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Thu, 24 Oct 2024 12:55:09 +0200
Subject: [PATCH 5/6] Track data size written to disk during logical decoding

The pg_stat_replication_slots view now exposes a new counter named
"spill_write_bytes". This counter is in charge of tracking the amount
of data written to disk during logical decoding. This counter differs
from "spill_bytes" as the data can be compressed before being written
to disk.

When data compression is not enabled (default), "spill_bytes" and
"spill_write_bytes" are equal.
---
 contrib/test_decoding/expected/stats.out      | 12 +++----
 doc/src/sgml/monitoring.sgml                  | 22 ++++++++++---
 src/backend/catalog/system_views.sql          |  1 +
 src/backend/replication/logical/logical.c     |  5 ++-
 .../replication/logical/reorderbuffer.c       | 11 +++++--
 src/backend/utils/activity/pgstat_replslot.c  |  1 +
 src/backend/utils/adt/pgstatfuncs.c           | 31 ++++++++++---------
 src/include/catalog/pg_proc.dat               |  6 ++--
 src/include/pgstat.h                          |  1 +
 src/include/replication/reorderbuffer.h       |  4 ++-
 src/test/regress/expected/rules.out           |  3 +-
 11 files changed, 64 insertions(+), 33 deletions(-)

diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index 78d36429c8..caad65b64c 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -78,17 +78,17 @@ SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count,
 
 -- verify accessing/resetting stats for non-existent slot does something reasonable
 SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
-  slot_name   | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset 
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist |          0 |           0 |           0 |           0 |            0 |            0 |          0 |           0 | 
+  slot_name   | spill_txns | spill_count | spill_bytes | spill_write_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset 
+--------------+------------+-------------+-------------+-------------------+-------------+--------------+--------------+------------+-------------+-------------
+ do-not-exist |          0 |           0 |           0 |                 0 |           0 |            0 |            0 |          0 |           0 | 
 (1 row)
 
 SELECT pg_stat_reset_replication_slot('do-not-exist');
 ERROR:  replication slot "do-not-exist" does not exist
 SELECT * FROM pg_stat_get_replication_slot('do-not-exist');
-  slot_name   | spill_txns | spill_count | spill_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset 
---------------+------------+-------------+-------------+-------------+--------------+--------------+------------+-------------+-------------
- do-not-exist |          0 |           0 |           0 |           0 |            0 |            0 |          0 |           0 | 
+  slot_name   | spill_txns | spill_count | spill_bytes | spill_write_bytes | stream_txns | stream_count | stream_bytes | total_txns | total_bytes | stats_reset 
+--------------+------------+-------------+-------------+-------------------+-------------+--------------+--------------+------------+-------------+-------------
+ do-not-exist |          0 |           0 |           0 |                 0 |           0 |            0 |            0 |          0 |           0 | 
 (1 row)
 
 -- spilling the xact
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 331315f8d3..d251e4cfa8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1561,10 +1561,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
         <structfield>spill_bytes</structfield> <type>bigint</type>
        </para>
        <para>
-        Amount of decoded transaction data spilled to disk while performing
-        decoding of changes from WAL for this slot. This and other spill
-        counters can be used to gauge the I/O which occurred during logical
-        decoding and allow tuning <literal>logical_decoding_work_mem</literal>.
+        Amount of decoded transaction data requiring to be eventually compressed
+        and spilled to disk while performing decoding of changes from WAL for
+        this slot. This and other spill counters can be used to gauge the I/O
+        which occurred during logical decoding and allow tuning
+        <literal>logical_decoding_work_mem</literal>.
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+        <structfield>spill_write_bytes</structfield> <type>bigint</type>
+       </para>
+       <para>
+        Amount of decoded transaction data eventually compressed and spilled
+        to disk while performing decoding of changes from WAL for this slot.
+        When using data compression, <literal>spill_write_bytes</literal>
+        should be significantly smaller than <literal>spill_bytes</literal>
+        which represents uncompressed data size.
       </para></entry>
      </row>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a087223677..7ab046cb70 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1030,6 +1030,7 @@ CREATE VIEW pg_stat_replication_slots AS
             s.spill_txns,
             s.spill_count,
             s.spill_bytes,
+            s.spill_write_bytes,
             s.stream_txns,
             s.stream_count,
             s.stream_bytes,
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 54fbbe6fea..951318a994 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1942,11 +1942,12 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	if (rb->spillBytes <= 0 && rb->streamBytes <= 0 && rb->totalBytes <= 0)
 		return;
 
-	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld",
+	elog(DEBUG2, "UpdateDecodingStats: updating stats %p %lld %lld %lld %lld %lld %lld %lld %lld %lld",
 		 rb,
 		 (long long) rb->spillTxns,
 		 (long long) rb->spillCount,
 		 (long long) rb->spillBytes,
+		 (long long) rb->spillWriteBytes,
 		 (long long) rb->streamTxns,
 		 (long long) rb->streamCount,
 		 (long long) rb->streamBytes,
@@ -1956,6 +1957,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	repSlotStat.spill_txns = rb->spillTxns;
 	repSlotStat.spill_count = rb->spillCount;
 	repSlotStat.spill_bytes = rb->spillBytes;
+	repSlotStat.spill_write_bytes = rb->spillWriteBytes;
 	repSlotStat.stream_txns = rb->streamTxns;
 	repSlotStat.stream_count = rb->streamCount;
 	repSlotStat.stream_bytes = rb->streamBytes;
@@ -1967,6 +1969,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
 	rb->spillTxns = 0;
 	rb->spillCount = 0;
 	rb->spillBytes = 0;
+	rb->spillWriteBytes = 0;
 	rb->streamTxns = 0;
 	rb->streamCount = 0;
 	rb->streamBytes = 0;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 9407466393..2b8839a96d 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -257,7 +257,7 @@ static void ReorderBufferExecuteInvalidations(uint32 nmsgs, SharedInvalidationMe
  */
 static void ReorderBufferCheckMemoryLimit(ReorderBuffer *rb);
 static void ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn);
-static void ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
+static Size ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										 int fd, ReorderBufferChange *change);
 static Size ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 										TXNEntryFile *file, XLogSegNo *segno);
@@ -3756,6 +3756,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	XLogSegNo	curOpenSegNo = 0;
 	Size		spilled = 0;
 	Size		size = txn->size;
+	Size		written = 0;
 
 	elog(DEBUG2, "spill %u changes in XID %u to disk",
 		 (uint32) txn->nentries_mem, txn->xid);
@@ -3807,7 +3808,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 
-		ReorderBufferSerializeChange(rb, txn, fd, change);
+		written += ReorderBufferSerializeChange(rb, txn, fd, change);
 		dlist_delete(&change->node);
 		ReorderBufferReturnChange(rb, change, false);
 
@@ -3822,6 +3823,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	{
 		rb->spillCount += 1;
 		rb->spillBytes += size;
+		rb->spillWriteBytes += written;
 
 		/* don't consider already serialized transactions */
 		rb->spillTxns += (rbtxn_is_serialized(txn) || rbtxn_is_serialized_clear(txn)) ? 0 : 1;
@@ -3842,7 +3844,7 @@ ReorderBufferSerializeTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 /*
  * Serialize individual change to disk.
  */
-static void
+static Size
 ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 							 int fd, ReorderBufferChange *change)
 {
@@ -4054,6 +4056,9 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	 */
 	if (txn->final_lsn < change->lsn)
 		txn->final_lsn = change->lsn;
+
+	/* Return the size of data written on disk */
+	return ondisk->size - sizeof(ReorderBufferDiskChange);
 }
 
 /* Returns true, if the output plugin supports streaming, false, otherwise. */
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ddf2ab9928..66d6e36ebf 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -91,6 +91,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	REPLSLOT_ACC(spill_txns);
 	REPLSLOT_ACC(spill_count);
 	REPLSLOT_ACC(spill_bytes);
+	REPLSLOT_ACC(spill_write_bytes);
 	REPLSLOT_ACC(stream_txns);
 	REPLSLOT_ACC(stream_count);
 	REPLSLOT_ACC(stream_bytes);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f7b50e0b5a..f7488b2a58 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1907,7 +1907,7 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_REPLICATION_SLOT_COLS 10
+#define PG_STAT_GET_REPLICATION_SLOT_COLS 11
 	text	   *slotname_text = PG_GETARG_TEXT_P(0);
 	NameData	slotname;
 	TupleDesc	tupdesc;
@@ -1926,17 +1926,19 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 					   INT8OID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "spill_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stream_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "spill_write_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_count",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "stream_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "stream_count",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "total_txns",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "stream_bytes",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_bytes",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "total_txns",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "total_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 	BlessTupleDesc(tupdesc);
 
@@ -1956,16 +1958,17 @@ pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
 	values[1] = Int64GetDatum(slotent->spill_txns);
 	values[2] = Int64GetDatum(slotent->spill_count);
 	values[3] = Int64GetDatum(slotent->spill_bytes);
-	values[4] = Int64GetDatum(slotent->stream_txns);
-	values[5] = Int64GetDatum(slotent->stream_count);
-	values[6] = Int64GetDatum(slotent->stream_bytes);
-	values[7] = Int64GetDatum(slotent->total_txns);
-	values[8] = Int64GetDatum(slotent->total_bytes);
+	values[4] = Int64GetDatum(slotent->spill_write_bytes);
+	values[5] = Int64GetDatum(slotent->stream_txns);
+	values[6] = Int64GetDatum(slotent->stream_count);
+	values[7] = Int64GetDatum(slotent->stream_bytes);
+	values[8] = Int64GetDatum(slotent->total_txns);
+	values[9] = Int64GetDatum(slotent->total_bytes);
 
 	if (slotent->stat_reset_timestamp == 0)
-		nulls[9] = true;
+		nulls[10] = true;
 	else
-		values[9] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
+		values[10] = TimestampTzGetDatum(slotent->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1ec0d6f6b5..6b95dbeebf 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5584,9 +5584,9 @@
 { oid => '6169', descr => 'statistics: information about replication slot',
   proname => 'pg_stat_get_replication_slot', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => 'text',
-  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
-  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
+  proallargtypes => '{text,text,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{slot_name,slot_name,spill_txns,spill_count,spill_bytes,spill_write_bytes,stream_txns,stream_count,stream_bytes,total_txns,total_bytes,stats_reset}',
   prosrc => 'pg_stat_get_replication_slot' },
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index df53fa2d4f..733cd74f86 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -403,6 +403,7 @@ typedef struct PgStat_StatReplSlotEntry
 	PgStat_Counter spill_txns;
 	PgStat_Counter spill_count;
 	PgStat_Counter spill_bytes;
+	PgStat_Counter spill_write_bytes;
 	PgStat_Counter stream_txns;
 	PgStat_Counter stream_count;
 	PgStat_Counter stream_bytes;
diff --git a/src/include/replication/reorderbuffer.h b/src/include/replication/reorderbuffer.h
index 5f231b5f90..2b585db7c5 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -659,7 +659,9 @@ struct ReorderBuffer
 	 */
 	int64		spillTxns;		/* number of transactions spilled to disk */
 	int64		spillCount;		/* spill-to-disk invocation counter */
-	int64		spillBytes;		/* amount of data spilled to disk */
+	int64		spillBytes;		/* amount of data spilled to disk, before
+								 * compression */
+	int64		spillWriteBytes;	/* amount of data actually written to disk */
 
 	/* Statistics about transactions streamed to the decoding output plugin */
 	int64		streamTxns;		/* number of transactions streamed */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2b47013f11..26ff02f89d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2095,6 +2095,7 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.spill_txns,
     s.spill_count,
     s.spill_bytes,
+    s.spill_write_bytes,
     s.stream_txns,
     s.stream_count,
     s.stream_bytes,
@@ -2102,7 +2103,7 @@ pg_stat_replication_slots| SELECT s.slot_name,
     s.total_bytes,
     s.stats_reset
    FROM pg_replication_slots r,
-    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
+    LATERAL pg_stat_get_replication_slot((r.slot_name)::text) s(slot_name, spill_txns, spill_count, spill_bytes, spill_write_bytes, stream_txns, stream_count, stream_bytes, total_txns, total_bytes, stats_reset)
   WHERE (r.datoid IS NOT NULL);
 pg_stat_slru| SELECT name,
     blks_zeroed,
-- 
2.43.0

v5-0004-Set-ReorderBuffer-compression-via-subscription-opt.patchapplication/octet-stream; name=v5-0004-Set-ReorderBuffer-compression-via-subscription-opt.patchDownload
From ede0f93dd44542a1ac401d073e7988c4e61822ce Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Tue, 22 Oct 2024 23:12:44 +0200
Subject: [PATCH 4/6] Set ReorderBuffer compression via subscription opt

The CREATE/ALTER SUBSCRIPTION commands now support a new option
named "spill_compression" that will be used to select the
compression method applied to the logical changes spilled on disk
during decoding.

The default value is "off", meaning no compression will be applied.
Supported values are: "off", "on", "pglz", "lz4", and "zstd".
---
 doc/src/sgml/ref/create_subscription.sgml     |  24 +++
 src/backend/catalog/pg_subscription.c         |   6 +
 src/backend/catalog/system_views.sql          |   3 +-
 src/backend/commands/subscriptioncmds.c       |  31 +++-
 .../libpqwalreceiver/libpqwalreceiver.c       |   5 +
 src/backend/replication/logical/logical.c     |   4 +
 .../replication/logical/reorderbuffer.c       |  74 ++++++++-
 src/backend/replication/logical/worker.c      |  13 +-
 src/backend/replication/pgoutput/pgoutput.c   |  28 ++++
 src/bin/pg_dump/pg_dump.c                     |  18 +-
 src/bin/pg_dump/pg_dump.h                     |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl              |   4 +-
 src/bin/psql/describe.c                       |   7 +-
 src/bin/psql/tab-complete.in.c                |   6 +-
 src/include/catalog/pg_subscription.h         |   4 +
 src/include/replication/logical.h             |   2 +
 src/include/replication/pgoutput.h            |   1 +
 .../replication/reorderbuffer_compression.h   |   4 +
 src/include/replication/walreceiver.h         |   1 +
 src/test/regress/expected/subscription.out    | 156 +++++++++---------
 src/test/regress/sql/subscription.sql         |   4 +
 21 files changed, 305 insertions(+), 91 deletions(-)

diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml
index 6cf7d4f9a1..175965c4a4 100644
--- a/doc/src/sgml/ref/create_subscription.sgml
+++ b/doc/src/sgml/ref/create_subscription.sgml
@@ -435,6 +435,30 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
          </para>
         </listitem>
        </varlistentry>
+
+       <varlistentry id="sql-createsubscription-params-with-spill-compression">
+        <term><literal>spill_compression</literal> (<type>enum</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the decoded changes that eventually need to be
+          temporarily written on disk by the publisher are compressed or not.
+          Default value is <literal>off</literal> meaning no data compression
+          involved. Setting <literal>spill_compression</literal> to
+          <literal>on</literal> or <literal>pglz</literal> means that the
+          decoded changes are compressed using the internal
+          <literal>PGLZ</literal> compression algorithm.
+         </para>
+
+         <para>
+          If the <productname>PostgreSQL</productname> server running the
+          publisher node supports the external compression libraries
+          <productname>LZ4</productname> or
+          <productname>Zstandard</productname>,
+          <literal>spill_compression</literal> can be set respectively to
+          <literal>lz4</literal> or <literal>zstd</literal>.
+         </para>
+        </listitem>
+       </varlistentry>
       </variablelist></para>
 
     </listitem>
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index 89bf5ec933..e2bd881ab0 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -141,6 +141,12 @@ GetSubscription(Oid subid, bool missing_ok)
 	/* Is the subscription owner a superuser? */
 	sub->ownersuperuser = superuser_arg(sub->owner);
 
+	/* Get spill_compression */
+	datum = SysCacheGetAttrNotNull(SUBSCRIPTIONOID,
+								   tup,
+								   Anum_pg_subscription_subspillcompression);
+	sub->spill_compression = TextDatumGetCString(datum);
+
 	ReleaseSysCache(tup);
 
 	return sub;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3456b821bc..a087223677 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1358,7 +1358,8 @@ REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subskiplsn, subname, subowner, subenabled,
               subbinary, substream, subtwophasestate, subdisableonerr,
 			  subpasswordrequired, subrunasowner, subfailover,
-              subslotname, subsynccommit, subpublications, suborigin)
+              subslotname, subsynccommit, subpublications, suborigin,
+              subspillcompression)
     ON pg_subscription TO public;
 
 CREATE VIEW pg_stat_subscription_stats AS
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 03e97730e7..2c9f1eae4b 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -40,6 +40,7 @@
 #include "replication/logicallauncher.h"
 #include "replication/logicalworker.h"
 #include "replication/origin.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slot.h"
 #include "replication/walreceiver.h"
 #include "replication/walsender.h"
@@ -73,6 +74,7 @@
 #define SUBOPT_FAILOVER				0x00002000
 #define SUBOPT_LSN					0x00004000
 #define SUBOPT_ORIGIN				0x00008000
+#define SUBOPT_SPILL_COMPRESSION	0x00010000
 
 /* check if the 'val' has 'bits' set */
 #define IsSet(val, bits)  (((val) & (bits)) == (bits))
@@ -100,6 +102,7 @@ typedef struct SubOpts
 	bool		failover;
 	char	   *origin;
 	XLogRecPtr	lsn;
+	char	   *spill_compression;
 } SubOpts;
 
 static List *fetch_table_list(WalReceiverConn *wrconn, List *publications);
@@ -164,6 +167,8 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 		opts->failover = false;
 	if (IsSet(supported_opts, SUBOPT_ORIGIN))
 		opts->origin = pstrdup(LOGICALREP_ORIGIN_ANY);
+	if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION))
+		opts->spill_compression = "off";
 
 	/* Parse options */
 	foreach(lc, stmt_options)
@@ -357,6 +362,18 @@ parse_subscription_options(ParseState *pstate, List *stmt_options,
 			opts->specified_opts |= SUBOPT_LSN;
 			opts->lsn = lsn;
 		}
+		else if (IsSet(supported_opts, SUBOPT_SPILL_COMPRESSION) &&
+				 strcmp(defel->defname, "spill_compression") == 0)
+		{
+			if (IsSet(opts->specified_opts, SUBOPT_SPILL_COMPRESSION))
+				errorConflictingDefElem(defel, pstate);
+
+			opts->specified_opts |= SUBOPT_SPILL_COMPRESSION;
+			opts->spill_compression = defGetString(defel);
+
+			ReorderBufferValidateCompressionMethod(opts->spill_compression,
+												   ERROR);
+		}
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -563,7 +580,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 					  SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
 					  SUBOPT_STREAMING | SUBOPT_TWOPHASE_COMMIT |
 					  SUBOPT_DISABLE_ON_ERR | SUBOPT_PASSWORD_REQUIRED |
-					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN);
+					  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
+					  SUBOPT_SPILL_COMPRESSION);
 	parse_subscription_options(pstate, stmt->options, supported_opts, &opts);
 
 	/*
@@ -683,6 +701,8 @@ CreateSubscription(ParseState *pstate, CreateSubscriptionStmt *stmt,
 		publicationListToArray(publications);
 	values[Anum_pg_subscription_suborigin - 1] =
 		CStringGetTextDatum(opts.origin);
+	values[Anum_pg_subscription_subspillcompression - 1] =
+		CStringGetTextDatum(opts.spill_compression);
 
 	tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);
 
@@ -1165,7 +1185,7 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 								  SUBOPT_DISABLE_ON_ERR |
 								  SUBOPT_PASSWORD_REQUIRED |
 								  SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER |
-								  SUBOPT_ORIGIN);
+								  SUBOPT_ORIGIN | SUBOPT_SPILL_COMPRESSION);
 
 				parse_subscription_options(pstate, stmt->options,
 										   supported_opts, &opts);
@@ -1332,6 +1352,13 @@ AlterSubscription(ParseState *pstate, AlterSubscriptionStmt *stmt,
 					replaces[Anum_pg_subscription_suborigin - 1] = true;
 				}
 
+				if (IsSet(opts.specified_opts, SUBOPT_SPILL_COMPRESSION))
+				{
+					values[Anum_pg_subscription_subspillcompression - 1] =
+						CStringGetTextDatum(opts.spill_compression);
+					replaces[Anum_pg_subscription_subspillcompression - 1] = true;
+				}
+
 				update_tuple = true;
 				break;
 			}
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 97f957cd87..20f5e4a83e 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -620,6 +620,11 @@ libpqrcv_startstreaming(WalReceiverConn *conn,
 			PQserverVersion(conn->streamConn) >= 140000)
 			appendStringInfoString(&cmd, ", binary 'true'");
 
+		if (options->proto.logical.spill_compression &&
+			PQserverVersion(conn->streamConn) >= 180000)
+			appendStringInfo(&cmd, ", spill_compression '%s'",
+							 options->proto.logical.spill_compression);
+
 		appendStringInfoChar(&cmd, ')');
 	}
 	else
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3fe1774a1e..54fbbe6fea 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -36,6 +36,7 @@
 #include "replication/decode.h"
 #include "replication/logical.h"
 #include "replication/reorderbuffer.h"
+#include "replication/reorderbuffer_compression.h"
 #include "replication/slotsync.h"
 #include "replication/snapbuild.h"
 #include "storage/proc.h"
@@ -298,6 +299,9 @@ StartupDecodingContext(List *output_plugin_options,
 
 	ctx->fast_forward = fast_forward;
 
+	/* No spill files compression by default */
+	ctx->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
+
 	MemoryContextSwitchTo(old_context);
 
 	return ctx;
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 17f8208000..9407466393 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -219,7 +219,7 @@ static const Size max_changes_in_memory = 4096; /* XXX for restore only */
 int			debug_logical_replication_streaming = DEBUG_LOGICAL_REP_STREAMING_BUFFERED;
 
 /* Compression strategy for spilled data. */
-int			logical_decoding_spill_compression = REORDER_BUFFER_ZSTD_COMPRESSION;
+int			logical_decoding_spill_compression = REORDER_BUFFER_NO_COMPRESSION;
 
 /* ---------------------------------------
  * primary reorderbuffer support routines
@@ -438,6 +438,15 @@ ReorderBufferFree(ReorderBuffer *rb)
 	ReorderBufferCleanupSerializedTXNs(NameStr(MyReplicationSlot->data.name));
 }
 
+/* Returns spill files compression method */
+static inline uint8
+ReorderBufferSpillCompressionMethod(ReorderBuffer *rb)
+{
+	LogicalDecodingContext *ctx = rb->private_data;
+
+	return ctx->spill_compression_method;
+}
+
 /*
  * Get an unused, possibly preallocated, ReorderBufferTXN.
  */
@@ -459,7 +468,7 @@ ReorderBufferGetTXN(ReorderBuffer *rb)
 	txn->command_id = InvalidCommandId;
 	txn->output_plugin_private = NULL;
 	txn->compressor_state = ReorderBufferNewCompressorState(rb->context,
-															logical_decoding_spill_compression);
+															ReorderBufferSpillCompressionMethod(rb));
 
 	return txn;
 }
@@ -498,7 +507,7 @@ ReorderBufferReturnTXN(ReorderBuffer *rb, ReorderBufferTXN *txn)
 	}
 
 	ReorderBufferFreeCompressorState(rb->context,
-									 logical_decoding_spill_compression,
+									 ReorderBufferSpillCompressionMethod(rb),
 									 txn->compressor_state);
 
 	/* Reset the toast hash */
@@ -4015,7 +4024,7 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 	}
 
 	/* Inplace ReorderBuffer content compression before writing it on disk */
-	ReorderBufferCompress(rb, &ondisk, logical_decoding_spill_compression,
+	ReorderBufferCompress(rb, &ondisk, ReorderBufferSpillCompressionMethod(rb),
 						  sz, txn->compressor_state);
 
 	errno = 0;
@@ -5697,3 +5706,60 @@ ReorderBufferDecompress(ReorderBuffer *rb, char *data,
 			break;
 	}
 }
+
+/*
+ * According to a given compression method (as string representation), returns
+ * the corresponding ReorderBufferCompressionMethod
+ */
+ReorderBufferCompressionMethod
+ReorderBufferParseCompressionMethod(const char *method)
+{
+	if (pg_strcasecmp(method, "on") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "pglz") == 0)
+		return REORDER_BUFFER_PGLZ_COMPRESSION;
+	else if (pg_strcasecmp(method, "off") == 0)
+		return REORDER_BUFFER_NO_COMPRESSION;
+#ifdef USE_LZ4
+	else if (pg_strcasecmp(method, "lz4") == 0)
+		return REORDER_BUFFER_LZ4_COMPRESSION;
+#endif
+#ifdef USE_ZSTD
+	else if (pg_strcasecmp(method, "zstd") == 0)
+		return REORDER_BUFFER_ZSTD_COMPRESSION;
+#endif
+	else
+		return REORDER_BUFFER_INVALID_COMPRESSION;
+}
+
+/*
+ * Check whether the passed compression method is valid and report errors at
+ * elevel.
+ *
+ * As this validation is intended to be executed on subscriber side, then we
+ * actually don't know if the server running the publisher supports external
+ * compression libraries. We only check if the compression method is
+ * potentially supported. The real validation is done by the publisher when
+ * the replication starts, an error is then triggered if the compression method
+ * is not supported.
+ */
+void
+ReorderBufferValidateCompressionMethod(const char *method, int elevel)
+{
+	bool		valid = false;
+	char		methods[5][5] = {"on", "off", "pglz", "lz4", "zstd"};
+
+	for (int i = 0; i < 5; i++)
+	{
+		if (pg_strcasecmp(method, methods[i]) == 0)
+		{
+			valid = true;
+			break;
+		}
+	}
+
+	if (!valid)
+		ereport(elevel,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("compression method \"%s\" not valid", method)));
+}
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 925dff9cc4..32b38b94dd 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -4021,7 +4021,8 @@ maybe_reread_subscription(void)
 		newsub->passwordrequired != MySubscription->passwordrequired ||
 		strcmp(newsub->origin, MySubscription->origin) != 0 ||
 		newsub->owner != MySubscription->owner ||
-		!equal(newsub->publications, MySubscription->publications))
+		!equal(newsub->publications, MySubscription->publications) ||
+		strcmp(newsub->spill_compression, MySubscription->spill_compression) != 0)
 	{
 		if (am_parallel_apply_worker())
 			ereport(LOG,
@@ -4469,6 +4470,16 @@ set_stream_options(WalRcvStreamOptions *options,
 		MyLogicalRepWorker->parallel_apply = false;
 	}
 
+	if (server_version >= 180000 &&
+		MySubscription->stream == LOGICALREP_STREAM_OFF &&
+		MySubscription->spill_compression != NULL)
+	{
+		options->proto.logical.spill_compression =
+			pstrdup(MySubscription->spill_compression);
+	}
+	else
+		options->proto.logical.spill_compression = NULL;
+
 	options->proto.logical.twophase = false;
 	options->proto.logical.origin = pstrdup(MySubscription->origin);
 }
diff --git a/src/backend/replication/pgoutput/pgoutput.c b/src/backend/replication/pgoutput/pgoutput.c
index 00e7024563..521b646bb6 100644
--- a/src/backend/replication/pgoutput/pgoutput.c
+++ b/src/backend/replication/pgoutput/pgoutput.c
@@ -27,6 +27,7 @@
 #include "replication/logicalproto.h"
 #include "replication/origin.h"
 #include "replication/pgoutput.h"
+#include "replication/reorderbuffer_compression.h"
 #include "utils/builtins.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
@@ -283,11 +284,13 @@ parse_output_parameters(List *options, PGOutputData *data)
 	bool		streaming_given = false;
 	bool		two_phase_option_given = false;
 	bool		origin_option_given = false;
+	bool		spill_compression_option_given = false;
 
 	data->binary = false;
 	data->streaming = LOGICALREP_STREAM_OFF;
 	data->messages = false;
 	data->two_phase = false;
+	data->spill_compression_method = REORDER_BUFFER_NO_COMPRESSION;
 
 	foreach(lc, options)
 	{
@@ -396,6 +399,28 @@ parse_output_parameters(List *options, PGOutputData *data)
 						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						errmsg("unrecognized origin value: \"%s\"", origin));
 		}
+		else if (strcmp(defel->defname, "spill_compression") == 0)
+		{
+			uint8		method;
+			char	   *method_str;
+
+			if (spill_compression_option_given)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			spill_compression_option_given = true;
+
+			method_str = defGetString(defel);
+			method = ReorderBufferParseCompressionMethod(method_str);
+
+			if (method == REORDER_BUFFER_INVALID_COMPRESSION)
+				ereport(ERROR,
+						errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						errmsg("invalid spill files compression method: \"%s\"",
+							   method_str));
+
+			data->spill_compression_method = method;
+		}
 		else
 			elog(ERROR, "unrecognized pgoutput option: %s", defel->defname);
 	}
@@ -508,6 +533,9 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
 		data->publications = NIL;
 		publications_valid = false;
 
+		/* Init spill files compression method */
+		ctx->spill_compression_method = data->spill_compression_method;
+
 		/*
 		 * Register callback for pg_publication if we didn't already do that
 		 * during some previous call in this process.
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d8c6330732..0b9e31ff2f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4850,6 +4850,7 @@ getSubscriptions(Archive *fout)
 	int			i_suboriginremotelsn;
 	int			i_subenabled;
 	int			i_subfailover;
+	int			i_subspillcompression;
 	int			i,
 				ntups;
 
@@ -4922,10 +4923,17 @@ getSubscriptions(Archive *fout)
 
 	if (fout->remoteVersion >= 170000)
 		appendPQExpBufferStr(query,
-							 " s.subfailover\n");
+							 " s.subfailover,\n");
 	else
 		appendPQExpBuffer(query,
-						  " false AS subfailover\n");
+						  " false AS subfailover,\n");
+
+	if (fout->remoteVersion >= 180000)
+		appendPQExpBufferStr(query,
+							 " s.subspillcompression\n");
+	else
+		appendPQExpBuffer(query,
+						  " 'off' AS subspillcompression\n");
 
 	appendPQExpBufferStr(query,
 						 "FROM pg_subscription s\n");
@@ -4965,6 +4973,7 @@ getSubscriptions(Archive *fout)
 	i_suboriginremotelsn = PQfnumber(res, "suboriginremotelsn");
 	i_subenabled = PQfnumber(res, "subenabled");
 	i_subfailover = PQfnumber(res, "subfailover");
+	i_subspillcompression = PQfnumber(res, "subspillcompression");
 
 	subinfo = pg_malloc(ntups * sizeof(SubscriptionInfo));
 
@@ -5011,6 +5020,8 @@ getSubscriptions(Archive *fout)
 			pg_strdup(PQgetvalue(res, i, i_subenabled));
 		subinfo[i].subfailover =
 			pg_strdup(PQgetvalue(res, i, i_subfailover));
+		subinfo[i].subspillcompression =
+			pg_strdup(PQgetvalue(res, i, i_subspillcompression));
 
 		/* Decide whether we want to dump it */
 		selectDumpableObject(&(subinfo[i].dobj), fout);
@@ -5259,6 +5270,9 @@ dumpSubscription(Archive *fout, const SubscriptionInfo *subinfo)
 	if (pg_strcasecmp(subinfo->suborigin, LOGICALREP_ORIGIN_ANY) != 0)
 		appendPQExpBuffer(query, ", origin = %s", subinfo->suborigin);
 
+	if (strcmp(subinfo->subspillcompression, "off") != 0)
+		appendPQExpBuffer(query, ", spill_compression = %s", subinfo->subspillcompression);
+
 	appendPQExpBufferStr(query, ");\n");
 
 	/*
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 9f907ed5ad..ecbf2c2e27 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -673,6 +673,7 @@ typedef struct _SubscriptionInfo
 	char	   *suborigin;
 	char	   *suboriginremotelsn;
 	char	   *subfailover;
+	char	   *subspillcompression;
 } SubscriptionInfo;
 
 /*
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index ac60829d68..4b7f75db8f 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -3001,9 +3001,9 @@ my %tests = (
 		create_order => 50,
 		create_sql => 'CREATE SUBSCRIPTION sub2
 						 CONNECTION \'dbname=doesnotexist\' PUBLICATION pub1
-						 WITH (connect = false, origin = none, streaming = off);',
+						 WITH (connect = false, origin = none, spill_compression = on, streaming = off);',
 		regexp => qr/^
-			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', streaming = off, origin = none);\E
+			\QCREATE SUBSCRIPTION sub2 CONNECTION 'dbname=doesnotexist' PUBLICATION pub1 WITH (connect = false, slot_name = 'sub2', streaming = off, origin = none, spill_compression = on);\E
 			/xm,
 		like => { %full_runs, section_post_data => 1, },
 	},
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 363a66e718..9735d0b099 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -6543,7 +6543,7 @@ describeSubscriptions(const char *pattern, bool verbose)
 	printQueryOpt myopt = pset.popt;
 	static const bool translate_columns[] = {false, false, false, false,
 		false, false, false, false, false, false, false, false, false, false,
-	false};
+	false, false};
 
 	if (pset.sversion < 100000)
 	{
@@ -6623,6 +6623,11 @@ describeSubscriptions(const char *pattern, bool verbose)
 			appendPQExpBuffer(&buf,
 							  ", subskiplsn AS \"%s\"\n",
 							  gettext_noop("Skip LSN"));
+
+		if (pset.sversion >= 180000)
+			appendPQExpBuffer(&buf,
+							  ", subspillcompression AS \"%s\"\n",
+							  gettext_noop("Spill files compression"));
 	}
 
 	/* Only display subscriptions in current database. */
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 1be0056af7..20969ec562 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -2280,7 +2280,8 @@ match_previous_words(int pattern_id,
 	else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, MatchAnyN, "SET", "("))
 		COMPLETE_WITH("binary", "disable_on_error", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "spill_compression", "streaming", "synchronous_commit",
+					  "two_phase");
 	/* ALTER SUBSCRIPTION <name> SKIP ( */
 	else if (Matches("ALTER", "SUBSCRIPTION", MatchAny, MatchAnyN, "SKIP", "("))
 		COMPLETE_WITH("lsn");
@@ -3675,7 +3676,8 @@ match_previous_words(int pattern_id,
 		COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
 					  "disable_on_error", "enabled", "failover", "origin",
 					  "password_required", "run_as_owner", "slot_name",
-					  "streaming", "synchronous_commit", "two_phase");
+					  "spill_compression", "streaming", "synchronous_commit",
+					  "two_phase");
 
 /* CREATE TRIGGER --- is allowed inside CREATE SCHEMA, so use TailMatches */
 
diff --git a/src/include/catalog/pg_subscription.h b/src/include/catalog/pg_subscription.h
index b25f3fea56..2171a1f7f0 100644
--- a/src/include/catalog/pg_subscription.h
+++ b/src/include/catalog/pg_subscription.h
@@ -113,6 +113,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId) BKI_SHARED_RELATION BKI_ROW
 
 	/* Only publish data originating from the specified origin */
 	text		suborigin BKI_DEFAULT(LOGICALREP_ORIGIN_ANY);
+
+	/* Spill files compression algorithm */
+	text		subspillcompression BKI_FORCE_NOT_NULL;
 #endif
 } FormData_pg_subscription;
 
@@ -157,6 +160,7 @@ typedef struct Subscription
 	List	   *publications;	/* List of publication names to subscribe to */
 	char	   *origin;			/* Only publish data originating from the
 								 * specified origin */
+	char	   *spill_compression;	/* Spill files compression algorithm */
 } Subscription;
 
 /* Disallow streaming in-progress transactions. */
diff --git a/src/include/replication/logical.h b/src/include/replication/logical.h
index aff38e8d04..75c17866c3 100644
--- a/src/include/replication/logical.h
+++ b/src/include/replication/logical.h
@@ -112,6 +112,8 @@ typedef struct LogicalDecodingContext
 
 	/* Do we need to process any change in fast_forward mode? */
 	bool		processing_required;
+	/* Compression method used to compress spill files */
+	uint8		spill_compression_method;
 } LogicalDecodingContext;
 
 
diff --git a/src/include/replication/pgoutput.h b/src/include/replication/pgoutput.h
index 89f94e1147..eabcca62af 100644
--- a/src/include/replication/pgoutput.h
+++ b/src/include/replication/pgoutput.h
@@ -33,6 +33,7 @@ typedef struct PGOutputData
 	bool		messages;
 	bool		two_phase;
 	bool		publish_no_origin;
+	uint8		spill_compression_method;
 } PGOutputData;
 
 #endif							/* PGOUTPUT_H */
diff --git a/src/include/replication/reorderbuffer_compression.h b/src/include/replication/reorderbuffer_compression.h
index 240c188f00..1df508be91 100644
--- a/src/include/replication/reorderbuffer_compression.h
+++ b/src/include/replication/reorderbuffer_compression.h
@@ -24,6 +24,7 @@
 /* ReorderBuffer on disk compression algorithms */
 typedef enum ReorderBufferCompressionMethod
 {
+	REORDER_BUFFER_INVALID_COMPRESSION,
 	REORDER_BUFFER_NO_COMPRESSION,
 	REORDER_BUFFER_PGLZ_COMPRESSION,
 	REORDER_BUFFER_LZ4_COMPRESSION,
@@ -117,6 +118,9 @@ typedef struct ZSTDStreamingCompressorState
 extern void *lz4_NewCompressorState(MemoryContext context);
 extern void lz4_FreeCompressorState(MemoryContext context,
 									void *compressor_state);
+extern ReorderBufferCompressionMethod ReorderBufferParseCompressionMethod(const char *method);
+extern void ReorderBufferValidateCompressionMethod(const char *method,
+												   int elevel);
 extern void lz4_StreamingCompressData(MemoryContext context, char *src,
 									  Size src_size, char *dst, Size *dst_size,
 									  void *compressor_state);
diff --git a/src/include/replication/walreceiver.h b/src/include/replication/walreceiver.h
index 132e789948..b759b5807d 100644
--- a/src/include/replication/walreceiver.h
+++ b/src/include/replication/walreceiver.h
@@ -186,6 +186,7 @@ typedef struct
 									 * prepare time */
 			char	   *origin; /* Only publish data originating from the
 								 * specified origin */
+			char	   *spill_compression;	/* Spill files compression algo */
 		}			logical;
 	}			proto;
 } WalRcvStreamOptions;
diff --git a/src/test/regress/expected/subscription.out b/src/test/regress/expected/subscription.out
index 1443e1d929..029d42f358 100644
--- a/src/test/regress/expected/subscription.out
+++ b/src/test/regress/expected/subscription.out
@@ -116,18 +116,18 @@ CREATE SUBSCRIPTION regress_testsub4 CONNECTION 'dbname=regress_doesnotexist' PU
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | none   | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub4 SET (origin = any);
 \dRs+ regress_testsub4
-                                                                                                                 List of subscriptions
-       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                              List of subscriptions
+       Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub4 | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub3;
@@ -145,10 +145,10 @@ ALTER SUBSCRIPTION regress_testsub CONNECTION 'foobar';
 ERROR:  invalid connection string syntax: missing "=" after "foobar" in connection info string
 
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET PUBLICATION testpub2, testpub3 WITH (refresh = false);
@@ -157,10 +157,10 @@ ALTER SUBSCRIPTION regress_testsub SET (slot_name = 'newname');
 ALTER SUBSCRIPTION regress_testsub SET (password_required = false);
 ALTER SUBSCRIPTION regress_testsub SET (run_as_owner = true);
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | f                 | t             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (password_required = true);
@@ -176,10 +176,10 @@ ERROR:  unrecognized subscription parameter: "create_slot"
 -- ok
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/12345');
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/12345  | off
 (1 row)
 
 -- ok - with lsn = NONE
@@ -188,10 +188,10 @@ ALTER SUBSCRIPTION regress_testsub SKIP (lsn = NONE);
 ALTER SUBSCRIPTION regress_testsub SKIP (lsn = '0/0');
 ERROR:  invalid WAL location (LSN): 0/0
 \dRs+
-                                                                                                                     List of subscriptions
-      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                  List of subscriptions
+      Name       |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 BEGIN;
@@ -222,11 +222,15 @@ ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 ERROR:  invalid value for parameter "synchronous_commit": "foobar"
 HINT:  Available values: local, remote_write, remote_apply, on, off.
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+ERROR:  compression method "foobar" not valid
 \dRs+
-                                                                                                                       List of subscriptions
-        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN 
----------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------
- regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0
+                                                                                                                                    List of subscriptions
+        Name         |           Owner           | Enabled |     Publication     | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |           Conninfo           | Skip LSN | Spill files compression 
+---------------------+---------------------------+---------+---------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+------------------------------+----------+-------------------------
+ regress_testsub_foo | regress_subscription_user | f       | {testpub2,testpub3} | f      | parallel  | d                | f                | any    | t                 | f             | f        | local              | dbname=regress_doesnotexist2 | 0/0      | off
 (1 row)
 
 -- rename back to keep the rest simple
@@ -255,19 +259,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | t      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | t      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (binary = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -279,27 +283,27 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = parallel);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (streaming = false);
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication already exists
@@ -314,10 +318,10 @@ ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refr
 ALTER SUBSCRIPTION regress_testsub ADD PUBLICATION testpub1, testpub2 WITH (refresh = false);
 ERROR:  publication "testpub1" is already in subscription "regress_testsub"
 \dRs+
-                                                                                                                        List of subscriptions
-      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                                     List of subscriptions
+      Name       |           Owner           | Enabled |         Publication         | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-----------------------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub,testpub1,testpub2} | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- fail - publication used more than once
@@ -332,10 +336,10 @@ ERROR:  publication "testpub3" is not in subscription "regress_testsub"
 -- ok - delete publications
 ALTER SUBSCRIPTION regress_testsub DROP PUBLICATION testpub1, testpub2 WITH (refresh = false);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | off       | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 DROP SUBSCRIPTION regress_testsub;
@@ -371,19 +375,19 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 -- we can alter streaming when two_phase enabled
 ALTER SUBSCRIPTION regress_testsub SET (streaming = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -393,10 +397,10 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | on        | p                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
@@ -409,18 +413,18 @@ CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUB
 WARNING:  subscription was created, but is not connected
 HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and refresh the subscription.
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | f                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
 \dRs+
-                                                                                                                List of subscriptions
-      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN 
------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0
+                                                                                                                             List of subscriptions
+      Name       |           Owner           | Enabled | Publication | Binary | Streaming | Two-phase commit | Disable on error | Origin | Password required | Run as owner? | Failover | Synchronous commit |          Conninfo           | Skip LSN | Spill files compression 
+-----------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------+-------------------------
+ regress_testsub | regress_subscription_user | f       | {testpub}   | f      | parallel  | d                | t                | any    | t                 | f             | f        | off                | dbname=regress_doesnotexist | 0/0      | off
 (1 row)
 
 ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
diff --git a/src/test/regress/sql/subscription.sql b/src/test/regress/sql/subscription.sql
index 007c9e7037..368696f430 100644
--- a/src/test/regress/sql/subscription.sql
+++ b/src/test/regress/sql/subscription.sql
@@ -140,6 +140,10 @@ ALTER SUBSCRIPTION regress_testsub RENAME TO regress_testsub_foo;
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = local);
 ALTER SUBSCRIPTION regress_testsub_foo SET (synchronous_commit = foobar);
 
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = pglz);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = off);
+ALTER SUBSCRIPTION regress_testsub_foo SET (spill_compression = foobar);
+
 \dRs+
 
 -- rename back to keep the rest simple
-- 
2.43.0

v5-0006-Add-ReorderBuffer-ondisk-compression-TAP-tests.patchapplication/octet-stream; name=v5-0006-Add-ReorderBuffer-ondisk-compression-TAP-tests.patchDownload
From 864ef6b3ed5fcbb90af71682a45a5118b3b61845 Mon Sep 17 00:00:00 2001
From: Julien Tachoires <julmon@gmail.com>
Date: Mon, 28 Oct 2024 13:40:28 +0100
Subject: [PATCH 6/6] Add ReorderBuffer ondisk compression TAP tests

---
 src/test/subscription/Makefile                |   2 +
 src/test/subscription/meson.build             |   7 +-
 .../t/034_reorderbuffer_compression.pl        | 107 ++++++++++++++++++
 3 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 src/test/subscription/t/034_reorderbuffer_compression.pl

diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
index ce1ca43009..9341f1493c 100644
--- a/src/test/subscription/Makefile
+++ b/src/test/subscription/Makefile
@@ -16,6 +16,8 @@ include $(top_builddir)/src/Makefile.global
 EXTRA_INSTALL = contrib/hstore
 
 export with_icu
+export with_lz4
+export with_zstd
 
 check:
 	$(prove_check)
diff --git a/src/test/subscription/meson.build b/src/test/subscription/meson.build
index c591cd7d61..772eeb817f 100644
--- a/src/test/subscription/meson.build
+++ b/src/test/subscription/meson.build
@@ -5,7 +5,11 @@ tests += {
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
-    'env': {'with_icu': icu.found() ? 'yes' : 'no'},
+    'env': {
+      'with_icu': icu.found() ? 'yes' : 'no',
+      'with_lz4': lz4.found() ? 'yes' : 'no',
+      'with_zstd': zstd.found() ? 'yes' : 'no',
+    },
     'tests': [
       't/001_rep_changes.pl',
       't/002_types.pl',
@@ -40,6 +44,7 @@ tests += {
       't/031_column_list.pl',
       't/032_subscribe_use_index.pl',
       't/033_run_as_table_owner.pl',
+      't/034_reorderbuffer_compression.pl',
       't/100_bugs.pl',
     ],
   },
diff --git a/src/test/subscription/t/034_reorderbuffer_compression.pl b/src/test/subscription/t/034_reorderbuffer_compression.pl
new file mode 100644
index 0000000000..57def74bf9
--- /dev/null
+++ b/src/test/subscription/t/034_reorderbuffer_compression.pl
@@ -0,0 +1,107 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+# Test ReorderBuffer compression
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub test_reorderbuffer_compression
+{
+	my ($node_publisher, $node_subscriber, $appname, $compression, $compression_rate) = @_;
+
+	# Set subscriber's spill_compression option
+	$node_subscriber->safe_psql('postgres',
+		"ALTER SUBSCRIPTION tap_sub SET (spill_compression = $compression)");
+
+	# Make sure the table is empty
+	$node_publisher->safe_psql('postgres', 'TRUNCATE test_tab');
+
+	# Reset replication slot stats
+	$node_publisher->safe_psql('postgres',
+		"SELECT pg_stat_reset_replication_slot('tap_sub')");
+
+	# Insert 1 million rows in the table
+	$node_publisher->safe_psql('postgres',
+		"INSERT INTO test_tab SELECT i, 'Message number #'||i::TEXT FROM generate_series(1, 1000000) as i"
+	);
+
+	$node_publisher->wait_for_catchup($appname);
+
+	# Check if table content is replicated
+	my $result =
+	  $node_subscriber->safe_psql('postgres',
+		"SELECT count(*) FROM test_tab");
+	is($result, qq(1000000), 'check data was copied to subscriber');
+
+	# Check if the transaction was spilled on disk
+	my $res_stats =
+	  $node_publisher->safe_psql('postgres',
+		"SELECT spill_txns FROM pg_catalog.pg_stat_get_replication_slot('tap_sub');");
+	is($res_stats, qq(1), 'check if the transaction was spilled on disk');
+
+	# Check the compression ratio
+	my $res_comp_rate =
+	  $node_publisher->safe_psql('postgres',
+		"SELECT ((1 - spill_write_bytes::FLOAT / spill_bytes::FLOAT) * 100)::INT FROM pg_catalog.pg_stat_get_replication_slot('tap_sub');");
+	ok($res_comp_rate >= $compression_rate, "check if the compression rate (spill_compression = '$compression') is greater than or equal to $compression_rate");
+}
+
+# Create publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->append_conf('postgresql.conf',
+	'logical_decoding_work_mem = 64');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$node_subscriber->init;
+$node_subscriber->start;
+
+# Setup structure on publisher
+$node_publisher->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup structure on subscriber
+$node_subscriber->safe_psql('postgres',
+	"CREATE TABLE test_tab (a int primary key, b text)");
+
+# Setup logical replication
+my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
+$node_publisher->safe_psql('postgres',
+	"CREATE PUBLICATION tap_pub FOR TABLE test_tab");
+
+my $appname = 'tap_sub';
+
+$node_subscriber->safe_psql('postgres',
+	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub WITH (streaming = off)"
+);
+
+# No data compression expected, compression ratio is 0
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'off', 0);
+# Compression ratio greater than or equal to 30% for pglz
+test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+	'pglz', 30);
+SKIP:
+{
+	skip "LZ4 not supported by this build", 2 if ($ENV{with_lz4} ne 'yes');
+	# Compression ratio greater than or equal to 70% for lz4
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'lz4', 70);
+}
+SKIP:
+{
+	skip "ZSTD not supported by this build", 2 if ($ENV{with_zstd} ne 'yes');
+	# Compression ratio greater than or equal to 80% for zstd
+	test_reorderbuffer_compression($node_publisher, $node_subscriber, $appname,
+		'zstd', 80);
+}
+
+$node_subscriber->stop;
+$node_publisher->stop;
+
+done_testing();
-- 
2.43.0